Solved

CData Sync - CSV Source file questions

  • 6 October 2023
  • 2 replies
  • 198 views

Userlevel 3
Badge +1

Hello, couple of CSV source related questions…

  1. How do I indicate, for a CSV-based source is, what the primary key column is for each table, so that an upsert can be properly done?
     
  2. Is it possible to move the files to another folder (in the operating system) after Sync has ingested them?  We would want to archive the files so they don’t get re-processed the next time.  I can do this outside of Sync if necessary, but it would be cleaner if Sync could do this somehow (or call out to a batch file on the OS and run that).
     
  3. In my case, I have many source files with a timestamp, like MYFILE_XYZ_20231006.TXT.  I would like to use the file Aggregation so that I can just pull them all into the same target table.  However when I do that, is there a way to pass along the individual filename being processed, into a column in the target table?

Thank you for any assistance you can provide!

 

icon

Best answer by Taylor 16 October 2023, 19:20

View original

2 replies

Hi @DougN Nouria ,

  1. You can set the Primary Keys of a source table by modifying the query for the Task. For instance, let's say I have a source file called DateTest.csv and I want to set "Col1” as the PK. If you edit the task and navigate to the "Query” tab, you can then edit the query syntax like so:

    REPLICATE [DateTest.csv] ([Col1] VARCHAR(255), PRIMARY KEY ([Col1])) SELECT * FROM [DateTest.csv]


    This syntax also supports composite keys:

    REPLICATE [DateTest.csv] ([Col1] VARCHAR(255), [Col2] INT, PRIMARY KEY ([Col1], [Col2])) SELECT * FROM [DateTest.csv]


    Another option is to set the UseRowNumbers connection property to true in your CSV Connection. This will automatically add "Row Number” as column and sets that as the Primary Key to the table.

  2. CData Sync supports both of those options through Events. Here you can define executions that run before and/or after a job completes.

    1. Here is a code example of running a batch file called myfile.bat:
       

      <!-- Execute Batch File -->
      <api:set attr="batch.name"  value="C:\\temp\\myfile.bat" />
      <api:set attr="batch.arguments" value="[_input.jobname]"/>
      <api:set attr="batch.directory"  value="C:\\temp\\" />
      <api:call op="sysExecute" in="batch" />

       

    2. Here is an example that moves a file from one directory to another

      <api:set attr="file.source" value="C:/temp/Source/file1.csv" />
      <api:set attr="file.destination" value="C:/temp/Dest/file2.csv" />
      <api:call op="fileMove" in="file" />

       

  3. This requires using the Load from Folder job type. It enhances the Aggregate Files property by adding additional metadata to the destination table (filename, filemodtime, rownumber) and ensures only new/updated files are processed while skipping files that have already been replicated.

Userlevel 3
Badge +1

Awesome, thank you so much, I will check this out!

Reply