
Hi there.

Is there a way of replicating data in batches?

We are looking at a lambda approach: small incremental replications, followed by a weekly full reload to catch late-arriving data and physical/hard deletes.

The challenge is that the full-reload data volumes are too large for the server hosting our CData Sync instance, even with Java heap adjustments.

I was wondering if we could do the reload/replication in a looping process.  

The source is Snowflake; the target is parquet files in Azure ADLS Gen2.

Hi David,

You can certainly configure CData Sync to handle large volumes of data by using replication intervals, which break the data up into smaller batches instead of pulling everything in one pass. The Sync documentation covers how replication intervals are configured.
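For intuition, here is a rough Python sketch of what interval-based batching amounts to: loop over date windows, pull one window at a time from Snowflake, and write each window out as a parquet file in ADLS Gen2. This runs outside of Sync itself, and the connection details, table and column names, and the 7-day window size are all placeholders to adapt to your environment.

```python
# Sketch of a looping, windowed full reload: Snowflake -> parquet in ADLS Gen2.
# Requires: snowflake-connector-python[pandas], pandas, pyarrow, adlfs.
# All names and credentials below are placeholders, not your real config.
from datetime import date, timedelta

import snowflake.connector

SOURCE_TABLE = "ORDERS"          # hypothetical source table
WATERMARK_COL = "MODIFIED_DATE"  # hypothetical date column to window on
WINDOW = timedelta(days=7)       # batch size; tune so one window fits in memory

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="my_wh", database="my_db", schema="my_schema",
)

start, end = date(2020, 1, 1), date.today()
cur = conn.cursor()
while start < end:
    stop = min(start + WINDOW, end)
    cur.execute(
        f"SELECT * FROM {SOURCE_TABLE} "
        f"WHERE {WATERMARK_COL} >= %(lo)s AND {WATERMARK_COL} < %(hi)s",
        {"lo": start, "hi": stop},
    )
    df = cur.fetch_pandas_all()  # only the current window is held in memory
    if not df.empty:
        # adlfs maps abfs:// URLs to ADLS Gen2; credentials go in storage_options
        df.to_parquet(
            f"abfs://my-container/orders/{start:%Y%m%d}.parquet",
            storage_options={"account_name": "mystorageacct",
                             "account_key": "..."},
        )
    start = stop
conn.close()
```

Sync's replication intervals do the windowing for you inside the job, so you get the same bounded memory footprint without maintaining a script like this.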

As for the weekly reload you mentioned: you could configure the full reload as a separate job from your incremental jobs and schedule it to run on a weekly basis. The documentation on job scheduling covers the available options.
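For example, if your Sync instance supports cron-style schedules, an expression such as `0 2 * * 0` would run the full-reload job every Sunday at 02:00, while the incremental jobs keep their own, more frequent schedule. (Exact scheduling syntax depends on your Sync version, so check the scheduling section of the docs.)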

I hope that this answers your questions!
