
Hello,

Has anyone experimented with different batch sizes while sending data into Snowflake? It looks like the default is 100,000 rows. I am wondering if there are any downsides to trying a higher number.

Thanks!

Hey @DougN Nouria 

Generally speaking, it is a trade-off between performance and the RAM allocated to the machine where Sync is running.

A higher batch size means you can write the same number of records to Snowflake with fewer requests, which translates into better overall job performance. However, since each batch is held in memory, this will also increase the heap size utilized.

As a rule of thumb:

Higher Batch Size = Better Performance = More data held in memory = More RAM required.

You might also need to set a higher Timeout value if you increase the Batch Size.
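
To put rough numbers on that rule of thumb, here is a minimal back-of-envelope sketch in Java. The 2 KB average row size and the 2x object overhead are purely assumed illustration values (not CData defaults), so plug in a realistic row width for your own tables:

```java
// Back-of-envelope heap estimate for one in-flight batch.
// batchSize, avgRowBytes and overheadFactor are illustrative assumptions only.
public class BatchMemoryEstimate {
    public static void main(String[] args) {
        long batchSize = 100_000;   // the default batch size discussed above
        long avgRowBytes = 2_048;   // assumed average serialized row size (~2 KB)
        long overheadFactor = 2;    // assumed object/buffer overhead on top of raw data

        long bytesPerBatch = batchSize * avgRowBytes * overheadFactor;
        System.out.printf("~%.1f MB of heap per in-flight batch%n",
                bytesPerBatch / (1024.0 * 1024.0));
        // Doubling the batch size roughly doubles this figure: fewer requests
        // to Snowflake, but more RAM held per batch.
    }
}
```

Doubling the Batch Size roughly doubles that estimate, which is exactly the trade-off above.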


Thank you!


Hi @Ankit Singh - is it possible for us to configure CData Sync to use more of the VM's RAM? For example, when I do an initial replication of 1 million records from Redshift (RA3.16xlarge) to Snowflake (medium warehouse), it takes roughly 3 minutes, and I noticed CData Sync is using only ¼ of the total 32 GB of RAM available.


Hi @worldwidehow 

If the application is utilizing less RAM, that's actually a good thing from a performance standpoint. However, if you are looking to get even better performance on your job, you can always increase the Batch Size as mentioned above!
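
One note on the "only ¼ of 32 GB" observation: assuming your Sync instance is the Java edition (the mention of heap above suggests it is), the JVM process will never grow past its configured maximum heap (-Xmx), no matter how much RAM the VM has. How that option is exposed depends on your deployment (startup script, service wrapper, or container), so treat the snippet below as a generic JVM illustration rather than a documented Sync setting; it simply prints the heap ceiling the process is actually running with:

```java
// Prints the heap limits the current JVM process is running with.
// The OS may report 32 GB of RAM, but the JVM never grows past -Xmx, which is
// one common reason a Java-based service appears to use only a fraction of it.
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        System.out.printf("Max heap (-Xmx ceiling): %d MB%n", rt.maxMemory() / mb);
        System.out.printf("Currently allocated:     %d MB%n", rt.totalMemory() / mb);
        System.out.printf("Free within allocation:  %d MB%n", rt.freeMemory() / mb);
        // The ceiling is set at JVM startup (e.g. java -Xmx16g ...), not from
        // application code; where to put that flag depends on how Sync is launched.
    }
}
```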

