Solved

Batch size / Commit size for Snowflake target

  • October 6, 2023
  • 4 replies
  • 440 views


Hello,

Has anyone experimented with different batch sizes while sending data into Snowflake? It looks like the default is 100,000 rows. I am wondering if there are any downsides to trying a higher number.

Thanks!

Best answer by Ankit Singh

This topic has been closed for replies.

4 replies

Ankit Singh
  • Employee
  • Answer
  • October 6, 2023

Hey @DougN Nouria 

Generally speaking, it is a trade-off between performance and the RAM allocated to the machine where Sync is running.

A higher batch size means you can write the same number of records to Snowflake with fewer requests, which translates into better overall job performance. However, since each batch is held in memory before it is written, a larger batch also increases the heap space the job uses.

As a rule of thumb:

Higher batch size = better performance = more data held in memory = more RAM required.

If you do increase the batch size, you may also need to set a higher Timeout value.
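
To make the trade-off concrete, here is a minimal, hypothetical sketch that uses the Snowflake JDBC driver directly (this is not CData Sync's internal code, and the table, columns, and connection details are made up). Each addBatch() call buffers a row on the client heap, and executeBatch() is the round trip that flushes the whole batch to Snowflake, so a bigger batch means fewer round trips but more rows held in memory at once:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.Properties;

    public class BatchedSnowflakeWriteSketch {
        public static void main(String[] args) throws SQLException {
            // Hypothetical connection details; substitute your own account, warehouse, etc.
            Properties props = new Properties();
            props.put("user", "MY_USER");
            props.put("password", "MY_PASSWORD");
            props.put("warehouse", "MY_WH");
            props.put("db", "MY_DB");
            props.put("schema", "PUBLIC");

            int batchSize = 100_000; // larger value = fewer flushes, but more rows buffered in heap

            try (Connection conn = DriverManager.getConnection(
                     "jdbc:snowflake://myaccount.snowflakecomputing.com/", props);
                 PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO TARGET_TABLE (ID, PAYLOAD) VALUES (?, ?)")) {

                int pending = 0;
                for (int id = 0; id < 1_000_000; id++) {
                    ps.setInt(1, id);
                    ps.setString(2, "row-" + id);
                    ps.addBatch();              // row is buffered client-side; heap usage grows
                    if (++pending == batchSize) {
                        ps.executeBatch();      // one round trip writes the whole batch
                        pending = 0;
                    }
                }
                if (pending > 0) {
                    ps.executeBatch();          // flush the final partial batch
                }
            }
        }
    }

With 1 million rows, a batch size of 100,000 means ten flushes; raising it to 500,000 cuts that to two flushes but roughly quintuples the rows held in memory before each one, which is why the heap and Timeout settings tend to move together with the batch size.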


  • Author
  • Influencer
  • October 6, 2023

Thank you!


  • Apprentice
  • November 8, 2023

Hi @Ankit Singh - is it possible for us to configure CData Sync to use more of the VM's RAM? For example, when I do an initial replication of 1 million records from Redshift (RA3.16xlarge) to Snowflake (Medium warehouse), it takes roughly 3 minutes, and I noticed CData Sync is using only about a quarter of the 32 GB of RAM available.


Ankit Singh
  • Employee
  • November 8, 2023

Hi @worldwidehow 

If the application is utilizing less RAM, that is actually a good thing from a performance standpoint. However, if you are looking to get even better performance out of your job, you can always increase the batch size as mentioned above!
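
One thing worth checking on that VM: a Java application can only use the heap its JVM was started with (the -Xmx cap), which is often well below the machine's physical RAM, and that alone can explain seeing only a quarter of 32 GB in use. Exactly where CData Sync's JVM startup options live depends on how it is installed, so treat that as something to confirm in your own deployment; the snippet below is just a generic way to see what cap a JVM process is running under:

    public class HeapCheck {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            long mb = 1024 * 1024;
            // maxMemory() reports the ceiling set by -Xmx (or the JVM default),
            // not the physical RAM installed on the machine.
            System.out.printf("max heap: %d MB%n", rt.maxMemory() / mb);
            System.out.printf("reserved heap: %d MB%n", rt.totalMemory() / mb);
            System.out.printf("free within reserved: %d MB%n", rt.freeMemory() / mb);
        }
    }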