Hi ninken,
Thank you for your feature request and for sharing your insights regarding the validation and reconciliation functionalities.
Based on the discussions with our product team, implementing validation and reconciliation functionalities will require careful consideration, especially regarding the feasibility and efficiency of executing generic queries across various sources and destinations. As our product supports a wide range of sources and destinations, each with its own capabilities and limitations, we need to prioritize implementation based on factors such as source and destination compatibility and performance implications.
Specifically for the points mentioned:
1. Run Validation
This feature has been on our radar for a while and definitely something we do want to tackle. We are exploring the possibility of initially rolling out these features to certain sources and destinations where executing a generic 'count(*)' query is feasible and efficient. This may involve excluding certain destinations like File Storage as we’d have to download and count and there may be challenges with NoSQL databases and sources that do not support count queries. Then there are other sources where we have child tables that are just the expended aggregates of the parents. So we can get the count for the parent (Invoices) but not the child (InvoiceLineItems).
Given that, can you please let us know the specific sources and destinations you are using? This will help us tailor the feature implementation to best suit your requirements and address any unique challenges posed by the data source/destination.
2. Advanced Validation Mode
Our development team has reviewed the proposed advanced validation mode, and it closely aligns with our existing CheckCache functionality, which compares data between source and destination systems to identify discrepancies in Inserts, Deletes, and Updates.
However, executing such comparisons can be resource-intensive and may impact overall performance. The process involves querying the full result set in both systems, raising concerns about efficiency and resource allocation. The team's perspective here is to prioritize proactive replication to the destination system, optimizing efficiency and minimizing unnecessary overhead.
3.Validation Job Reporting:
That is a feature that our development team acknowledges its potential value but it’s implementation directly depends on the implementation of Run Validation feature.
Thank you once again for your contribution to our product enhancement efforts.
Hi Elsa,
Glad to hear this is on your radar. I understand the complexities trying to provide functionality across so many connectors including APIs. For my purposes I’m focused on Source and destinations between database systems, like Oracle, MSSQL, and IBM DB2 for this functionality where I think it would be easier for implementation and something you can advertise as available with the driver. Most of my sources are vendor creation of database and tables and lack of primary keys. This has left the CDC option off the table for most of our data feeds. I don’t want to bog down the request complexity for CData to start off with. A simple record count would be a wonderful start at least allows me to provide some information if I have any issues or not and put the business users mind at ease. In the future when development time allows, an Advance Validation mode would be great, and I understand it’s a resource intensive task. Hence something that would be good to be able to run on a schedule for reporting it or be triggered off as a manual review process to save time.
Thank you.
Hi ninken,
Thank you for your reply and the details given !
We'll actively consider your suggestions and are prioritize the implementation of validation features for database systems like Oracle, MSSQL, and IBM DB2. We'll keep you updated on our progress and look forward to providing you with a more enhanced product.
On a separate note, one of the advantages of CDC versus Standard Replication in sources that do support it in Sync (such as MSSQL and Oracle) , is that tables do no necessarily require to have primary keys.
I am including our latest blog referring to CDC in SQL in case you want to take a look:
https://www.cdata.com/blog/sql-server-cdc
I think the basic validation feature (item #1 on the list) would be great to do. A little problematic to do if the source is something like a flat file or an API. But if the source is a database, it should be really easy to run a “select count” against source and target after the data ingestion is complete. Could be as simple as a checkbox on the Job: “Do record count validation after loading".
Items 2 and 3 seem more complex. Item 3 in particular seems out of scope for what the intent of Sync is. Readers here may want to consider a more robust automated data-testing tool such as QuerySurge.