Enhancing Data Sync with Validation and Reconciliation Features

Related products: CData Sync

I am reaching out to propose an enhancement to our data sync product, specifically regarding the introduction of validation and reconciliation functionalities. As our data source requirements caters to various data synchronization needs, it's imperative to offer users tools necessary for ensuring data integrity and peace of mind during synchronization processes.

Run Validation:
- Introduce a validation feature that allows users to compare the data between the source and destination systems.
- Provide a simple comparison of the number of records in both the source and destination, enabling users to quickly identify any discrepancies.
- This basic validation functionality serves as a valuable troubleshooting tool and provides users with immediate insights into the synchronization status.
Advanced Validation Mode:
- Offer an advanced validation mode that provides detailed insights into differences between the source and destination data.
- Highlight discrepancies in terms of Inserts, Deletes, and Updates, enabling users to pinpoint specific data changes.
- Provide users with the option to initiate reconciliation controls for Inserts, Deletes, and Updates directly from the administration screen.
Validation Job Reporting:
- Enable users to schedule validation tasks as jobs, allowing for automated validation processes at specified intervals.
- Automatically generate and send out comprehensive validation reports to users, detailing the differences between the source and destination data.
- These reports serve as valuable documentation and facilitate proactive identification and resolution of data synchronization issues.

Thank you for considering this ideal.

Page 1 / 1

Hi ninken,

Thank you for your feature request and for sharing your insights regarding the validation and reconciliation functionalities.

Based on the discussions with our product team, implementing validation and reconciliation functionalities will require careful consideration, especially regarding the feasibility and efficiency of executing generic queries across various sources and destinations. As our product supports a wide range of sources and destinations, each with its own capabilities and limitations, we need to prioritize implementation based on factors such as source and destination compatibility and performance implications.

Specifically for the points mentioned:

1. Run Validation

This feature has been on our radar for a while and definitely something we do want to tackle. We are exploring the possibility of initially rolling out these features to certain sources and destinations where executing a generic 'count(*)' query is feasible and efficient. This may involve excluding certain destinations like File Storage as we’d have to download and count and there may be challenges with NoSQL databases and sources that do not support count queries. Then there are other sources where we have child tables that are just the expended aggregates of the parents. So we can get the count for the parent (Invoices) but not the child (InvoiceLineItems).

Given that, can you please let us know the specific sources and destinations you are using? This will help us tailor the feature implementation to best suit your requirements and address any unique challenges posed by the data source/destination.

2. Advanced Validation Mode

Our development team has reviewed the proposed advanced validation mode, and it closely aligns with our existing CheckCache functionality, which compares data between source and destination systems to identify discrepancies in Inserts, Deletes, and Updates.

However, executing such comparisons can be resource-intensive and may impact overall performance. The process involves querying the full result set in both systems, raising concerns about efficiency and resource allocation. The team's perspective here is to prioritize proactive replication to the destination system, optimizing efficiency and minimizing unnecessary overhead.

3.Validation Job Reporting:

That is a feature that our development team acknowledges its potential value but it’s implementation directly depends on the implementation of Run Validation feature.

Thank you once again for your contribution to our product enhancement efforts.

Hi Elsa,

Glad to hear this is on your radar. I understand the complexities trying to provide functionality across so many connectors including APIs. For my purposes I’m focused on Source and destinations between database systems, like Oracle, MSSQL, and IBM DB2 for this functionality where I think it would be easier for implementation and something you can advertise as available with the driver. Most of my sources are vendor creation of database and tables and lack of primary keys. This has left the CDC option off the table for most of our data feeds. I don’t want to bog down the request complexity for CData to start off with. A simple record count would be a wonderful start at least allows me to provide some information if I have any issues or not and put the business users mind at ease. In the future when development time allows, an Advance Validation mode would be great, and I understand it’s a resource intensive task. Hence something that would be good to be able to run on a schedule for reporting it or be triggered off as a manual review process to save time.

Thank you.

Hi ninken,

Thank you for your reply and the details given !

We'll actively consider your suggestions and are prioritize the implementation of validation features for database systems like Oracle, MSSQL, and IBM DB2. We'll keep you updated on our progress and look forward to providing you with a more enhanced product.

On a separate note, one of the advantages of CDC versus Standard Replication in sources that do support it in Sync (such as MSSQL and Oracle) , is that tables do no necessarily require to have primary keys.

I am including our latest blog referring to CDC in SQL in case you want to take a look:

https://www.cdata.com/blog/sql-server-cdc

I think the basic validation feature (item #1 on the list) would be great to do. A little problematic to do if the source is something like a flat file or an API. But if the source is a database, it should be really easy to run a “select count” against source and target after the data ingestion is complete. Could be as simple as a checkbox on the Job: “Do record count validation after loading".

Items 2 and 3 seem more complex. Item 3 in particular seems out of scope for what the intent of Sync is. Readers here may want to consider a more robust automated data-testing tool such as QuerySurge.

CData Software is a leading provider of data access and connectivity solutions. Our standards-based connectors streamline data access and insulate customers from the complexities of integrating with on-premise or cloud databases, SaaS, APIs, NoSQL, and Big Data.

Connect With Us

Get Started

Data Connectors

ETL/ ELT Solutions

Cloud & API Connectivity

OEM & Custom Drivers

Connect With Us

Get Started

Data Visualization

Company

News & Insights

Sign up

CData Community

Scanning file for viruses.

This file cannot be downloaded

Connect With Us

Connect With Us