Solved

Building a Standardized Querying Layer (Metrics, Dimensions, Filters & More)

  • 28 April 2023
  • 2 replies
  • 140 views

Userlevel 1
Badge

Hi CData community,

 

We are building a KPI tracking SAAS at Datapad and we’re using CData Drivers as the backbone.

Essentially, any user is able to login with a data source through OAuth and can create metrics & reports.

 

We wanted to as if there is a best-practice to create a querying layer that would work accross all integrations.

 

We want to:

  1. Show a list of available metrics (numeric / aggregation fields)
  2. Display a list of available dimensions (text / group by fields)
  3. Users can add filters to available dimensions (matching criteria)
  4. Users can change the aggregation on metrics (sum, avg etc)
  5. Support for special datetime dimension which fetch data grouped by (hour, day, time, ..)

 

We were thinking of building a dynamic SQL query builder to accomplish this.

 

Before we proceed we wanted to ask the community if there is any best-practice OR previous experience regarding our use-case. 

 

Thanks in advance !

icon

Best answer by John G 28 April 2023, 21:54

View original

2 replies

Userlevel 2
Badge

Hey @cruso!

That's a great question and I wanted to give a few tips on how to achieve standardized query execution across the CData drivers:

---
1) There are a set of system tables common to all of the CData connectors that you can use to pull metadata consistently across the drivers. Here's a link to Salesforce, but the same tables are available everywhere:

https://cdn.cdata.com/help/RFH/ado/pg_allsystables.htm

Now, your message seems to focus on a data-cube model, which is something we usually see in the ads-based connectors (Google Ads, Google Analytics, Facebook Ads, etc.). The sys_tablecolumns table in these ads-based connectors do surface an OLAPType field, which indicates whether the field is a dimension or metric. For the other drivers though, you may need to rely on the column type to make that determination, however.

2) Most of our drivers use the same query parser out of the box, but there is a subset which offload the query directly to the source by default. These are largely the "real database" sources, but there are a few others. These are indicated by the presence of the "QueryPassthrough" connection property.

In order to get a standardized query experience so you don't have to worry about different SQL dialects, it is recommended to set this property to False across the board. One way to do this is to query the sys_connection_props table, which is available in all drivers, to see if "QueryPassthrough" is available, and then set it to False on the connection.

3) Make sure to always quote the SQL identifiers (that's the columns, table, schema, catalog, etc.) when building the query and include the fully qualified name where applicable. The first part is to deal with identifiers that may contain special characters, like spaces, and the latter is to avoid ambiguity between multiple schemas or catalogs. For example, using our HubSpot driver:

Bad: SELECT Id, Company size, Name FROM Contacts - will cause an error due to "Company Size" not being quoted and, if we ignore that, the query is ambiguous between the "Hubspot" and "HubspotV3" schemas
Good: SELECT [Contacts].[Id], [Contacts].[Company size], [Name] FROM [CData].[HubspotV3].[Contacts]

CData's query parser supports square brackets ([ and ]) as well as double quotes (") for the quote characters.
---

Those points will hopefully help you get started. With that said, I'm also curious to see what other best practices and strategies our community can suggest.

Userlevel 1
Badge

Hi @John G, thank you for such a detailed & fast response!

This approach makes total sense. We will do a deep-dive with the team and let you know of our progress.

Reply