How CData Solutions Connect with NoSQL Data Sources

Forum|Forum|3 years ago
March 30, 2023
0 replies
172 views

Jerod
Community Manager

Our Drivers provide the fastest and most flexible support for NoSQL data integration. In this article we highlight the specific features and capabilities that are common across all of our NoSQL drivers, using our MongoDB Drivers as an example. Our Drivers allow you to interact with NoSQL databases as if they were a relational database, leveraging the flexibility of NoSQL data storage with the familiarity and utility of SQL-92 queries.

Due to the flexibility of NoSQL, it is common for data structures to be returned as JSON objects, arrays, or any combination thereof. While this is convenient for storing hierarchical data, it can be difficult to work with in common BI, reporting, and ETL tools. The CData Drivers include several facilities for mapping or flattening these data structures to simplify integration with standard tooling.

Key Features

Free-Form Queries: requesting exactly the data you want from your tables.
Horizontal Flattening: drilling down into embedded data (sub-documents and arrays).
Vertical Flattening: treating embedded arrays of sub-documents as separate tables.
Custom Schema Definitions: defining how the drivers view the MongoDB data.
Client-Side JSON Functions: manipulating the data returned to perform client-side aggregation and transformation.

Below are examples of these features. For reference, these examples are based on the following MongoDB sample document:

Sample Document

{
  "_id" : ObjectId("5780046cd5a397806c3dab38"),
  "address" : {
    "building" : "1007",
    "coord" : [-73.856077, 40.848447],
    "street" : "Morris Park Ave",
    "zipcode" : "10462"
  },
  "borough" : "Bronx",
  "cuisine" : "Bakery",
  "grades" : [{
      "date" : ISODate("2014-03-03T00:00:00Z"),
      "grade" : "A",
      "score" : 2
    }, {
      "date" : ISODate("2013-09-11T00:00:00Z"),
      "grade" : "A",
      "score" : 6
    }, {
      "date" : ISODate("2013-01-24T00:00:00Z"),
      "grade" : "A",
      "score" : 10
    }, {
      "date" : ISODate("2011-11-23T00:00:00Z"),
      "grade" : "A",
      "score" : 9
    }, {
      "date" : ISODate("2011-03-10T00:00:00Z"),
      "grade" : "B",
      "score" : 14
    }],
  "name" : "Morris Park Bake Shop",
  "restaurant_id" : "30075445"
}

Free-Form Queries

The simplest way to access your MongoDB data is by issuing a query based on what you understand to be in the data source. This gives you the freedom to select exactly the data that you want, regardless of the existence of a strict table schema. Consider the sample document above.

If you know that you want the _id, address.street, and grades[0] fields from each document, you can freely query that data from the database:

SELECT
  [_id],
  [address.street],
  [grades.0],
FROM restaurants;

The driver returns the value for those fields, given that the documents contain data. If the field does not exist in a given document, the driver simply returns a NULL for the field. This feature is useful whenever you have control over the SQL query being submitted to the CData driver, but obviously cannot be used when the opposite is true. There are other options available in that case.

Horizontal Flattening

The Flatten Arrays and Flatten Objects Connection Properties in the CData drivers allow you to control how objects and arrays in your MongoDB data are parsed to dynamically define the table schema for your MongoDB data. These properties allow you to configure how the data in a given document is horizontally flattened, creating a single table schema for all of the documents (including embedded data) in a given table. This is especially useful when you do not have granular control over the SQL queries being submitted.

In the examples below, we display the expected results, based on various values for Flatten Arrays and Flatten Objects, for the following query:

SELECT 
   *
FROM restaurants

FlattenArrays=0;FlattenObjects=False;

Without any horizontal flattening, the drivers discover seven columns for the table: _id, address, borough, cuisine, grades, name, and restaurant_id. Embedded data in the document is returned in a raw, aggregate form.

Result

_id	address	borough	cuisine	grades	name	restaurant_id
5780046cd5a397806c3dab38	{ "building" : "1007", "coord" : [-73.856077, 40.848447], "street" : "Morris Park Ave", "zipcode" : "10462" }	Bronx	Bakery	[{"date" : ISODate("2014-03-03T00:00:00Z"), "grade" : "A", "score" : 2 }, { "date" : ISODate("2013-09-11T00:00:00Z"), "grade" : "A", "score" : 6 }, { "date" : ISODate("2013-01-24T00:00:00Z"), "grade" : "A", "score" : 10 }, { "date" : ISODate("2011-11-23T00:00:00Z"), "grade" : "A", "score" : 9 }, { "date" : ISODate("2011-03-10T00:00:00Z"), "grade" : "B", "score" : 14}]	Morris Park Bake Shop	30075445

FlattenArrays=0;FlattenObjects=True;

If you set Flatten Objects to "true", the number of columns expands as the embedded "address" sub-document is flattened. With Flatten Objects still set to "false" any arrays or arrays of documents will be returned as aggregates.

Result

_id	address.building	address.coord	address.street	address.zipcode	borough	cuisine	grades	name	restaurant_id
5780046cd5a397806c3dab38	1007	[-73.856077, 40.848447]	Morris Park Ave	10462	Bronx	Bakery	[{"date" : ISODate("2014-03-03T00:00:00Z"), "grade" : "A", "score" : 2 }, ... ]	Morris Park Bake Shop	30075445

FlattenArrays=2;FlattenObjects=False;

The Flatten Arrays property determines how many items in an embedded array of sub-documents to treat as individual columns. By setting Flatten Arrays to "2" (while leaving Flatten Objects = "false"), we extract the first two items in the embedded arrays of a document.

Result

_id	address	borough	cuisine	grades.0	grades.1	name	restaurant_id
5780046cd5a397806c3dab38	{ "building" : "1007", "coord" : [-73.856077, 40.848447], "street" : "Morris Park Ave", "zipcode" : "10462" }	Bronx	Bakery	{ "date" : ISODate("2014-03-03T00:00:00Z"), "grade" : "A", "score" : 2 }	{ "date" : ISODate("2013-09-11T00:00:00Z"), "grade" : "A", "score" : 6 }	Morris Park Bake Shop	30075445

FlattenArrays=1;FlattenObjects=True;

With Flatten Arrays set to "1" and Flatten Objects = "true", we extract the first item in the embedded arrays of a document and flatten any embedded sub-documents.

Result

_id	address.building	address.coord.0	address.street	address.zipcode	borough	cuisine	grades.0.date	grades.0.grade	grades.0.score	name	restaurant_id
57800...	1007	-73.856077	Morris Park Ave	10462	Bronx	Bakery	2014-03-03...	A	2	Morris Park Bake Shop	30075445

These columns are available for use in INSERT and UPDATE statements as well, allowing you to add or update individual fields within sub-documents and arrays.

Vertical Flattening

Documents in MongoDB frequently contain an array (or arrays) of sub-documents. While it is possible to drill down into these sub-documents using horizontal flattening (see above section), a common way of dealing with such arrays in NoSQL databases is to treat them as separate tables of data. This process is known as vertical flattening and doing so helps to build a relational model between the different 'types' of documents in a MongoDB instance.

Considering the sample document above, you could retrieve the grades array as a separate table:

SELECT 
  *
FROM [restaurants.grades];

This query returns the following data set:

date	grade	score
2014-03-03T00:00:00Z	A	2
2013-09-11T00:00:00Z	A	6
2013-01-24T00:00:00Z	A	10
2011-11-23T00:00:00Z	A	9
2011-03-10T00:00:00Z	B	14

You may also want to include information from the base restaurants table. You can do this with a join. Flattened arrays can only be joined with the root document. The driver expects the left part of the join is the array document you want to flatten vertically. Set the SupportEnhancedSQL connection property to false to join nested MongoDB documents.

SELECT 
  [restaurants].[_id], [restaurants.grades].* 
FROM 
  [restaurants.grades] 
JOIN 
  [restaurants] 
WHERE 
  [restaurants].name = 'Morris Park Bake Shop'

This query returns the following data set:

_id	date	grade	score
5780046cd5a397806c3dab38	2014-03-03T00:00:00Z	A	2
5780046cd5a397806c3dab38	2013-09-11T00:00:00Z	A	6
5780046cd5a397806c3dab38	2013-01-24T00:00:00Z	A	10
5780046cd5a397806c3dab38	2011-11-23T00:00:00Z	A	9
5780046cd5a397806c3dab38	2011-03-10T00:00:00Z	B	14

Custom Schema Definitions

In order to treat your MongoDB data as a relational database, a table schema must exist. The schema can be created dynamically by using Connection properties or by defining the schema yourself. This is another option for drilling down into your data when you do not have full control of the SQL queries being constructed.

Given the document above, you could expose the _id (as the primary key), name, address.zipcode, and the first entry in the grades fields by creating the following schema:

<rsb:script xmlns:rsb="http://www.rssbus.com/ns/rsbscript/2">

  <rsb:info title="StaticRestaurants" description="Custom Schema for the MongoDB restaurants data set.">  
    <!-- Column definitions -->
    <attr name="id"           xs:type="int32" iskey="true" other:bsonpath="$._id"/>
    <attr name="name"         xs:type="string" other:bsonpath="$.name"           />
    <attr name="zipcode"      xs:type="string" other:bsonpath="$.address.state"  />
    <attr name="latest_grade" xs:type="string" other:bsonpath="$.offices.grade"  />
  </rsb:info>  

  <rsb:set attr="collection" value="companies"/>

</rsb:script>

Once you have created your custom schema files, save them to disk using ".rsd" as the file extension (typically in the db folder at the installation location) and set the Location Connection Property to the same location. The driver will expose the defined tables in any third party tools and apps based on the title attribute of rsb:info. You can also query the data explicitly by using the title as the table name in a SQL query:

SELECT
  id, latest_grade
FROM
  StaticRestaurants;

By defining the schema for your MongoDB data, you gain granular control over you data in a way that is not commonly supported in BI, reporting, and ETL tools, allowing you to leverage the data visualization, transformation, and extraction features of your favorite tools to work with your data in the way that you want. Custom schemas also allow you to define different views of the data stored in a single "table", meaning that you can take full advantage of the NoSQL nature of a MongoDB database where a given table can contain documents whose relevant fields are differentiated by something like a type field.

Client-Side JSON Functions

The documents in MongoDB data sets are essentially JSON structures. The CData Drivers support using standard SQL functions to work with JSON structures. These functions work by pulling in the MongoDB document and parsing the relevant information in the client. While there are many functions supported, we only highlight a few here. The examples below use the sample document, contained in the table 'Students':

{
  id: 123456,
  ...,
  grades: [
            { "grade": "A", "score": 96 },
            { "grade": "A", "score": 94 },
            { "grade": "A", "score": 92 },
            { "grade": "A", "score": 97 },
            { "grade": "B", "score": 84 }
          ],
  ...
}

JSON_EXTRACT

The JSON_EXTRACT function can extract individual values from a JSON object. The following query returns the values shown below based on the JSON path passed as the second argument to the function:

SELECT 
  JSON_EXTRACT(grades,'[0].grade') AS Grade,
  JSON_EXTRACT(grades,'[0].score') AS Score
FROM Students;

This query returns the following data:

Grade	Score
A	96

JSON_SUM

The JSON_SUM function returns the sum of the numeric values of a JSON array within a JSON object. The following query returns the total of the values specified by the JSON path passed as the second argument to the function:

SELECT 
  Name, 
  JSON_SUM(score,'[x].score') AS TotalScore 
FROM Students;

This query returns the following data:

TotalScore
463

DOCUMENT

The DOCUMENT function can be used to retrieve the entire document as a JSON string. See the following query and its result as an example:

SELECT 
  DOCUMENT(*) 
FROM Students;

The query above returns each document in the table as a single string.

DOCUMENT
{ "_id" : ObjectId("5780046cd5a397806c3dab38"), "address" : { "building" : "1007", "coord" : [-73.856077, 40.848447], "street" : "Morris Park Ave", "zipcode" : "10462" }, "borough" : "Bronx", "cuisine" : "Bakery", "grades" : [{ "date" : ISODate("2014-03-03T00:00:00Z"), "grade" : "A", "score" : 2 }, { "date" : ISODate("2013-09-11T00:00:00Z"), "grade" : "A", "score" : 6 }, { "date" : ISODate("2013-01-24T00:00:00Z"), "grade" : "A", "score" : 10 }, { "date" : ISODate("2011-11-23T00:00:00Z"), "grade" : "A", "score" : 9 }, { "date" : ISODate("2011-03-10T00:00:00Z"), "grade" : "B", "score" : 14 }], "name" : "Morris Park Bake Shop", "restaurant_id" : "30075445" }

DOCUMENT

{ "_id" : ObjectId("5780046cd5a397806c3dab38"), "address" : { "building" : "1007", "coord" : [-73.856077, 40.848447], "street" : "Morris Park Ave", "zipcode" : "10462" }, "borough" : "Bronx", "cuisine" : "Bakery", "grades" : [{ "date" : ISODate("2014-03-03T00:00:00Z"), "grade" : "A", "score" : 2 }, { "date" : ISODate("2013-09-11T00:00:00Z"), "grade" : "A", "score" : 6 }, { "date" : ISODate("2013-01-24T00:00:00Z"), "grade" : "A", "score" : 10 }, { "date" : ISODate("2011-11-23T00:00:00Z"), "grade" : "A", "score" : 9 }, { "date" : ISODate("2011-03-10T00:00:00Z"), "grade" : "B", "score" : 14 }], "name" : "Morris Park Bake Shop", "restaurant_id" : "30075445" }

Get Started

The NoSQL Drivers offer tremendous flexibility in working with NoSQL databases. Check out the latest NoSQL Drivers for more information or to download free 30-day trials!

NoSQL Drivers: JOIN vs Relational View Mode - Learn how the View Mode property controls how NoSQL data with parent-child or other relationships is presented.
NoSQL Drivers: Performance Comparison - Compare how drivers from different vendors perform when querying and processing large datasets from NoSQL data sources.
NoSQL Drivers: Feature Comparison - Compare how drivers from different vendors handle complex queries and complex datasets from NoSQL data sources.

CData is the data layer that makes AI work in production—live connectivity and replication across 350+ sources, semantic context, and built-in governance. Powering AI for Databricks, Microsoft, Google, Palantir, and 10,000+ customers worldwide.

Key Features

Sample Document

Free-Form Queries

Horizontal Flattening

FlattenArrays=0;FlattenObjects=False;

Result

FlattenArrays=0;FlattenObjects=True;

Result

FlattenArrays=2;FlattenObjects=False;

Result

FlattenArrays=1;FlattenObjects=True;

Result

Vertical Flattening

Custom Schema Definitions

Client-Side JSON Functions

JSON_EXTRACT

JSON_SUM

DOCUMENT

Get Started

Related Articles

Sign up

CData Community

Scanning file for viruses.

This file cannot be downloaded