Skip to main content

Install DataVirtuality JDBC driver

Windows

Download the latest DataVirtuality JDBC driver from here.
Copy the jar file to your spark install folder:

D:\spark-2.3.1-bin-hadoop2.6\jars

Linux

Download the latest DataVirtuality JDBC driver from here.
Copy the jar file to your spark install folder:

~/.local/lib/python3.11/site-packages/pyspark/jars/

Alternative solution using SparkConf

https://stackoverflow.com/questions/46925864/how-to-add-jdbc-drivers-to-classpath-when-using-pyspark

Enabling JDBC connection

In your DataVirtuality web Interface go to Preferences, and make sure that JDBC port is open, and write down the port number. 

10644145395613

10644139659677

Querying from PySpark in Jupyter Notebook

Now that we have set up the JDBC driver we can connect to it and query data from it.

Open a new Jupyter Notebook session and copy the following

Replace the highlights with your own credentials, and provide a working sql query.

NOTE:  use mm instead of mms in jdbc_url, if you are not using a SSL connection.

import findsparkfindspark.init()import pysparkfrom pyspark.sql import SparkSessionspark = SparkSession.builder.getOrCreate()from pyspark.sql import SparkSessionjdbc_url = "jdbc:datavirtuality:datavirtuality@mms://DVHostname:PortNumber;user=db_username;password=db_password"query = "(select * from hands_on.orders) as table1"df1 = spark.read.format('jdbc').options(driver = 'com.datavirtuality.dv.jdbc.Driver',url=jdbc_url, dbtable=query ).load()df1.show()

 

10644292186653

Be the first to reply!

Reply