Federating queries to Databricks from SAP Datasphere for real-time analytics in SAP Analytics Cloud
For many companies, the data strategy may involve storing business data in independent silos across different repositories. Some of that data may even span multiple clouds (for cost and other reasons), which brings new challenges: data fragmentation, data duplication, and loss of data context. SAP Datasphere helps bridge siloed and cross-cloud SAP and non-SAP data sources, enabling businesses to get richer business insights while keeping the data in its original location and eliminating the need for data duplication and time-consuming ETL processes.
Databricks Lakehouse is a popular cloud data platform used to store business, operational, and historical data in Delta Lake tables.
In this blog, we will see how to perform unified analytics in SAP Analytics Cloud by building unified business models that combine federated non-SAP data from Databricks with SAP business data to derive real-time business insights.
The integration of Databricks and SAP BTP can be summarized in five simple steps:
Step 1: Identify the source Delta Lake data in Databricks.
Step 2: Prepare to connect Databricks to SAP Datasphere.
Step 3: Connect Databricks as a source in SAP Datasphere connections.
Step 4: Create an Analytical Dataset in SAP Datasphere to join live SAP and non-SAP (Databricks) data into one unified semantic model.
Step 5: Connect live to this unified analytical data model from SAP Analytics Cloud and create visualizations that illustrate quick business insights.
STEP 1: Identify the source Delta Lake data in Databricks.
- For this blog, we will federate IoT data from a Databricks Delta Lake table and combine it with product master data from SAP sources.
STEP 2: Prepare to connect Databricks to SAP Datasphere.
1. Go to your Databricks SQL Warehouse, open the Connection details tab, and copy the JDBC URL.
2. Go to User settings > Generate New Token, then copy and note down the token.
3. Rewrite the JDBC URL copied in sub-step 1: remove the UID and PWD parameters and add two new parameters, IgnoreTransactions and UseNativeQuery.
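The URL rewrite in sub-step 3 is plain string manipulation, so it can be sketched in a few lines of Python. This is a minimal sketch, assuming the copied URL has the usual "<prefix>;key=value;key=value" shape. The example host, HTTP path and token are hypothetical, and the values IgnoreTransactions=1 and UseNativeQuery=0 are our assumption rather than values stated above, so confirm them against the Databricks JDBC driver documentation for your driver version.

```python
from typing import Dict, Optional

# Assumed values for the two new properties; verify them in the
# Databricks JDBC driver documentation before use.
DEFAULT_EXTRA_PROPS = {"IgnoreTransactions": "1", "UseNativeQuery": "0"}


def rewrite_databricks_jdbc_url(url: str,
                                extra: Optional[Dict[str, str]] = None) -> str:
    """Remove UID/PWD from a Databricks JDBC URL and append extra properties."""
    if extra is None:
        extra = DEFAULT_EXTRA_PROPS
    # "<prefix>;key=value;key=value;..." where the prefix holds the scheme,
    # host, port and (optionally) the default schema.
    prefix, *props = [part for part in url.split(";") if part]
    # Keys are matched case-insensitively; keys being added are dropped
    # first so that they are replaced rather than duplicated.
    drop = {"uid", "pwd"} | {key.lower() for key in extra}
    kept = [p for p in props if p.split("=", 1)[0].strip().lower() not in drop]
    kept += [f"{key}={value}" for key, value in extra.items()]
    return ";".join([prefix, *kept])


if __name__ == "__main__":
    # Hypothetical URL as copied from the Connection details tab.
    copied = ("jdbc:databricks://adb-1234.5.azuredatabricks.net:443/default;"
              "transportMode=http;ssl=1;httpPath=/sql/1.0/warehouses/abc123;"
              "AuthMech=3;UID=token;PWD=<personal-access-token>")
    print(rewrite_databricks_jdbc_url(copied))
    # prints: jdbc:databricks://adb-1234.5.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=/sql/1.0/warehouses/abc123;AuthMech=3;IgnoreTransactions=1;UseNativeQuery=0
```

Passing the new properties as a parameter lets you adjust their values without touching the parsing logic, and an existing occurrence of the same key is replaced instead of being duplicated.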
STEP 3: Connect Databricks as a source in SAP Datasphere.
Prerequisites: The Data Provisioning (DP) Agent is installed and connected to SAP Datasphere. Make sure the DP Agent host can reach the Databricks cluster over the network.
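A quick way to check the network prerequisite is to open a TCP connection from the DP Agent host to the Databricks server host name on port 443, the port used in the JDBC URL. The Python sketch below is one way to do this, assuming Python is available on that host; can_reach is a hypothetical helper, and a successful connection only proves network reachability, not that the token or HTTP path is valid.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and tries each address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts
        # (socket.timeout is a subclass of OSError).
        return False
```

For example, call can_reach("<server-hostname>") with the host name from your JDBC URL. A False result usually points to a DNS or firewall issue; if outbound traffic on your network must go through a proxy, a failed direct connection may not be conclusive.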
1. Download the latest Databricks JDBC driver and copy it to the camel/lib directory of the DP Agent installation.
2. Restart the DP Agent.
3. Make sure the CamelJDBCAdapter is registered and enabled in SAP Datasphere by following the SAP Help documentation.
4. In the SAP Datasphere (formerly DWC, SAP Data Warehouse Cloud) Connections app, create a Generic JDBC connection and enter the following details, filling in the JDBC URL we formed in Step 2:
   Username: token
   Password: <the token we copied earlier from the Databricks user settings>
5. Create a remote table for a Databricks table in the SAP Datasphere Data Builder and preview it to check that the data loads.
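The driver-copy step above can also be scripted, for example when you maintain several DP Agent hosts. The sketch below assumes the camel/lib folder sits directly under the DP Agent installation directory; install_jdbc_driver and all paths are hypothetical, and the JAR file name depends on the driver version you download. Restarting the agent is left out because the command depends on your operating system and installation.

```python
import shutil
from pathlib import Path
from typing import Union

StrPath = Union[str, Path]


def install_jdbc_driver(driver_jar: StrPath, dp_agent_home: StrPath) -> Path:
    """Copy a JDBC driver JAR into <dp_agent_home>/camel/lib; return the new path."""
    jar = Path(driver_jar)
    if not jar.is_file():
        raise FileNotFoundError(f"driver JAR not found: {jar}")
    if jar.suffix.lower() != ".jar":
        raise ValueError(f"expected a .jar file, got: {jar.name}")
    lib_dir = Path(dp_agent_home) / "camel" / "lib"
    if not lib_dir.is_dir():
        # Fail instead of creating a folder the DP Agent would not read.
        raise FileNotFoundError(f"no camel/lib folder under: {dp_agent_home}")
    target = lib_dir / jar.name
    shutil.copy2(jar, target)  # copy2 also preserves file timestamps
    return target
```

Failing on a missing camel/lib folder, rather than creating it, helps catch a wrong installation path early.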
STEP 4: Create an Analytical Dataset in SAP Datasphere to join live SAP and non-SAP (Databricks) data into one unified semantic model.
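Conceptually, the Analytical Dataset joins the Databricks remote table (the IoT facts) with the SAP product master data (the dimension) on a shared key. The Python sketch below illustrates only these join semantics on in-memory rows; it is not how SAP Datasphere executes the model, which evaluates the query against the live sources. All class, function and field names are hypothetical, and we use a left join so that readings without a master record are kept; choose the join type in your own model to suit your data.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class IoTReading:
    """A row of the Databricks remote table (hypothetical fields)."""
    device_id: str
    product_id: str
    temperature_c: float


@dataclass(frozen=True)
class Product:
    """A row of the SAP product master data (hypothetical fields)."""
    product_id: str
    name: str
    category: str


def join_readings_with_products(readings: Iterable[IoTReading],
                                products: Iterable[Product]) -> List[dict]:
    """Left-join each IoT reading to its product master record by product_id."""
    by_id = {p.product_id: p for p in products}  # index the master data once
    rows = []
    for r in readings:
        p = by_id.get(r.product_id)  # None if no master record exists
        rows.append({
            "device_id": r.device_id,
            "product_id": r.product_id,
            "temperature_c": r.temperature_c,
            "product_name": p.name if p is not None else None,
            "category": p.category if p is not None else None,
        })
    return rows


def average_by_category(rows: List[dict]) -> Dict[Optional[str], float]:
    """Average temperature per product category, a simple analytic measure."""
    values: Dict[Optional[str], List[float]] = {}
    for row in rows:
        values.setdefault(row["category"], []).append(row["temperature_c"])
    return {category: sum(v) / len(v) for category, v in values.items()}
```

Indexing the master data in a dictionary keeps the join linear in the number of readings, similar to the hash-join strategy a database commonly uses for a small dimension table.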
When data is previewed in SAP Datasphere models, the Log4j logs of the Databricks compute cluster show the queries being pushed down live to Databricks.
STEP 5: Connect live to this unified analytical data model from SAP Analytics Cloud and create visualizations that illustrate quick business insights.
For example, our dashboard shows real-time truck and shipment status for customer shipments. The live IoT data in the Databricks Delta Lake table that holds the real-time truck data is federated and combined with customer and shipment master data from SAP systems into a unified model used for efficient, real-time analytics.
We hope this quick tutorial helps you in your data journey and in exploring the exciting new features available in SAP Datasphere. We’d love to hear your thoughts and opinions, so please leave us a comment below. And don’t forget to give us a like if you found this blog useful! Thanks for reading!
Please read our next blog to learn how the FedML-Databricks library can be used to federate live data from SAP Datasphere’s unified semantic data models for machine learning on the Databricks platform.
Many thanks to the Databricks team for their support and collaboration in validating this architecture: Itai Weiss, Awez Syed, Qi Su, Felix Mutzl and Catherine Fan. Thanks also to SAP team members Akash Amarendra, Karishma Kapur, Ran Bian and Sandesh Shinde for their contributions to this architecture, and to Sivakumar N and Anirban Majumdar for their support and guidance.
For more information about this topic or to ask a question, please contact us at firstname.lastname@example.org