Technical Articles
Bartosz Jarkowski

Your SAP on Azure – Part 19 – Expose Azure Databricks to SAP HANA using ODBC connection

The use cases for Hadoop and SAP HANA are different, but in some cases it is recommended, or even required, to connect both worlds. Hadoop clusters operate on very large datasets consisting of unstructured data, while SAP HANA is focused on analytical processing. An ETL process that replicates the Hadoop data is not always a good solution, as it could dramatically increase the database memory requirements. But smart usage of both technologies benefits from lower storage cost and high-speed in-memory processing power at the same time.

There are a few ways to expose data coming from a Hadoop cluster to the SAP HANA database. Some time ago I wrote two blog posts where I described how to use SAP Vora and the Spark Extension to query data from Azure HDInsight and from Azure Databricks in SAP HANA. But SAP Vora is not required. SAP HANA can be directly connected to the Hadoop cluster using just ODBC drivers or the HANA Spark Controller. Today I present the first way, but don’t worry, a blog about the Spark Controller is on the way!

In the previous episode I already presented how to deploy an Azure Databricks cluster, so this time I jump directly to the configuration part. If you don’t have a cluster yet, I’d recommend reading Part 17 first.


I use unixODBC as the driver manager. It was not installed by default on my server, so I used YaST to add the missing component:

The Simba ODBC drivers that I’m going to use today are available on the Databricks website. To access the download link you’ll have to provide details such as your name and e-mail address. After around 10 minutes I had a new e-mail in my inbox.

The SAP HANA database runs in a Linux environment. I downloaded the 64-bit version and extracted the archive.

cd SimbaSparkODBC-

The RPM installation is also straightforward

rpm -ivh simbaspark-

By default the package is extracted to the /opt/simba/spark directory:

The driver library is located in the /opt/simba/spark/lib/64


To configure the drivers we’ll need input from the Databricks cluster. It exposes the ODBC protocol, and the connection details are available under Clusters -> Configuration -> ODBC.

To establish the connection we’ll also require a personal token that can be generated in the Databricks dashboard. Click on the User Settings under the profile icon in the top right corner:

Click on Generate New Token and choose a validity period. The token will be displayed on the screen – remember to copy it as it is not possible to display it again.

The ODBC configuration relies on two text files that should be kept in the home directory of the <sid>adm user. The first one can be copied from the driver location and stores the driver settings. There is no need to change anything inside.

su - <sid>adm
cd $HOME
cp /opt/simba/spark/lib/64/simba.sparkodbc.ini .simba.sparkodbc.ini

The second file keeps the connection information. Create the .odbc.ini file with following content:

Server=<Server Hostname from the Databricks ODBC settings> 
HOST=<Server Hostname from the Databricks ODBC settings>
HTTPPath=<HTTP Path from the Databricks ODBC settings>
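For reference, below is a sketch of a more complete .odbc.ini. The DSN name "Databricks", the driver path, and the Simba-specific keys (SparkServerType, ThriftTransport, SSL, AuthMech) are assumptions based on the Simba Spark ODBC driver's documented settings for Databricks, so verify them against the install guide shipped with your driver version; the Host and HTTPPath values still come from the cluster's ODBC tab. With AuthMech=3 the user name is the literal word token and the password is the personal access token generated above.

```shell
# Hypothetical, fuller ~/.odbc.ini for a Databricks DSN.
# Everything below the section header is a placeholder or an assumed
# Simba setting - cross-check with your driver's configuration guide.
cat > "$HOME/.odbc.ini" <<'EOF'
[Databricks]
Driver=/opt/simba/spark/lib/64/libsparkodbc_sb64.so
Server=<Server Hostname from the Databricks ODBC settings>
HOST=<Server Hostname from the Databricks ODBC settings>
PORT=443
HTTPPath=<HTTP Path from the Databricks ODBC settings>
SparkServerType=3
ThriftTransport=2
SSL=1
AuthMech=3
UID=token
PWD=<personal access token>
EOF
```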

unixODBC uses environment variables to locate its configuration files. The easiest way to set them is to create a script in the home directory that is automatically executed during the database user's login.

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/simba/spark/lib/64
export ODBCINI=$HOME/.odbc.ini
export SIMBASPARKINI=$HOME/.simba.sparkodbc.ini
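One way to make these exports persistent is to append them to ~/.customer.sh, which the standard <sid>adm login scripts source automatically. The file name is an SAP convention and an assumption here; adjust it if your environment uses a different login hook.

```shell
# Append the unixODBC variables to the <sid>adm login hook.
# ~/.customer.sh is assumed to be sourced at login - verify on your system.
cat >> "$HOME/.customer.sh" <<'EOF'
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/simba/spark/lib/64
export ODBCINI=$HOME/.odbc.ini
export SIMBASPARKINI=$HOME/.simba.sparkodbc.ini
EOF
```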

Log out and log in again to check the current unixODBC configuration. User data sources should point to the .odbc.ini file that we created.

odbcinst -j


I think this is a good moment to stop for a while and check if the connection to the Databricks works. Sometimes the unixODBC can be tricky so I want to ensure I configured it correctly before I do anything in SAP HANA. But to test the connection we need some data in the Databricks tables.

There are two ways to create a table in the Azure Databricks. You can either convert a DataFrame into a table or use the user interface to import a new file using a browser.

In the previous post I presented how to import data residing in the Azure Data Lake into a DataFrame. We can just add one line of code to save it as a Table:

val df = spark.read.option("inferSchema", true).csv("dbfs:/mnt/DemoDatasets/airports.dat")
df.write.saveAsTable("airports_dat")

The alternative way is also very easy. In the Databricks dashboard click Data in the left menu and select Add Data. On the screen below you can also see the table I just created using the script:

The dataset I’m using is available here:

Now you can just drag and drop the file to the browser window and adjust parameters like table name or table headers:

Now that we have some data, we can query the Databricks cluster using the ODBC drivers.

isql <DSN> <username> <password>

If you can see the SQL command prompt it means the connection is established successfully and we can run a sample query:

select airport_name, city from airports_dat limit 10

The connection is working fine and we can continue with the HANA configuration.


The goal of today’s post is to access the Databricks cluster from the SAP HANA database. We have already verified that the connection works properly, so the last task is to create a new Remote Source and reference the Databricks tables as virtual tables.

Create a new Remote Source and provide the ODBC DSN together with user credentials:
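If you prefer SQL to the UI, a remote source can also be created with a statement along the following lines. Treat this as a sketch only: the source name, DSN, property file, and credential string are assumptions, and the exact CREATE REMOTE SOURCE syntax for generic ODBC sources can differ between HANA revisions, so check the SAP HANA SQL reference for your version.

```sql
-- Hypothetical SDA remote source over the generic ODBC adapter.
-- "DATABRICKS", the DSN name, and the property file are placeholders.
CREATE REMOTE SOURCE "DATABRICKS" ADAPTER "odbc"
  CONFIGURATION FILE 'property_spark.ini'
  CONFIGURATION 'DSN=Databricks'
  WITH CREDENTIAL TYPE 'PASSWORD'
  USING 'user=token;password=<personal access token>';
```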

Now you can expand the Remote Sources to see the available tables in the Azure Databricks:

Create a virtual table:

Now you can query it as if it were part of the SAP HANA database.
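The virtual table step can be scripted as well. A sketch, assuming a remote source named "DATABRICKS", a target schema "MYSCHEMA", and the airports_dat table created earlier; the four-part remote object name varies by adapter, so copy it from what the Remote Sources browser displays.

```sql
-- Hypothetical: map the remote Databricks table into a local schema,
-- then query it like any other HANA table.
CREATE VIRTUAL TABLE "MYSCHEMA"."V_AIRPORTS"
  AT "DATABRICKS"."<NULL>"."default"."airports_dat";

SELECT airport_name, city FROM "MYSCHEMA"."V_AIRPORTS" LIMIT 10;
```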

      Chris Combe

      Thanks for sharing this - very timely (as you know :))


      Are you able to demonstrate how this works if you establish your data source using Azure Databricks with Delta instead of a standard Parquet approach? I believe it should be quite similar but yield much better performance and efficiency.


      Also, do you have any guidance on how complex a data model / data sourcing you can support from your source data, and the influence it may have on the running cost of Azure Databricks? E.g. would it be better to run your Spark cluster 24x7, or rather to cache your data into a BW for your reporting and then use something like Databricks Delta to append the updates (either batch or streaming) into the BW/4HANA stack?

      Bartosz Jarkowski
      Blog Post Author

      Hello Chris,

      I run a test and I was able to expose the Delta tables using ODBC drivers.

      The challenge I see in your approach is that it's very close to a regular ETL process. Smart Data Access is designed to work with online data sources to provide (near) real-time data without the need to copy them.

      I'm not an expert in BW, so I'm not really sure what options you have to work with the cache, but if you want to keep the data in BW and update it with changes, then what's the advantage of this solution over a regular ETL?

      To access the data online you'd require the Databricks cluster to be running. But if you can use a set of small VMs, it doesn't necessarily have to be expensive, especially when compared to the price of a HANA VM with additional memory. I'd also check the current availability of the Databricks cluster - maybe it's already running most of the time. If you're able to analyze at what times users access the data, maybe you could schedule the automatic start and stop based on time?

      Shubra Bhattacharyya


      Thanks for this wonderful blog. Extremely detailed and easy to follow.

      Can you also share some insight on how can we connect to Azure Databricks from an on-premise HANA DB?

      Are there some options of connecting via a proxy host or connect via some gateway system like SAP Cloud Connector or Azure Data Management Gateway?

      Thanks and Regards



      Florian Kryst

      Great article! Any thought when you are planning to release the article on Spark Controller integration with Databricks?

      Bartosz Jarkowski
      Blog Post Author


      Probably a couple of weeks. I'll try to write it in October. The next one I plan to release is about connecting from Databricks to HANA using JDBC.



      Mayank Gupta

      Hi Bartosz,

      Have you written any blog on connecting Databricks to HANA using JDBC?

      Bartosz Jarkowski
      Blog Post Author


      There are a couple of different scenarios I covered, so have a look at my blogs:

      For example:

      Pan Aaron

      Great article,  Bartosz!

      If we use a standalone DP Agent on a Windows server, I assume we just need to install the Windows ODBC driver and configure the DSN there, right?

      Also, have you compared the performance between odbc and spark controller?

      Which connection type is recommended?





      Stefan Geiselhart

      Hi Bartosz,

      Thanks for your great blog.

      Did you also consider using the Databricks JDBC driver on top of the SDI CamelAdapter? If yes, was there any K.O. criterion why you didn't try that one?

      Best regards Stefan

      Steffen Mezger

      Thank you for this detailed article.

      Can you confirm that the connection still works with the latest drivers and HANA version (SLES Linux)?
      SAP HANA 2.0 SPS05

      In my installation, the indexserver crashes when I try to expand the Remote Sources to see the available tables.

      Bartosz Jarkowski
      Blog Post Author


      I no longer have the lab ready, so I can't test it quickly. But have you seen this SAP note?

      2418485 - HANA indexserver crashes from Simba ODBC driver

      There are a couple of other notes you could check as well.

      Thorsten Voss

      Hi Steffen,

      can you please share an update if you found a solution to your problem?

      Thank you very much!

      Steffen Mezger

      Hi Thorsten, there was no way with SDA.
      We got it working only with SDI:

      1. Register the Camel Adapter
      2. Install the Java drivers for Databricks in SDI Camel (see the Databricks JDBC manual)
      3. Edit jdbc-spring-config.xml and remove the line “<property name="defaultAutoCommit" value="false" />”
      4. Edit and set delimident=BACKTICK
      5. Restart SDI, then create a new remote data source to Databricks with this adapter.

      Regards, Steffen

      Abhishek Singh

      Hi Steffen,

      I've followed all the steps outlined above but getting this error:


      Abhishek Singh

      Followed instructions from

      That error has gone away, but now I'm getting another error:


      [86183]{224366}[84/-1] 2023-11-06 15:27:57.654035 e FedTrace DPAdapterAccess.cpp(02413) : DPAdapterAccess::Connect: failed with error: Failed to create route route6 at: >>> Choice[[When[simple{${header.DpCommand} == 'browse' or ${header.DpCommand} == 'import'} -> [SetHeader[DpMetadataBrowser, simple{jdbc}], To[dpmetadata:dataSource]]], When[simple{${header.DpCommand} == 'query' or ${header.DpCommand} == 'update'} -> [Choice[[When[simple{${header.DpCommand} == 'query'} -> [To[jdbc:dataSource?useHeadersAsParameters=true&resetAutoCommit=false&outputType=StreamList], SetHeader[DpResultType, simple{map}]]], When[simple{${header.DpCommand} == 'update'} -> [To[jdbc:dataSource?useHeadersAsParameters=true&resetAutoCommit=true], SetHeader[DpUpdateCount, simple{${in.header.CamelJdbcUpdateCount}}]]]]], To[dpresult:dpresult]]]] Otherwise[[Stop]]] <<< in route: Route(route6)[From[direct:sda] -> [Choice[[When[simple{${hea... because of Unexpected token o at location 32
      ${header.DpCommand} == 'browse' or ${header.DpCommand} == 'import'