Your SAP on Azure – Part 22 – Read SAP HANA data from Azure Databricks
It has been a couple of weeks since I last blogged, but I think I'm now on track with my other projects and I'm happy to continue the SAP on Azure series! In the past few episodes, I focused on integrating the SAP system with Azure data platform services. Today I will continue the topic and show you how to consume data stored in SAP HANA from Azure Databricks. I know a few people were interested in this topic, so I'm happy I finally found time to write this short tutorial.
The SAP HANA database can be accessed using JDBC drivers, and Azure Databricks supports using external libraries to connect to external systems, so the entire process is very straightforward! The JDBC driver for SAP HANA is part of the database client libraries and can be downloaded from the SAP Support Launchpad or the SAP Development Tools. We are interested in a small Java archive, ngdbc.jar, which we need to download and then upload to Azure Databricks.
Once the file is downloaded, we can publish it as a library in Azure Databricks. Open the target workspace (you can choose your own or the shared one) and create a new library.
Drag and drop the previously downloaded file onto the grey area of the screen. The library name will be populated automatically, and you can confirm the library creation by clicking the Create button.
On the next screen select the cluster on which you wish to install the library.
The installation doesn’t take much time. In my case, it took around 1-2 minutes and afterward the status changed to Installed.
With the library installed, we can check connectivity to the database and read sample data. My Databricks cluster is deployed to the same virtual network as the SAP HANA database, so I don't need to create peering between VNets. The script below, written in Scala (thanks to the good documentation!), reads and displays data stored in the SFLIGHT table:
%scala
import java.util.Properties
//Set connection parameters
val jdbcHostname = "<hostname>"
val jdbcPort = "<port>"
val jdbcDB = "<SID>"
val jdbcUser = "<Username>"
val jdbcPassword = "<Password>"
val driverClass = "com.sap.db.jdbc.Driver"
val jdbcUrl = s"jdbc:sap://${jdbcHostname}:${jdbcPort}"
//Check availability of the JDBC library to access SAP HANA
Class.forName(driverClass)
//Set connection properties
val connectionProperties = new Properties()
connectionProperties.put("user", s"${jdbcUser}")
connectionProperties.put("password", s"${jdbcPassword}")
connectionProperties.setProperty("Driver", driverClass)
//Read and display data
val sflight = spark.read.jdbc(jdbcUrl, "<SCHEMA.TABLE>", connectionProperties)
sflight.show()
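Note that the jdbcDB variable is not actually used in the URL above. On a multi-tenant (MDC) system you may need to point the driver at a specific tenant database; a possible variant, assuming the driver's databaseName URL option:
//Variant: address a specific tenant database on a multi-tenant (MDC) system
val jdbcUrlMdc = s"jdbc:sap://${jdbcHostname}:${jdbcPort}/?databaseName=${jdbcDB}"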
It is also possible to fetch data through a SAP HANA calculation view. I installed the SAP HANA SHINE demo content, which contains a set of tables and data models to practice and learn with. In the example below I'm using the Purchase Order Worklist dashboard, which underneath uses the PURCHASE_ORDER_WORKLIST calculation view.
I changed the command used in the previous example to fetch data from the calculation view:
//Read and display data
val po_worklist = spark.read.jdbc(jdbcUrl, "\"_SYS_BIC\".\"sap.hana.democontent.epm.models/PURCHASE_ORDER_WORKLIST\"", connectionProperties)
po_worklist.show()
The fetched data can easily be saved as a table in Databricks:
//Read and save data
val po_worklist = spark.read.jdbc(jdbcUrl, "\"_SYS_BIC\".\"sap.hana.democontent.epm.models/PURCHASE_ORDER_WORKLIST\"", connectionProperties)
po_worklist.write.saveAsTable("PO_Worklist")
Databricks automatically detects the data schema and creates a table that can be displayed through the UI (and used in other queries).
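Once saved, the table can be queried from any notebook cell like a regular Spark table, for example:
//Read the saved table back and display it
spark.table("PO_Worklist").show()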
That's everything for today! I hope you liked it and that it's helpful for you. I still want to write one more blog about using the SAP HANA Spark Controller with Azure Databricks, and then we'll jump to another topic.
Great write-up!
Is there any limitation like we have in ADF (where only ODBC can be used for the HANA copy operation), i.e. is it only meant for READ operations, or can we also transform and load data back into HANA using JDBC? I intend to use this to combine data from Azure SQL and native HANA and ultimately load the result back into HANA. Is that feasible with this approach?
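For what it's worth, the generic Spark JDBC writer can also insert data into HANA, so the approach is not limited to reads; a minimal sketch, assuming a target table (hypothetical name) already exists in HANA:
//Append a DataFrame back to a HANA table over JDBC
po_worklist.write.mode("append").jdbc(jdbcUrl, "<SCHEMA.TARGET_TABLE>", connectionProperties)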
This is very interesting, thanks for sharing!
Have you by any chance tried using the SAP HANA Python libraries (hdbcli & hana-ml) with PySpark?
In theory it should work, as you can do a pip install for both libs I mentioned above.
Once you have a HANA dataframe, you can delegate all data prep steps to HANA and just do a .collect() to gather the data in a pandas dataframe that you can then use in your PySpark operations.
https://pypi.org/project/hdbcli/
https://pypi.org/project/hana-ml/
https://help.sap.com/viewer/0eec0d68141541d1b07893a39944924e/2.0.04/en-US/f3b8fabf34324302b123297cdbe710f0.html
https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/2.0.04/en-US/index.html
Best regards,
Henrique.
Hi Bartosz,
I have one question:
will this code work for SAP HANA Cloud instances?
I tried using a HANA Cloud instance in Databricks and I am getting a handshake_failure error.
JDBCDriverException: SAP DBTech JDBC: Cannot connect to jdbc:sap://XXXXXXXX:443 [Cannot connect to host XXXXXXX:443 [Received fatal alert: handshake_failure], -813.]. Caused by: JDBCDriverException: SAP DBTech JDBC: SSL handshake failed : Received fatal alert: handshake_failure. Caused by: SSLHandshakeException: Received fatal alert: handshake_failure
My configuration is:
%scala
import java.util.Properties
//Set connection parameters
val jdbcHostname = "XXXX.hana.canary-eu10.hanacloud.ondemand.com"
val jdbcPort = "443"
val jdbcDB = "XXXX"
val jdbcUser = "user"
val jdbcPassword = "<password>"
val driverClass = "com.sap.db.jdbc.Driver"
val jdbcUrl = s"jdbc:sap://${jdbcHostname}:${jdbcPort}"
//Check availability of the JDBC library to access SAP HANA
Class.forName(driverClass)
//Set connection properties
val connectionProperties = new Properties()
connectionProperties.put("user", s"${jdbcUser}")
connectionProperties.put("password", s"${jdbcPassword}")
connectionProperties.setProperty("Driver", driverClass)
//Read and display data
val sflight = spark.read.jdbc(jdbcUrl, "<schema.table>", connectionProperties)
//sflight.show()
Can you please help here?
Hi Mayank,
Best to post this question as a question (answers.sap.com) and not as a comment on the blog post.
Hi,
I agree with Denys van Kempen. Please post your question in the Q&A area; this is the fastest way to get a reply.
I've never tried connecting to HANA Cloud. I can see you're using port 443, which doesn't seem to be correct. I'm not sure if HANA Cloud exposes a database port, but if it does, then that should be the one to use.
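One more thing that might be worth trying, assuming the HANA Cloud endpoint only accepts TLS connections: enabling encryption in the JDBC connection properties, e.g.:
//Enable TLS for the JDBC connection (the ngdbc encrypt/validateCertificate options)
connectionProperties.put("encrypt", "true")
connectionProperties.put("validateCertificate", "true")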
Best regards
Bartosz
Hi, thanks for sharing this!
A question, however: is it also possible to send a query (e.g. SELECT * FROM <schema.table> WHERE <column> = <something>)? If so, what would be the correct code for this?
Best regards,
Maarten
Found the answer myself 🙂
Just format the query as a sub-query: (SELECT * FROM <schema.table> WHERE <column> = <something>) myTable
Cheers!
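In Spark terms, that sub-query string simply takes the place of the table name; a minimal sketch reusing the connection variables from the post:
//Push the filter down to HANA by passing a sub-query instead of a table name
val query = "(SELECT * FROM <SCHEMA.TABLE> WHERE <COLUMN> = <something>) myTable"
val filtered = spark.read.jdbc(jdbcUrl, query, connectionProperties)
filtered.show()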
Thank you for sharing the blog. This is really helpful.
Can you provide sample code showing how to call a HANA calculation view with an input parameter?
I have tried multiple ways to provide the input parameter but it doesn't work.
Thanks,
Rahul Doshi
Hi Rahul, did you try this format? (SELECT * FROM <schema.view> WHERE <column> = <something>) myTable
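If the view expects a real input parameter rather than a filter, HANA's PLACEHOLDER syntax inside the sub-query may be what's needed; a sketch, assuming a parameter named $$P_PARAM$$ (hypothetical):
//Pass an input parameter to a calculation view using the PLACEHOLDER syntax
val query = "(SELECT * FROM \"_SYS_BIC\".\"<package/VIEW>\" ('PLACEHOLDER' = ('$$P_PARAM$$', '<value>'))) myView"
val result = spark.read.jdbc(jdbcUrl, query, connectionProperties)
result.show()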