Connect SAP Data Intelligence JupyterLab to SAP HANA, express edition
After installing SAP Data Intelligence on my SUSE CaaS Platform, I Check My Machine Learning Setup by connecting to a Microsoft Azure Data Lake, with the expected result.
But I also have an SAP HANA, express edition installation and would like to connect to that. As a starting point, I leverage the tutorial Use JupyterLab with SAP HANA, express edition. Luckily, since I am already on SAP Data Intelligence, I can skip steps 1 – 6 and directly install the required Python modules.
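As a rough sketch, a notebook cell like the one below can check which modules are still missing before installing anything. The module list is an assumption (SQLAlchemy, its SAP HANA dialect, and the SAP HANA Python driver); the tutorial's own list may differ, and `missing_modules` is a hypothetical helper name.

```python
import importlib.util

# Assumed module list (import name -> pip package name); adjust it to
# whatever the tutorial actually asks for.
REQUIRED = {
    "sqlalchemy": "sqlalchemy",
    "sqlalchemy_hana": "sqlalchemy-hana",
    "hdbcli": "hdbcli",
}


def missing_modules(import_names):
    """Return the names from import_names that cannot be imported here."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    packages = " ".join(REQUIRED[name] for name in missing)
    print(f"Run in a notebook cell: %pip install {packages}")
else:
    print("All required modules are available.")
```

The cell only reports what is missing, so it is safe to rerun after each install.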
With this I connect to my SAP HANA, express edition leveraging SQLAlchemy and count my tables.
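A minimal sketch of such a connection, assuming the `sqlalchemy-hana` dialect and `hdbcli` are installed. The host name, port, and credentials in the usage comment are placeholders, and `hana_url` and `count_tables` are hypothetical helper names rather than anything from the tutorial.

```python
from urllib.parse import quote_plus


def hana_url(user, password, host, port):
    """Build a SQLAlchemy URL for the SAP HANA dialect.

    User and password are URL-encoded so that special characters such as
    '@' or '/' do not break the URL.
    """
    return f"hana://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}"


def count_tables(url):
    """Count the rows of the SYS.TABLES system view."""
    # Imported lazily so the URL helper also works without SQLAlchemy.
    from sqlalchemy import create_engine, text

    engine = create_engine(url)
    with engine.connect() as conn:
        return conn.execute(text("SELECT COUNT(*) FROM SYS.TABLES")).scalar()


# Example call (placeholder values, not from the original post):
# print(count_tables(hana_url("SYSTEM", "<password>", "hxehost", 39015)))
```

Port 39015 is only an assumption here; the right SQL port depends on the instance number and on which tenant database you connect to.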
Alternatively, I directly use the SAP HANA Python driver, which follows the Python Database API Specification, to consume data.
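Because the driver follows the DB-API, the usual connect, cursor, execute, fetch pattern applies. The sketch below assumes the `hdbcli` package; `connect_hana` and `fetch_rows` are hypothetical helper names, and the connection values in the usage comment are placeholders.

```python
def connect_hana(address, port, user, password):
    """Open a connection with the SAP HANA Python driver (hdbcli)."""
    # Imported lazily so fetch_rows can be used with any DB-API driver.
    from hdbcli import dbapi

    return dbapi.connect(address=address, port=port,
                         user=user, password=password)


def fetch_rows(conn, sql, params=()):
    """Run a query on a DB-API connection and return all rows."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql, params)
        return cursor.fetchall()
    finally:
        cursor.close()


# Example call (placeholder values, not from the original post):
# conn = connect_hana("hxehost", 39015, "SYSTEM", "<password>")
# print(fetch_rows(conn, "SELECT COUNT(*) FROM SYS.TABLES"))
# conn.close()
```

Since `fetch_rows` only relies on DB-API calls, it can be tried out against any compliant driver, for example the `sqlite3` module from the standard library.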
The advantage of using SAP Data Intelligence is that you can turn your JupyterLab notebook directly into an SAP Data Intelligence Data Pipeline:
For more details check out the openSAP course SAP Data Intelligence for Enterprise AI, especially Unit 9: Operationalizing Python and R with the Pipeline Modeler.
I’d greatly recommend using the hana-ml Python package to simplify handling data from HANA in Python. The hana-ml package implements the concept of a HANA dataframe, which is a virtual representation in Python of a pandas-like dataframe whose data, however, sits physically on HANA. Any operations you do on the HANA dataframe are pushed down to the HANA DB, so the computation runs in the database instead of moving the data into Python. Any transformations you do with the HANA dataframe in Python can be either materialized as a table or saved as a view, i.e. kept as a purely virtual transformation. And for the situations where you do need to get an actual pandas df from the HANA tables & views, e.g. if you need to use some Python lib based on pandas (like sklearn), it’s easily done with a .collect() on the HANA dataframe. Similarly, you can convert a pandas df whose data sits in the Python runtime to a HANA dataframe, in which case the data is saved into a HANA table.
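A minimal sketch of that workflow, assuming the `hana-ml` package is installed. The helper names `connect`, `preview`, and `upload` are hypothetical, and the connection values and table names in the usage comment are placeholders.

```python
def connect(address, port, user, password):
    """Create a hana-ml connection context."""
    # Imported lazily so the other helpers can be read and tested alone.
    from hana_ml import dataframe

    return dataframe.ConnectionContext(address=address, port=port,
                                       user=user, password=password)


def preview(cc, table, schema, n=5):
    """Return the first n rows of a HANA table as a pandas DataFrame."""
    hana_df = cc.table(table, schema=schema)  # HANA dataframe, no data fetched yet
    return hana_df.head(n).collect()          # collect() runs the SQL in HANA


def upload(cc, pandas_df, table_name):
    """Save a pandas DataFrame into a HANA table, return a HANA dataframe."""
    from hana_ml import dataframe

    # force=True replaces an existing table of the same name.
    return dataframe.create_dataframe_from_pandas(cc, pandas_df, table_name,
                                                  force=True)


# Example calls (placeholder values, not from the original post):
# cc = connect("hxehost", 39015, "SYSTEM", "<password>")
# print(preview(cc, "TABLES", "SYS"))
# hana_df = upload(cc, some_pandas_df, "MY_UPLOAD")
```

Only `collect()` moves data into the Python runtime; everything before it just builds up the SQL statement behind the HANA dataframe.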
There have been some great blogs about hana-ml: