Reading data from HANA – Apache NIFI & Python
I have been working in SAP Analytics such as BI-BW, S/4 HANA and Native HANA for past 6 years. I have recently started exploring Python and ETL tools used in Big Data environments, trying to learn their integration with HANA. Please find below blog post on encounters with Python and Apache NIFI 1.6 (windows edition)
- HANA Client should be installed to get you access to ODBC and JDBC drivers
- Python should be installed in your client system
- Apache NIFI should be installed in your client system
Note: Below 2 approaches, of using NIFI and Python, are independent of each other.
Connecting with Python:
I am using pyhdb module to connect with HANA system via ODBC connection.
Create a new python file in any of your favourite code editors/IDE, I am using PyCharm.
Below code will be used to establish a connection to HANA, enter relevant host, port, user id and password
Once you have established the connection you can use below code to fetch data. One thing I noticed, when you execute a SQL query via below code it brings the table headers as part of data (first tuple) followed by the actual content of the table. As a quick and dirty solution, I have opened another cursor querying the system table which stores the metadata for all the tables to get the table headers.
Finally the output looks like this :
This demonstration is about reading data from HANA using Apache. Apache NIFI is an open source tool used for extracting and Loading Data from one system to another. Here in our case we will extract data from HANA DB to a flat file in the client desktop.
Below is the flow diagram which I have used, this consists of 3 processor nodes that I have taken. You can explore more about NIFI by using following link: https://nifi.apache.org/
Processors are the building blocks in NIFI each dedicated for a specific task. You can connect them to create a logical data flow model.
We will begin with ExecuteSQL processor configuration, right click and select configuration. The data fetched from database by this processor is returned in avro file format and hence we will be using a converter to change it to JSON format.
Enter your SQL Query in below text box in the configuration window.
Create below controller service to establish a JDBC connection to your HANA Box using forward navigation arrow highlighted above in the red box. Make sure you have enabled this controller service before using it.
Finally in the PutFile processor enter the destination path for the flat file to be generated
Output as below:
Where do I find out the Database Connection URL from?
Also prerequisite says that I need HANA Client. But when I try to access the download page for that, it tells me that I have P user ID and that I need a SID. How to obtain this SID?
My problem has been resolved now:
HANA Client can be installed without login from here: SAP Development Tools (ondemand.com)
Database connection url can be obtained from "Copy SQL endpoint" as shown in Step 3 here: Choose an SAP HANA Database | Tutorials for SAP Developers