Technical Articles
Data Provisioning in HANA from Various Sources using Python
Why Python
We have known that Smart Data Access (SDA) is much popular as a Data Provisioning method with SAP HANA to connect to various popular third party sources whether its SQL server, RDBMS etc. You can access data virtually as well as you can persist based on requirements. Flowgraphs plays an important role here.
But most of the times, in Real Life projects, we faced many requirements where there is a need to integrate HANA and do Data provisioning with the sources which are very specific and particular to any process/scenarios such as third party or custom APIs, Web pages, ODATA services, others. In these scenarios, we end up struggling to find ways on how to integrate HANA with these sources as well as do the regular updates/processing of data.
Here, Python plays an important role as it brings lot of flexibility in integrating multiple sources along with powerful and easy to understand scripts. Also, provides lot of methods/functions to process your data.
Pre-requisites
In Python, we have multiple methods and so there is different need on what all things need to install.
Such as below
- HANA client,
- Python any version
- Any development environment supported by Python such as PyCharm, Spyder, Jupyter Notebooks etc.
- Source details depends upon from where you are loading the data. For example, if we are looking from APIs, then we need complete details like API urls, keys, parameters etc depend upon API nature.
- HANA database with proper privileges
Methods in Python for HANA
To integrate different sources of data mentioned, Python provides various inbuilt methods for integration.
As per my experience, Some of them which are popular as mentioned below.
IMPORT DBAPI
IMPORT PYHDB
FROM SQLALCHEMY IMPORT CREATE_ENGINE
Here, in this blog, we will see these methods along with few example of short script. Also, I would share my experience with each methods.
To use all the methods, you need to install specific methods, if it is not there available by default under python dictionary.
DBAPI Method
For Using this method, we need to install HANA Client along with few changes with files and folder provided details below. Sample script below.
IMPORT DBAPI
This will import required methods while executing python scripts
CONNECTION = DBAPI.CONNECT(SERVER, PORT, USER, PASSWORD)
Here you need to provide HANA system details along with credentials, Try to have some background User and password which can be used for all jobs.
CURSOR = CONNECTION.CURSOR()
QUERY = INSERT INTO TABLE VALUES(‘INSERT TEST SUCCESSFUL ‘)
CURSOR. EXECUTE(QUERY)
Under Query variables, you can provide any SQL script which you would like to execute in HANA. you can use above statements to execute any DML/DDL statements.
After above line execution, you can check, data will be inserted successfully into HANA tables.
For more details, please follow below link.
https://blogs.sap.com/2012/06/08/sap-hana-and-python-yes-sir/
PYHDB Method
This method is very much similar to DBAPI, only difference which I found using PYHDB is, it has more robust mechanism and also it provides good integration with HANA DB.
Syntax looks similar as DBAPI like below.
IMPORT REQUEST
IMPORT PYHDB
This will import required methods while executing python scripts
CONNECTION = PYHDB.CONNECT(HOST ,PORT ,USER NAME ,PASSWORD)
Here you need to provide HANA system details along with credentials. Try to have some background User and password which can be used for all jobs.
CURSOR = CONNECTION.CURSOR()
QUERY = INSERT INTO TABLE VALUES(‘INSERT TEST SUCCESSFUL ‘)
CURSOR. EXECUTE(QUERY)
Under Query variables, you can provide any SQL script which you would like to execute in HANA. you can use above statements to execute any DML/DDL statements
After above line execution, you can check, data can be inserted successfully into HANA tables.
As per my experience, both the above methods has some limitation due to which we faced couple of challenges while inserting data into HANA. Also it does not have much flexibility to provide more details about HANA like schema name etc. To achieve this, we need to go for some more custom scripts.
Also, these methods include too much SQL statements to process and insert data into HANA along with issues with bulk loads.
For more details, please follow below link.
FROM SQLALCHEMY IMPORT CREATE_ENGINE Method
As per my experience, this is the most robust method and also provides good experience while using it for multiple scenarios. It also uses the PYHDB method, which makes it more integrated. Below are few of the reasons to use this method.
- You can make use of python inbuilt methods like Pandas, Dataframe which is very much effective and useful while dealing with structured data and also supports good volume of data loading
- You can reduce unnecessary scripting
- Flexibility in defining more details about HANA system like Schema name etc.
- Flexibility in dealing with data in-discrepancies
Sample script below
IMPORT PANDAS AS PD
IMPORT REQUESTS
IMPORT JSON
FROM SQLALCHEMY IMPORT CREATE_ENGINE
IMPORT PYHDB
All of the above are required ones for this method as each supports different purpose
RESPONSE = << API Details need to provide like keys etc and capture the response >>
Above statement is very specific to source details, as per source details you need to capture the response and process accordingly and store in any variable like below ‘Table’
TABLE = RESPONSE.JSON()[‘TEST’]
Converting the response in JSON format
DF = PD.DATAFRAME(TABLE)
Using Pandas, Dataframe to make the source data in more readable format
CONNECTION = CREATE_ENGINE(‘HANA+PYHDB://'<<USERNAME>>:<<PASSWORD>>@<<HOSTNAME>>:<<PORT>>)
Provide parameters same as applicable for above other methods
DF.TO_SQL(‘<<HANATABLENAME>>’,CONNECTION,IF_EXISTS = ‘APPEND’,)
Passing Dataframe values to hana tables, here there is no need to write SQL like others, also gives you flexibility to define schemas and other details
For more details on syntax, please follow below link.
https://github.com/SAP/sqlalchemy-hana
https://stackoverflow.com/questions/44688029/moving-data-from-mysql-to-sap-hana-with-python
After above line execution, you can check, data can be inserted successfully into HANA tables.
Scheduling Python Scripts
In all Data integration scenarios, there is always a need to make the complete load process more robust and without manual intervention.
Python provides you flexibility to schedule its scripts using Windows task scheduler itself.
You can easily schedule any scripts using scheduler by creating Basic Task along with your program details, time etc and monitor also.
Summary
Python is very popular and powerful programming language. Using Python for Data Integration with HANA brings lot of flexibility to combine and extract information from various sources.
Its further enhancing the capability of SAP HANA and also gives you an environment to perform Advanced Analytics with the unexplored sources of data.
Currently, with SAP HANA 2.0, we are seeing lot of improvements in integration with python. This will be an added advantage with existing landscape.
Thanks & Regards,
Nitesh Verma
Hi Nitesh,
Thank you for this post. You may want to try some recent changes to the official HANA driver as your "dbapi" instructions seem to use an old version. It is being actively developed and should provide what you get from pyhdb as well as more.
To install:
To use:
and then you can use the same "dbapi.connect()" methods as above.
The HANA SQLAlchemy module works with hdbcli automatically as well.
Tom
Thanks Tom for your feedback and updating about the recent changes.
We are aware about hdbcli but did not get a chance to use. We will definitely use it going forward.
Thanks, Nitesh