Skip to Content
Technical Articles

Data Provisioning in HANA from Various Sources using Python

Why Python 

 

We have known that Smart Data Access (SDA) is much popular as a Data Provisioning method with SAP HANA to connect to various popular third party sources whether its SQL server, RDBMS etc. You can access data virtually as well as you can persist based on requirements. Flowgraphs plays an important role here.

But most of the times, in Real Life projects, we faced many requirements where there is a need to integrate HANA and do Data provisioning with the sources which are very specific and particular to any process/scenarios such as third party or custom APIs, Web pages, ODATA services, others. In these scenarios, we end up struggling to find ways on how to integrate HANA  with these sources as well as do the regular updates/processing of data.

Here, Python plays an important role as it brings lot of flexibility in integrating multiple sources along with powerful and easy to understand scripts. Also, provides lot of methods/functions to process your data.

 

Pre-requisites

 

In Python, we have multiple methods and so there is different need on what all things need to install.

Such as below

  • HANA client,
  • Python any version
  • Any development environment supported by Python such as PyCharm, Spyder, Jupyter Notebooks etc.
  • Source details depends upon from where you are loading the data. For example, if we are looking from APIs, then we need complete details like API urls, keys, parameters etc depend upon API nature.
  • HANA database with proper privileges

 

Methods in Python for HANA

 

To integrate different sources of data mentioned, Python provides various inbuilt methods for integration.

As per my experience, Some of them which are popular as mentioned below.

IMPORT DBAPI

IMPORT PYHDB

FROM SQLALCHEMY IMPORT CREATE_ENGINE

Here, in this blog, we will see these methods along with few example of short script. Also, I would share my experience with each methods.

To use all the methods, you need to install specific methods, if it is not there available by default under python dictionary.

 

DBAPI Method

 

For Using this method, we need to install HANA Client along with few changes with files and folder provided details below. Sample script below.

IMPORT DBAPI

This will import required methods while executing python scripts

CONNECTION = DBAPI.CONNECT(SERVER, PORT, USER, PASSWORD)

Here you need to provide HANA system details along with credentials, Try to have some background User and password which can be used for all jobs.

CURSOR = CONNECTION.CURSOR()

QUERY = INSERT INTO TABLE VALUES(‘INSERT TEST SUCCESSFUL ‘)

CURSOR. EXECUTE(QUERY)

Under Query variables, you can provide any SQL script which you would like to execute in HANA. you can use above statements to execute any DML/DDL statements.

After above line execution, you can check, data will be inserted successfully into HANA tables.

For more details, please follow below link.

https://blogs.sap.com/2012/06/08/sap-hana-and-python-yes-sir/

 

PYHDB Method

 

This method is very much similar to DBAPI, only difference which I found using PYHDB is, it has more robust mechanism and also it provides good integration with HANA DB.

Syntax looks similar as DBAPI like below.

IMPORT REQUEST

IMPORT PYHDB

This will import required methods while executing python scripts

CONNECTION = PYHDB.CONNECT(HOST ,PORT ,USER NAME ,PASSWORD)

Here you need to provide HANA system details along with credentials. Try to have some background User and password which can be used for all jobs.

CURSOR = CONNECTION.CURSOR()

QUERY = INSERT INTO TABLE VALUES(‘INSERT TEST SUCCESSFUL ‘)

CURSOR. EXECUTE(QUERY)

Under Query variables, you can provide any SQL script which you would like to execute in HANA. you can use above statements to execute any DML/DDL statements

After above line execution, you can check, data can be inserted successfully into HANA tables.

As per my experience, both the above methods has some limitation due to which we faced couple of challenges while inserting data into HANA. Also it does not have much flexibility to provide more details about HANA like schema name etc. To achieve this, we need to go for some more custom scripts.

Also, these methods include too much SQL statements to process and insert data into HANA along with issues with bulk loads.

For more details, please follow below link.

https://github.com/SAP/PyHDB

 

FROM SQLALCHEMY IMPORT CREATE_ENGINE Method

 

As per my experience, this is the most robust method and also provides good experience while using it for multiple scenarios. It also uses the PYHDB method, which makes it more integrated. Below are few of the reasons to use this method.

  • You can make use of python inbuilt methods like Pandas, Dataframe which is very much effective and useful while dealing with structured data and also supports good volume of data loading
  • You can reduce unnecessary scripting
  • Flexibility in defining more details about HANA system like Schema name etc.
  • Flexibility in dealing with data in-discrepancies

Sample script below

IMPORT PANDAS AS PD

IMPORT REQUESTS

IMPORT JSON

FROM SQLALCHEMY IMPORT CREATE_ENGINE

IMPORT PYHDB

All of the above are required ones for this method as each supports different purpose

RESPONSE = << API Details need to provide like keys etc and capture the response >>

Above statement is very specific to source details, as per source details you need to capture the response and process accordingly and store in any variable like below ‘Table’

TABLE = RESPONSE.JSON()[‘TEST’]

Converting the response in JSON format

DF = PD.DATAFRAME(TABLE)

Using Pandas, Dataframe to make the source data in more readable format

CONNECTION = CREATE_ENGINE(‘HANA+PYHDB://'<<USERNAME>>:<<PASSWORD>>@<<HOSTNAME>>:<<PORT>>)

Provide parameters same as applicable for above other methods

DF.TO_SQL(‘<<HANATABLENAME>>’,CONNECTION,IF_EXISTS = ‘APPEND’,)

Passing Dataframe values to hana tables, here there is no need to write SQL like others, also gives you flexibility to define schemas and other details

For more details on syntax, please follow below link.

https://github.com/SAP/sqlalchemy-hana

https://stackoverflow.com/questions/44688029/moving-data-from-mysql-to-sap-hana-with-python

After above line execution, you can check, data can be inserted successfully into HANA tables.

 

Scheduling Python Scripts

 

In all Data integration scenarios, there is always a need to make the complete load process more robust and without manual intervention.

Python provides you flexibility to schedule its scripts using Windows task scheduler itself.

You can easily schedule any scripts using scheduler by creating Basic Task along with your program details, time etc and monitor also.

 

Summary

 

Python is very popular and powerful programming language. Using Python for Data Integration with HANA brings lot of flexibility to combine and extract information from various sources.

Its further enhancing the capability of SAP HANA and also gives you an environment to perform Advanced Analytics with the unexplored sources of data.

Currently, with SAP HANA 2.0, we are seeing lot of improvements in integration with python. This will be an added advantage with existing landscape.

 

 

Thanks & Regards,

Nitesh Verma

2 Comments
You must be Logged on to comment or reply to a post.
  • Hi Nitesh,

    Thank you for this post. You may want to try some recent changes to the official HANA driver as your “dbapi” instructions seem to use an old version. It is being actively developed and should provide what you get from pyhdb as well as more.

    To install:

    > pip install hdbcli

    To use:

    from hdbcli import dbapi

    and then you can use the same “dbapi.connect()” methods as above.

    The HANA SQLAlchemy module works with hdbcli automatically as well.

    Tom

    • Thanks Tom for your feedback and updating about the recent changes.

      We are aware about hdbcli but did not get a chance to use. We will definitely use it going forward.

      Thanks, Nitesh