SAP HANA 2.0 SPS02 new feature: updated python driver
Introduction
As background: there have been two HANA python drivers available for some time:
- The python driver included with the SAP HANA Client package (hdbcli). The client package included a distribution of python 2.7 and the driver works only with that distribution.
- The python driver available on github (pyHDB). This driver is implemented in pure python, and worked with both python 2 and 3, but is not fully supported by SAP.
In SAP HANA SPS02, the hdbcli driver has been updated and now provides the following new features:
- support for python 2.7 and python 3.4+.
- installable into local python environments using the pip installer
- the driver is built on the same C++ native library used by ODBC, ADO.NET, and other interfaces. As a result it provides all the protocol support built into that library, including support for Active/Active database configurations, network compression, and more.
- improved support for calling stored procedures.
- fully supported as part of the SAP HANA Client distribution.
This blog post walks you through the usual Getting Started tasks, including install, establishing a connection, executing SQL DDL commands, and handling result sets.
Installing the python driver
There are two steps to installing the python driver:
- Install the SAP HANA Client package on your computer. For instructions on this step, see the previous blog post.
- Install the python driver from the HANA Client install into a local python environment.
Installing the python driver into a local python environment
You must have a python distribution installed on your local machine. Some python distributions allow you to create multiple independent virtual environments, each with its own version of python and its own set of packages. You can install the python driver into any local python environment.
From a command line:
- If you have multiple python environments, activate the python environment into which you wish to install the python driver. If you don’t have multiple python environments, just ignore this step.
- Confirm you have a supported python version (2.7, or 3.4+) by typing the following at a command prompt to show the python version:
- python -V
- Install the HANA Client package into your python environments using the pip installer.
- Here is a command for the default installation on Windows. The particular version of the zip file will probably be different:
- pip install “C:\Program Files\SAP\hdbclient\hdbcli-2.N.N.zip”
- Here is a command for the default installation on Linux.
- pip install /full/path/to/file/hdbcli-2.N.N.tar.gz
- Here is a command for the default installation on Windows. The particular version of the zip file will probably be different:
Now confirm you have the software properly installed. From a command prompt, type python to start a python interpreter. Then enter the following to confirm the package is properly installed:
>>> from hdbcli import dbapi
>>> dbapi.__name__
'hdbcli.dbapi'
All the code samples here are shown as if executed from a python interpreter.
Connecting to a database
Here is sample python code showing a database connection:
>>>from hdbcli import dbapi
>>>conn = dbapi.connect(
address="<hostname>",
port=<3NNMM>,
user="<username>",
password="<password>"
)
- For HANA tenant databases, you can use the port number 3NN13 (where NN is the SAP instance number).
- For HANA system databases in a multitenant system, the port number is 3NN13.
- For HANA single-tenant databases, the port number is 3NN15.
For more details about HANA port number assignments in special circumstances such as upgrades, see the documentation.
For details of connection class methods, see <SAP Documentation Link>
Executing SQL statements
All SQL statements are executed using the cursor object. Here is how to execute a single statement that does not return a result set:
>>> cursor = conn.cursor()
>>> cursor.execute("CREATE TABLE T1 (ID INTEGER PRIMARY KEY, C2 VARCHAR(255))")
>>> cursor.close()
Binding parameters for SQL statements
For statements that take a parameter, it is good practice to bind the parameter to a variable rather than to concatenate a string and execute it. The HANA client driver supports both the question mark and named styles of parameter binding as described in the DBAPI specification.
Question mark parameter binding
Here is an insert statement using the question mark parameter style. The statement inserts values into the T1 table created above. The parameter values are supplied as a python sequence, and may be literal values or variable names. This example uses literal values:
>>> sql = 'INSERT INTO T1 (ID, C2) VALUES (?, ?)'
>>> cursor = conn.cursor()
>>> cursor.execute(sql, (1, 'hello'))
True
>>> cursor.execute(sql, (2, 'hello again'))
True
>>> cursor.close()
Named parameter binding
Here is another insert using the named parameter style. This time the values are supplied as a python dictionary, and in this case the example uses variable names.
>>> sql = 'INSERT INTO T1 (ID, C2) VALUES (:id, :c2)'
>>> cursor = conn.cursor()
>>> id = 3
>>> c2 = "goodbye"
>>> cursor.execute(sql, {"id": id, "c2": c2})
True
>>> cursor.close()
Querying data and handling result sets
There are several ways of handling result sets, useful in different circumstances. Here is a simple way to loop over the rows of the result set.
>>> sql = 'SELECT * FROM T1'
>>> cursor = conn.cursor()
>>> cursor.execute(sql)
True
>>> for row in cursor:
... print(row)
(1, 'hello')
(2, 'hello again')
If you know the result will have a single row, then cursor.fetchone() is useful
>>> id = 2
>>> sql = 'SELECT * FROM T1 WHERE ID = :id'
>>> cursor = conn.cursor()
>>> cursor.execute(sql, {"id": id})
True
>>> row = cursor.fetchone()
>>> print(row)
(2, 'hello again')
If the result set is not too large, you may want to read it into a python variable in one step using cursor.fetchall().
>>> sql = 'SELECT * FROM T1'
>>> cursor = conn.cursor()
>>> cursor.execute(sql)
True
>>> result_set = cursor.fetchall()
>>> print(result_set)
[(1, 'hello'), (2, 'hello again'), (3, 'goodbye')]
Calling stored procedures
You can get results from stored procedures in two ways: through output (or inout) parameters, or through returned results. In both cases you can use the cursor.callproc() method.
Here is an example of how to return an output parameter.
>>> cursor = conn.cursor()
>>> # if the procedure exists, drop it
>>> try:
... cursor.execute("DROP PROCEDURE ADD2")
... except:
... pass
>>> # create the procedure
>>> sql_proc = """
... create procedure ADD2 (in a int, in b int, out c int)
... language sqlscript
... reads sql data as
... begin
... c := :a + :b;
... end
... """
>>> cursor.execute(sql_proc)
True
>>> # call the procedure
>>> params_in = (2, 5, None)
>>> params_out = cursor.callproc("ADD2", params_in)
>>> # close the cursor
>>> cursor.close()
>>> print(params_out)
(2, 5, 7)
… and here is an example of handling returned result sets. The cursor.callproc()
method acts just like cursor.execute("SELECT...")
.
>>> cursor = conn.cursor()
>>> # if the procedure exists, drop it
>>> try:
... cursor.execute("DROP PROCEDURE PROC_T1")
... except:
... pass
>>> # create the procedure
>>> sql_proc = """
... CREATE PROCEDURE PROC_T1(IN substring varchar(10))
... LANGUAGE SQLSCRIPT
... AS
... BEGIN
... SELECT ID, C2 FROM T1
... WHERE C2 LIKE '%' || :substring || '%';
... END
... """
>>> cursor.execute(sql_proc)
True
>>> # call the procedure
>>> cursor.callproc("PROC_T1", ("hello",))
>>> for row in cursor:
... print(row)
(1, 'hello')
(2, 'hello again')
Streaming data
There is a separate class for streaming data to or from SAP HANA. You only need to use this class if you wish to process data in a piecewise fashion. Typically this may be used for video or speech data, but here is a sample that uses an image.
# get an image from the file system
with open('image.png', 'rb') as f:
img = f.read()
# write to the database
in_blob = dbapi.LOB()
id = 2
conn.setautocommit(False)
cursor.execute("update T1 set PHOTO = :img WHERE id =:id", {"img": in_blob, "id": id})
in_blob.write(data=img)
in_blob.close()
conn.commit()
cursor.close()
And here is an example of how to read the image back out of the database. The important part is the use of the read() method to fetch a portion of the blob at a time.
# read it into a stream
conn.setautocommit(True)
cursor = conn.cursor()
cursor.execute("select PHOTO from T1 where id = :id", {"id": id})
out_blob = cursor.fetchone(True)[0]
out_img = bytearray()
while True:
data = out_blob.read(size=200)
if data is None:
print("Done.")
break
else:
print("Reading more data...")
out_img.extend(data.tobytes())
This completes the description of the new HANA python driver.
How can I load a bunch of data?
I mean, suppose that I have a CSV file with 10 million rows, do I have to read and insert them line by line?
Do this python driver has a bulk load or something similar?
Thanks in advance.
Sorry for the slow response. I think the best way is to use the IMPORT FROM statement, which you can do from the python interface or any other.
This works if you have the file loaded in the server, but I think it doesn't works if you have the file in your local machine.
Thanks a lot for the response.
Carlos Mendez
Single read from csv and you can try executeMany() but for millions it's not optimal way, have to divide and execute in batches.
Hi Carlos Mendez , Nagavenkata Mittapalli did you try the executeMany() i am trying to use it with python fstring literals and always getting some syntax error. Is there a preference b/w % Operator or the fstring in this cases.
I had to add this to get it to work. But perhaps putting that directory on the path would work also.
import sys
sys.path.append('/hana/shared/[xxx]/hdbclient/')
from hdbcli import dbapi
Good Job! the previous solution was to put all the associated module files into the class directory, which might not be correct...
Hi Tom,
Thanks for this article. It explains the usage quite well.
However, I have a query, as per the documentation, https://help.sap.com/doc/0eec0d68141541d1b07893a39944924e/2.0.02/en-US/ee592e89dcce4480a99571a4ae7a702f.html , we can connect to HANA with userstore key but the functionality doesnt seem to be working. Can you please guide or give us a hint if its available?
Hi Ajay, It does work for me, using dbapi.connect(key="key_name"). It does matter that the python process is running in a user account that has access to the keystore.
Thanks, Great Blog!!!
I would like to add one point here, The result set that you get out of fetchone() or fetchall() is a multidimensional Python list in case your result set has multiple Columns.
you may use the below code to extract & use the result set as per the use case.
Python version Python 3.7.0
For single column result set, its one dimensional list.
Cheers!!
Razal
I have HANA client tools installed and ran command pip install "C:\Program Files\SAP\hdbclient\hdbcli-2.2.76.zip" finished successfully. I have python version 3.7.1 installed. On Windows 10. When I enter
I get
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'hdbcli'
Can you think of anything I missing?
Best Regards
Kevin Geiger
Python 3.7 support was missed in the first release. It will be available very soon.
Hi,
i installed the hdbclient and the hdbcli 2.3.130 inside it to the local python (3.72)
When i run
I get this error
Did I miss anything?
Best regards,
Dang Le
Hi Dang Le,
We had a limitation until very recently that python 3.7 was not supported properly. If you get HANA Client 2.3.130 or later it should all work fine.
HANA Client 2.3.130 is available on SAP Software Downloads as IMDB_CLIENT20_003_130-80002083.SAR, or as part of HANA Platform Edition 2.0 SPS03 Rev35. It's not yet available on https://tools.hana.ondemand.com#hanatools, but it should be before too long.
Hi Tom,
The python version is i am using is 3.6.5 and i have installed the latest HANA Client as well from the link, still i am facing the same error as mentioned by Dang Le. Kindly suggest if something else needs to be checked, any help is welcomed.
Regards,
Deo
Getting errors while installing pip.exe install "C:\Program Files\SAP\hdbclient\hdbcli-2.3.134.zip"
Installing collected packages: hdbcli
Running setup.py install for hdbcli
error: Compatible extension not found.
running build_ext
error: Compatible extension not found.
--compile failed with error code 1
Python version 2.7.9 32-bit. Any suggestions?
Thank you in advance
Hi Maksim,
It looks like you are trying to install the 64-bit driver into a 32-bit python install. We will have better error messages in the future, but you will need to either get the 32-bit hdbclient install or a 64-bit python.
I have python 3.7.3 on my system. While installing HDBclient 2.3.144 on Windows , i am getting this error :
copying hdbcli\resultrow.py -> build\lib.win32-3.7\hdbcli
copying hdbcli\__init__.py -> build\lib.win32-3.7\hdbcli
running build_ext
error: A compatible abi3 driver could not be found.
Can anyone please help
Can you tell me if you are using the 32-bit or 64-bit version of python?
I just installed exactly this combination and it worked for me: Anaconda python 3.7.3 and SAP HANA driver 2.3.144 on Windows. I changed to the HANA Client install directory and ran "pip install hdbcli-2.3.144.zip"
i was using 32-bit version . I installed 64 bit and it worked fine.
Thanks!
Hi Tom,
Your blog works fine on a local machine (MacOS, python 3.7.2, pip 19.1, hdbcli 2.4.126), but when pushing it to cloud foundry in SAP Cloud Platform, the installation fails. See cf push output below.
To push the hdbcli lib to cf I used the 'hdbcli-2.4.126.tar.gz' file I found in my installed '/Applications/sap/hdbclient' folder. Not sure if this is the correct file as this is a zip coming from the Mac installation which I'm pushing to a Linux cf environment?
Thanks
Willem
Solved by using the .whl file instead of the .tar.gz file (hdbcli-2.4.126-cp34-abi3-linux_x86_64.whl)
Hi,
What is the most efficient way to read the result from cursor if we have around 10 million rows? cursor.fetchall() takes a really long time to fetch from a table with more than 2 million rows.
Thanks,
Pankti
Here's a Docker Image running Jupyter with the libs pre-configured for those of you just wanting to play around quickly and don't want to have fits with install/config issues. https://github.com/entmike/hana-jupyter-docker
Hi,
I want to know how connection pooling works with hdbcli, i did not see any configuration for specifying max connection pools etc. How are connections maintained if there are say 1000s requests/sec.
Just add on.
The blog doesn't mention about the usage of executemany(), following is the sample.
Note that in mysql/postgresql, we may have to use %s, %n, %d to represent the parameters with certain datatype, and hdbcli only use '?'
Hope this help