Skip to Content
Technical Articles
Author's profile photo Jan Fetzer

Connecting to SAP Data Warehouse Cloud from Python via hdbcli, sqlalchemy & hana_ml

I’d like to use this blog post to quickly run over the various methods for connecting to SAP Data Warehouse Cloud from Python,e .g. from Jupyter notebook. Since SAP Data Warehouse Cloud sits on top of SAP HANA Cloud and allows you to connect to it via Read Access & Write Access, this altogether boils down to connecting to HANA Cloud from Python. These standard ways exist:

  • hdbcli – standard Python client for SAP HANA, available on PyPi
  • SQLAlchemy is a Python SQL toolkit and object-relational mapper. By adding the dialect sqlalchemy-hana to it, you can connect to HANA in a very convenient manner and e.g. read HANA tables as Pandas DataFrames or write Pandas DataFrames easily back to SAP HANA. It is not officially endorsed by SAP, but still works like a charm in most situations.
  • hana_ml is the Python machine learning client for SAP HANA. It is geared towards data scientists and easily makes HANA’s ML functionalities available to a Python-savvy audience, but any Python fan can use it and leverage its way to easily access HANA data.

Note that in this blog post I’ll only explore the first two, because for hana_ml, excellent material is already available in hana-ml-samples@GitHub. For connectivity, just open any of the notebooks published in that repo.

Setting up a Database User in DWC

  1. Open Space Management of SAP Data Warehouse Cloud and find your space (here: PYTHONSAMPLES). Scroll down and start creating your database user for data consumption as well as data ingestion.
  2. You specify the user name suffix (here PYTHON)
  3. Next tell the system, if you want to enable read access directly from the space schema (same name as space, i.e. PYTHONSAMPLES in my case). With this, your database user will have access to all views (not tables) that were flagged for consumption in their properties in Data Builder
  4. Then tell the system, if you also want to write back. If yes, a so-called Open SQL Schema is generated of the name <schema>#<user> (hence in my case PYTHONCODESAMPLES#PYTHON). Note that this is really a different schema. You will be able to consume data from it in your space schema, but you’ll not be able to write directly to the space schema itself (this is for security).

Now let’s look at how this looks in the system:

After hitting Create, the system shows all relevant connection details, i.e. host, port (443), user name & password.

Don’t forget to copy esp. the password, since this is the only place it ever shows. If you forgot to copy it, you’ll need to inspect the database user again and request a new password

Add your IP to DWC’s IP Allowlist

In order for DWC to accept calls from Python or your database client, your IP will need to be allow-listed. Just ask your search engine of choice “what is my IP” to check your public IP. Then open DWC Homepage > Configuration > IP Allowlist and add that IP there. If you forgot to do these steps, DWC will not accept your call, resulting in errors of the type “cannot connect to database…”.

Setting up your Python environment

Let’s now spin up a Jupyter notebook and let’s do all subsequent actions directly in it. For simplicity sake, I copy the respective cell contents below one by one.

First you need to install hdbcli as well as Sqlalchemy. Sqlalchemy comes in two parts: the main engine and an own extension for the hana-specific ‘dialect’. Install all 3 via PyPi/Conda as in

%pip install hdbcli
%conda install -c anaconda sqlalchemy
%pip install sqlalchemy-hana
%conda install -c conda-forge python-dotenv

Obviously your SAP Hana Cloud user and password should not appear in your GitHub repo, so best put it into a separate .env file that you prevent from committing via .gitignore

Copy the password into a new file .env with this content

TEST=some test value
HANA_PWD_PYTHONCODESAMPLES=<your pwd here>

You can hide this file’s existence from git by adding a .gitignore file. This will prevent git from uploading the .env file to the repository. .gitignore should contain a single line that says:

.env

Then you can essentially start loading the libraries, test the environment and save the connection details. Let’s start by importing the respective libraries

import os
from hdbcli import dbapi
from dotenv import load_dotenv
import pandas as pd

from sqlalchemy import create_engine
from sqlalchemy.engine import reflection
import sqlalchemy.types

Next test the environment file. It should yield ‘some test value’ since this is what the .env file contains for key “TEST”

# testing out that reading from .env works as expected
load_dotenv()
os.environ.get("TEST")

And let’s set some constants for use in all subsequent calls. For simplicity, we also specify the name of an example table & view (both should obviously exist in their respective schemas)

# Connection constants for space PYTHONCODESAMPLES
dbHost="xyz.hana.prod-eu10.hanacloud.ondemand.com" #you should have noted them above
dbPort=443
dbUser="PYTHONCODESAMPLES#PYTHON"
dbPwd=os.environ.get("HANA_PWD_PYTHONCODESAMPLES")  #load pwd from .env file
dbIngestionSchema="PYTHONCODESAMPLES#PYTHON"
dbConsumptionSchema="PYTHONCODESAMPLES"
dbExampleTable="NEWTABLE"
dbExampleView="V_Opportunities"

Connectivity via HANA Database Client hdbcli

hdbcli is the standard HANA client for all connections. It is supported by SAP directly and installable via PyPi. Below example is taken directly from SAP’s standard documentation.

#Import your dependencies
import platform
from hdbcli import dbapi

#verify that this is a 64 bit version of Python
print ("Platform architecture: " + platform.architecture()[0])

#Initialize your connection
conn = dbapi.connect(
    address=dbHost,
    port=dbPort,
    user=dbUser,
    password=dbPwd, 
    encrypt=True, # must be set to True when connecting to HANA Cloud
    sslValidateCertificate=False # True HC, False for HANA Express.
)
#If no errors, print connected
print('connected')

Now open a connection cursor and start printing out the first couple of records to the console

cursor = conn.cursor()
sql_command = f'select top 2 * from "{dbConsumptionSchema}"."{dbExampleView}"'
print(sql_command)
cursor.execute(sql_command)
rows = cursor.fetchall()
for row in rows:
    for col in row:
        print ("%s" % col, end=" ")
    print (" ")

In my case, output was this:

If there is a table in your ingestion schema (you might want to create it ahead of time in SAP HANA Database Explorer or any other database client like e.g.. DBeaver), then this code will also work

Note: we create the table further down in the post by writing a Pandas DataFrame to SAP HANA.

cursor = conn.cursor()
sql_command = f'select top 2 * from "{dbIngestionSchema}"."{dbExampleTable}"'
print(sql_command)
cursor.execute(sql_command)
rows = cursor.fetchall()
for row in rows:
    for col in row:
        print ("%s" % col, end=" ")
    print (" ")

In my case, output is this

Connectivity via SQLAlchemy

SQLAlchemy is an object-oriented way of connecting to databases. Its HANA dialect is only informally supported and has minor issues, but altogether I still consider it to be just way more practical than hdbcli as you’ll see below, primarily because of its tight integration into Pandas. Some blog posts can help for SQLAlchemy in general:

Apart from this, there’s really little HANA-specifics that you need to be aware of to get going. So let’s dive in straight away.

# creation of central connection objects. Set echo=True, if you need a log of all statements issued towards the DB
engine = create_engine(f'hana://{dbUser}:{dbPwd}@{dbHost}:{dbPort}', connect_args={
    'sslTrustStore' : "",
    'encrypt' : 'true',
    'sslHostNameInCertificate' : '*'
}, echo=False)
connection = engine.connect()
inspector = reflection.Inspector.from_engine(engine)

#rp = connection.execute(f'select top 10 * from "{dbConsumptionSchema}"."{dbExampleView}"')
rp = connection.execute(f'select top 2 * from "{dbConsumptionSchema}"."{dbExampleView}"')
print(rp.fetchall())

Your output will be a list of sqlalchemy.engine.result.RowProxy objects:

Check what schemas exist

With the next call you can get all database schemas that your database user has access to

engine.dialect.get_schema_names(connection=connection)

In my case this is

['dwc_global',
 'dwc_tenant_owner',
 'pythoncodesamples',
 'PYTHONCODESAMPLES#PYTHON',
 'sys',
 '_sys_bi',
 '_sys_di',
 '_sys_plan_stability',
 '_sys_sql_analyzer',
 '_sys_task']

Beware when using these schema names for subsequent calls: Schemas need to be in mixed-case for usage by SQLAlchemy, while plain SQL requires them in upper case only!

Read views from consumption schema

Consumption schema only exposes views. Use a sligthly different method to read those:

engine.dialect.get_view_names(connection=connection, schema=dbConsumptionSchema)

for my schema yields

['V_Opportunities', 'V_SalesReps']

Now any table is directly at our finger tips and directly loads as Pandas DataFrame

df = pd.read_sql_table(dbExampleView, con=engine, schema=dbConsumptionSchema)

This is really a standard DataFrame like you see here:

type(df), df.columns, df.shape

yields

(pandas.core.frame.DataFrame,
 Index(['Opp_ID', 'Name', 'Account', 'Deal Size', 'Owner', 'Email',
        'Sales Phase', 'Forecast', 'Projected Close Date', 'Close Probability',
        'Opportunity Phase', 'RFP completed', 'Discount Pct'],
       dtype='object'),
 (12, 13))

Writing to the Database

Pandas DataFrames can be directly written to the database via pandas.Dataframe.to_sql. You can choose to replace, append or fail, if the table already exists. The table will be added to the ingestion schema (here: PYTHONCODESAMPLES#PYTHON). If you want to use it in DWC later, you need to import it from that source

Let’s now create a dataframe and write it to SAP HANA. If it exists, you can choose to abort, replace or append. I typically plainly overwrite my current table for simplicity. Note that SAP HANA is fine with this and even if there are views in the consumption schema built on that table, you can still replace that full table and the views will instantly yield the new data

new_df = pd.DataFrame({"id": [1,2,3], "col1":["a","b","c"]})
new_df.to_sql("newtable",con=engine, index=False, if_exists="replace")

# Now the table has been written to SAP HANA as we can check e.g. by reading all table names of the schema
engine.table_names(schema=dbIngestionSchema)

You can also influence the data types used by the database by setting the dtype parameter when calling to_sql.

new_df = pd.DataFrame({"id": [1,2,3], "col1":["a","b","c"], "postcode":[69118, 69125, 69127]})
table_dtypes = {"postcode":sqlalchemy.types.String(15)}
new_df.to_sql(name="newtable",con=engine, dtype=table_dtypes, index=False, if_exists="replace")

While previously the dtype of column postcode was int64, it now will be object, when you read the table again from the database

updatedTable = pd.read_sql_table(tables[0], con=engine, schema=dbIngestionSchema)
updatedTable.dtypes

Reading tables from ingestion schema

Anything written to the ingestion schema is technically a table. Beware about upper/lower case in table names. Let’s now read the table that we wrote previously from a Pandas Dataframe:

tables = engine.table_names(schema=dbIngestionSchema)
table = pd.read_sql_table(tables[0], con=engine, schema=dbIngestionSchema)
table

This will yield

Read table/view metadata

Finally, you can also read metadata for tables & views. Note that types below are sqlalchemy.types

itMeta = inspector.get_columns(table_name='V_Opportunities', schema=dbConsumptionSchema.lower())
itMeta

This will yield a list of dictionaries describing each column as in

I hope this blog post helps in your endeavours to use SAP Data Warehouse Cloud in from your Python environment. If you have any questions, please drop a note and I’m happy to help out.

Cheers, Jan

 

 

Assigned Tags

      2 Comments
      You must be Logged on to comment or reply to a post.
      Author's profile photo Anthony Wynne
      Anthony Wynne

      Thanks for this Jan,

      I do everything database wise with SQLAlchemy; we started moving our data SAP HANA. After a few failed attempts to work out how to connect with SQLAlchemy myself. I was dreading going through my code base and changing all the connection methods.

      The above all worked, on the second attempt; I had to quote out the lines:

      connection = engine.connect()
      inspector = reflection.Inspector.from_engine(engine)

      These kept throwing errors, but it works well without them, many hours saved 🙂

      Anthony

      Author's profile photo Amaury Van Espen
      Amaury Van Espen

      Hello Jan Fetzer

      I would like to understand how to easily manage the SSL certificate or "IP Whitelist" from a third party tool (I mean python notebook) connected to Data Warehouse Cloud space database through it own user.
      I've been successful several times in connecting my notebook to DWC space and dedicated DB, however, according to virtual infrastructure, Im regurlarly facing the same issue,

      Error: (-10709, 'Connection failed (RTE:[89008] Socket closed by peer (xyz.hana.prod-eu10.hanacloud.ondemand.com:443))')

      even after adding a large IP white list from GStatic on 34.*.*.* and 35.*.*.* ranges.

      which is quite "unconfortable"

      Could you provide some insightful tips ?

      (for a full local python env approach, see : this blog post)

      Thank you

      Amaury