Skip to Content
Technical Articles

Multi-model in hana_ml 2.6 for Python: Setup

Today is Thursday. But this post is tagged with both Graph Thursday and Geospatial Tuesday tags? What’s going on?? Well, exciting things are going on!

As Markus Fath already mentioned in his separate post about “SAP HANA Python Client API for Machine Learning Algorithms” (or simply HANA-ML or hana_ml) today:

With the latest release 2.6 on Oct 16, 2020 it includes enhancements to leverage some more of HANA’s multi-model capabilities: spatial and graph.

In my Happy World Statistics Day with hana_ml 2.6! I presented already one new feature of this release: Dataset Report. But in this series, I am going to focus on a graph and spatial modules added to this release.

You should find a few different exercises here, but primarily I want to show, explain and expand the code used for my demo in the Translytical Data Processing with SAP HANA [DAT108]
session of this year’s virtual SAP TechEd. In that demo (example screenshot is below), I store, manipulate, and visualize both graph and spatial data using SAP HANA and Python thanks to hana_ml.

Assumptions

For these posts, I assume you have some knowledge of Python and how to use it in data processing already. And if you are working with data, but have been avoiding Python so far, then my (career) advice to you is to use the fact that you’re reading this as a wake-up call.

I do not plan to go into an explanation of some very basics, but the links I share should help you to catch up where needed.

The setup: database

Version 2.6 of hana_ml is compatible with SAP HANA Cloud and SAP HANA 2.0 SPS 5.

I have a new trial instance of SAP HANA Cloud created. You can create your own by following this tutorial: https://developers.sap.com/tutorials/hana-trial-advanced-analytics.html

And btw, have you heard today’s announcement from SAP CTO Juergen MUELLER about extending the validity of SAP Cloud Platform trial accounts to 12 months? Enjoy!

Some of you may ask how can I use the SAP HANA Cloud trial if it has a limit of 2 vCPUs and no scriptserver, while PAL/APL ML libraries require it. Well, it is not a requirement for multimodel capabilities, so we are good here even without scriptserver.

I haven’t tried every single piece of my code here with SAP HANA, express edition. I assume it should work with some small modifications. If you tried and succeeded (or failed), then please let me know in the comments below.

The setup: client

Just as in my previous post about Dataset Report in hana_ml 2.6 I am going to use a new container in Docker with Jupyter and hana_ml by executing on my macOS laptop with running Docker Desktop:

docker run -p 8888:8888 --detach -v ~:/home/jovyan/work --name hmlsandbox01 -e GRANT_SUDO=yes -e JUPYTER_ENABLE_LAB=yes jupyter/minimal-notebook start-notebook.sh --LabApp.token=''
docker exec hmlsandbox01 pip install hana_ml==2.6.* shapely

But it should work as well with Jupyter Notebook or Lab running outside of a container, or with simple Python3 REPL, or with IPython, or with Jupyter Notebooks natively in VS Code, or with the Python Interactive in VS Code. If you do not want to install a client by yourself, then you can try as well a free tier of Colaboratory. Please let me know in the comments what way you used and how it worked!

Here is my client: JupyterLab running at http://localhost:8888.

If you are not familiar with Jupyter but have heard that this is the environment where most data scientists and data engineers spend a lot of their time nowadays, then it is time to catch up. Here are introduction tutorials to go through https://jupyter.readthedocs.io/en/latest/content-quickstart.html.

The setup: database users

The only user we have in a fresh SAP HANA Cloud deployment is DBAdmin. But it is not a good practice to use it for everything else. We are going to create another user HANAML and for that, we are going to use Python and hana_ml already!

Let me create a new Jupyter notebook called 00 Setup via hana_ml.ipynb.

import hana_ml
print(hana_ml.__version__)

That’s pretty straightforward: import hana_ml and check (visually 😉 ) that it is at least version 2.6.

Copy and paste your SAP HANA Cloud endpoint into this Python statement to assign it to a variable hana_cloud_endpoint.

hana_cloud_endpoint="<<uuid>>.hana.trial-eu10.hanacloud.ondemand.com:443"

Let’s split a host and a port parts of the endpoint, and create a hana_ml‘s connection context cchc with login as a DBAdmin user.

hana_cloud_host, hana_cloud_port=hana_cloud_endpoint.split(":")

cchc=hana_ml.dataframe.ConnectionContext(port=hana_cloud_port,
                                         address=hana_cloud_host,
                                         user='DBAdmin',
                                         encrypt=True
                                        )

Please note that encrypt is True by default in version 2.6, but I included it to remind you about that. Another new thing in this version is that it will interactively ask you for a password if you are using user-based login and a password is not provided in the ConnectionContext method.

Let’s check the version of the database (4.x for the current version of SAP HANA Cloud) and the current schema of the connection (should be default user schema, i.e. DBADMIN for now).

print(cchc.hana_version())
print(cchc.get_current_schema())

And now let’s create a new HANAML user using the cchc connection context. Obviously, replace a string assigned to newpassword with your own (and, no worries, I did it 😉 )

newuser="HANAML"
newpassword="Super$ecr3t!"

cursor = cchc.connection.cursor()

#cursor.execute("DROP USER {} CASCADE".format(newuser))
cursor.execute('CREATE USER {name} PASSWORD "{pwd}" NO FORCE_FIRST_PASSWORD_CHANGE SET USERGROUP DEFAULT'.format(name=newuser, pwd=newpassword))
cursor.execute("ALTER USER {} DISABLE PASSWORD LIFETIME".format(newuser))

cursor.close()

If you want to play with several settings while creating the same user, then uncomment DROP USER line.

Here is how it looks like for me after going through these steps.

Executing SQL statements with hana_ml

As you could see to execute SQL statement that is not SELECT we had to create a cursor using a .cursor() method of a connection property of our connection context.

Is the cursor sound familiar? It should you have used a regular hdbcli Python client for SAP HANA!

Personally, I found it common checking Python variables classes using __class__ and for the connection, it is indeed hdbcli.dbapi.Connection.

cchc.connection.__class__

…SELECT’ing data with SQL

But for SELECT SQL statements we do not need to go to the underlying dbapi connection, and create a cursor, and loop through the cursor reading records.

Here the hana_ml connection context has a convenient .sql() method.

print(cchc
      .sql("""SELECT USER_NAME, CREATOR, PASSWORD_CHANGE_NEEDED, IS_PASSWORD_LIFETIME_CHECK_ENABLED 
              FROM USERS 
              WHERE USER_NAME='{}'
              """.format(newuser))
      .collect())

Please note the .collect() method chained after .sql() to receive results of SELECT statement from SAP HANA. We will look closer at that method and how to include geospatial data there in next parts.

…SELECT’ing data with Python API

But hana_ml provides as well different methods to “pythonify” queries to SAP HANA. The example above can be rewritten by chaining table, select‘ed columns and then a filter on the records:

print(cchc.table("USERS", schema="SYS")
      .select('USER_NAME', 'CREATOR', 'PASSWORD_CHANGE_NEEDED', 'IS_PASSWORD_LIFETIME_CHECK_ENABLED')
      .filter("USER_NAME='{}'".format(newuser))
      .collect())

But behind the scene, there is still an SQL statement constructed and executed in SAP HANA. It can be seen using .select_statement property:

print(cchc.table("USERS", schema="SYS")
      .select('USER_NAME', 'CREATOR', 'PASSWORD_CHANGE_NEEDED', 'IS_PASSWORD_LIFETIME_CHECK_ENABLED')
      .filter("USER_NAME='{}'".format(newuser))
      .select_statement)

A self-check riddle

Before we finish today’s episode, let me post a piece of code for you to self-check why and how it works 🙂

print(cchc.sql(cchc
               .table("USERS", schema="SYS")
               .select('USER_NAME', 'CREATOR', 'PASSWORD_CHANGE_NEEDED', 'IS_PASSWORD_LIFETIME_CHECK_ENABLED')
               .filter("USER_NAME='{}'".format(newuser))
               .select_statement)
      .collect())

You can find this notebook…

…in hana-ml-samples repository: https://github.com/SAP-samples/hana-ml-samples/blob/main/Python-API/usecase-examples/multimodel-analysis-airroutes/00%20Setup%20via%20hana_ml.ipynb.


And with that, we should be all set for the next episode.

And stay healthy ❤️
-Vitaliy (aka @Sygyzmundovych)

Be the first to leave a comment
You must be Logged on to comment or reply to a post.