Data Science with SAP S/4HANA – How to connect HANA-ML with Jupyter Notebooks (Python)
Hello, I am a dual student at SAP studying International Business Administration and Information Technology.
In this blog, I will happily share with you the data science project I undertook during my practical phase as part of my studies. The practical phase was supervised by the Industry Solution Management for Energy and Utilities Industries – Contact Raik Kulinna.
In today’s fast-paced digital world, businesses are continually striving to optimize their operations through data-driven solutions. One possible solution involves integrating Python Machine Learning Client for SAP HANA (short: hana-ml) with Jupyter Notebooks, offering a live data connection with SAP S/4HANA.
In this blog post, we will explore the setup of the required environment and walk you through the process of connecting Jupyter with SAP’s HANA-ML library. Furthermore, we will demonstrate how to establish a connection between Jupyter and S/4HANA, which allows us to access valuable data.
The following screenshot illustrates the outcome that we aim to achieve in this blog. As depicted, our objective is to establish a live data connection that enables the analysis of data from S/4HANA directly in Python.
So, let’s dive in and unlock the potential of these powerful SAP and open-source tools to drive positive change in business practices.
Python and Jupyter setup / installation (without Anaconda)
Before diving into the technical aspects, it is important to set up the landscape correctly. Firstly, we need to set up Python and install Jupyter Notebook.
By doing the following steps, you can start using Python and Jupyter Notebooks for data analysis, machine learning, and other Python-related tasks. In this tutorial, we will not use Anaconda, a popular Python distribution.
- Download the latest Python version (or any other you wish) from “https://www.python.org/downloads/”.
- Open the downloaded file and start the installation.
- You can check the installation by running the code print("Hello World") in Python’s Integrated Development and Learning Environment (IDLE).
To find IDLE, search for “idle” on your computer.
- Add Python to the PATH environment variable.
First, find the location of the “Python311/Scripts” folder.
- Then add that location to the PATH environment variable.
- Open Command Prompt. The package installer pip is already included in the Python installation from python.org; use it to install virtualenv with the command “pip install virtualenv”.
- Upgrade pip using “python -m pip install --upgrade pip”.
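To confirm that the PATH change from the steps above took effect, a short check can be run in IDLE or any Python prompt. This is a minimal sketch and not part of the original setup; the helper name scripts_dir_on_path is made up for illustration.

```python
import os
import sys
import sysconfig


def scripts_dir_on_path(path_value=None):
    """Return True if this interpreter's Scripts (or bin) folder is on PATH.

    Hypothetical helper for illustration. `path_value` defaults to the
    PATH environment variable of the current process.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Normalize both sides so that case and trailing separators do not matter.
    scripts_dir = os.path.normcase(os.path.normpath(sysconfig.get_path("scripts")))
    entries = [
        os.path.normcase(os.path.normpath(entry))
        for entry in path_value.split(os.pathsep)
        if entry
    ]
    return scripts_dir in entries


print("Python version:", sys.version.split()[0])
print("Scripts folder:", sysconfig.get_path("scripts"))
print("Scripts folder on PATH:", scripts_dir_on_path())
```

If the last line prints False, commands such as pip or jupyter may not be found in the Command Prompt until the folder is added to PATH and the prompt is reopened.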
Now that you have installed Python, you can install the popular interactive data science and scientific computing environment Jupyter Notebook.
- Open the Command Prompt and install Jupyter with “python -m pip install jupyter”.
- Wait for it to install until the Command Prompt displays the message “successfully installed”.
You have now installed Jupyter Notebook without Anaconda!
- To open Jupyter Notebook, enter the command “jupyter notebook” in the Command Prompt and press Enter. Jupyter Notebook then opens automatically in your browser.
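The pip commands above rely on the PATH setting pointing to the right Python installation. If several Python versions are installed, one way to be sure a package lands in the interpreter you are actually using is to call pip through that interpreter. The sketch below shows the idea; pip_command and pip_install are hypothetical helper names, not part of pip or Jupyter.

```python
import subprocess
import sys


def pip_command(package, upgrade=False):
    """Build a pip command that targets the interpreter running this code."""
    command = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        command.append("--upgrade")
    command.append(package)
    return command


def pip_install(package, upgrade=False):
    """Run the pip command; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_command(package, upgrade=upgrade))


# Only print the commands here; call pip_install("jupyter") to really install.
print(pip_command("jupyter"))
print(pip_command("pip", upgrade=True))
```

The same helper can later be used for the hana-ml package, for example pip_install("hana-ml").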
Now that you have successfully set up your environment, we will show you how to connect SAP’s HANA-ML with Jupyter and how to establish a live data connection between Jupyter and SAP S/4HANA.
First, we will import hana-ml into Jupyter.
- Install the required SAP library by entering “pip install hana_ml” in the command prompt of your computer.
- Import the required modules by running the following statements in a Jupyter cell:
import hana_ml
from hana_ml import dataframe
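If the import fails, it helps to check whether the package is installed in the environment that Jupyter uses. The following is a small sketch using only the standard library (Python 3.8 or later); installed_version is a hypothetical helper, and “hana-ml” is assumed to be the distribution name under which the package was installed.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# "hana-ml" is the assumed distribution name of the hana_ml package.
version = installed_version("hana-ml")
if version is None:
    print("hana-ml is not installed in this environment")
else:
    print("hana-ml version:", version)
```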
Now that we have connected Jupyter with hana-ml, it’s time to establish a connection between Jupyter and SAP S/4HANA.
For that connection you need to download and set up the ODBC driver.
- Follow the instructions in this tutorial: Using ODBC driver for SQL Service | Tutorials for SAP Developers. You might not have the authorizations required to download the software. In that case, a warning appears at the top of the page with a link for requesting the missing authorizations. Once they are granted, you can continue with the tutorial.
- Usually, you want to access data that already exists, so you can skip step 3 and step 4 of the instructions. If you already have an SAP Core Data Services (CDS) view entity, a service definition and a service binding, you can also skip step 5 and step 6. Not everybody has permission to create a service definition and a service binding. If you don’t, ask a colleague who does. In my case I didn’t need a communication scenario, so I skipped step 7 to step 9.
- As stated in the instructions of the link above (step 10) you need to create an ODBC data source (System DSN). For that, you need the following parameters:
- To get the hostname and the port number of your SAP S/4HANA system you have to do the following steps:
- Open SAP GUI / SAP Logon and use the transaction SMICM.
- In the menu bar at the top of the screen, open the “Goto” menu and choose “Services”.
- To find the service path you must find the location of the file “sapcrypto.dll”.
You can copy the service path and the additional attributes from the picture below:
Now that you have created your Data Source Name (DSN) string you can go back to Jupyter notebook to establish the connection to the SAP S/4HANA system.
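Before connecting, it can help to confirm that the hostname and port found in SMICM are reachable from your machine. The sketch below uses only the standard library; can_reach is a hypothetical helper, and the host and port you pass in are the values from your own system. A successful TCP connection only shows that the port is open. It does not prove that the SQL service is active or that your user is authorized.

```python
import socket


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hostnames.
        return False


# Example call with placeholder values taken from SMICM:
# can_reach("<hostname>", <port>)
```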
- Write the following statements in a cell:
cc = dataframe.ConnectionContext(pyodbc_connection='dsn=test_01; Uid=<yourUserId>;Pwd=<YourPassword>')
print("Connected")
If the cell prints the word “Connected”, the connection to your SAP S/4HANA system was established without an error!
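Typing the password directly into a notebook cell means it is saved with the notebook. As an alternative, the connection string can be assembled from a password that is entered at run time. This is a minimal sketch under the assumption that the DSN is set up as described above; build_odbc_connection_string and connect are hypothetical helper names, and only the ConnectionContext call with pyodbc_connection comes from the statement shown earlier.

```python
import getpass


def build_odbc_connection_string(dsn, user, password):
    """Assemble a 'dsn=...;Uid=...;Pwd=...' string for the ODBC connection."""
    for label, value in (("dsn", dsn), ("user", user)):
        if ";" in value or "=" in value:
            raise ValueError(f"{label} must not contain ';' or '='")
    # Assumption: the password contains no ';'. Passwords with special
    # characters may need quoting that depends on the ODBC driver.
    return f"dsn={dsn};Uid={user};Pwd={password}"


def connect(dsn, user):
    """Prompt for the password and open a hana_ml ConnectionContext."""
    # Imported here so the helper above also works without hana_ml installed.
    from hana_ml import dataframe

    password = getpass.getpass(f"Password for {user}: ")
    return dataframe.ConnectionContext(
        pyodbc_connection=build_odbc_connection_string(dsn, user, password)
    )
```

In a notebook cell, cc = connect("test_01", "<yourUserId>") would then ask for the password without storing it in the notebook.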
The next step is to get data from your system using SQL.
- Write the following statements in a cell:
dfr1 = cc.sql('SELECT * FROM <Name_of_service_binding>.<Name_of_CDS_view>')
dfr1.collect()
If you get a table as a result that means you were successful.
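CDS views can expose many columns, so it is often enough to select only the ones you need instead of using SELECT *. The sketch below builds such a statement from checked names; build_select is a hypothetical helper, the binding, view and column names are placeholders, and the simple name check is an assumption that would reject names containing a namespace with slashes.

```python
import re

# Assumption: plain names made of letters, digits and underscores only.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_select(schema, view, columns=None):
    """Build 'SELECT <columns> FROM <schema>.<view>' from checked names."""
    names = [schema, view] + list(columns or [])
    for name in names:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    column_list = ", ".join(columns) if columns else "*"
    return f"SELECT {column_list} FROM {schema}.{view}"


# Placeholder names; replace them with your service binding and CDS view.
query = build_select("ZMY_BINDING", "ZMY_VIEW", ["Equipment", "Plant"])
print(query)
```

With an open connection cc, the statement can be passed on as cc.sql(query).collect(). The collect() call returns a pandas DataFrame, so the usual pandas methods are available for further analysis.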
In the steps described in this blog we successfully connected SAP’s HANA ML with Jupyter Notebooks and SAP S/4HANA. By doing that we opened a world of possibilities for data scientists directly on the SAP system.
The integration of HANA ML, which offers advanced machine learning algorithms and capabilities, with Jupyter Notebooks and SAP S/4HANA, provides a robust toolset for data science with SAP. Whether it involves predicting customer behavior, optimizing supply chain operations, or uncovering hidden patterns, this integration enables efficient data analysis and modeling.
In my upcoming blog post, I will provide a concise overview of a data science project focused on the analysis of data generated within an S/4HANA system. The objective of this project is to conduct a comprehensive analysis directly on the data persistency of such a system. Stay tuned for my next blog.