Using Predictive Analytics and Python on SAP Cloud Platform HANA database – Part 1
I was recently working with a customer who was interested in doing Predictive Analytics on top of the HANA database they had recently subscribed to on SAP Cloud Platform. They already had an on-premise server for the Predictive Analytics (PA) Suite and had been using its tools against an on-premise HANA database. In this blog, I want to share my experience and highlight how easy it is to do the same with a HANA database on SAP Cloud Platform.
Once you have a HANA database subscribed on your Cloud Platform account, it's important to note that you will also need a subscription to the predictive services on Cloud Platform. The predictive services offer REST-based APIs that can be used in custom applications you build on Cloud Platform. If you would like to know more about this service, see the blog “Introducing SAP Cloud Platform predictive services”. I have also posted earlier on how to use these REST APIs: “Capture event streams from IoT devices and perform predictive analytics”.
In this scenario, we are not going to use the REST APIs, because we want the on-premise PA Suite to handle the more complex use cases.
Once you have subscribed to the predictive services, navigate to your database and click “Install Components”.
The system offers a self-service option to install the APL (Automated Predictive Library) libraries you want. Note that the APL version should match the version used by your on-premise PA Suite.
Once the APL libraries are installed on your HANA database, the next step is to set up your Cloud Connector. Follow this tutorial to install the Cloud Connector and set up connectivity to your Cloud Platform account.
Next, select the “On-premise to Cloud” option within the Cloud Connector to set up a service channel for your database connection. If you would like to know more about how to configure a service channel, here is a blog by Manjunath.
In the example below, I have entered the local instance number “98” for my HANA instance “hcpta”.
After you save the settings, you can access the hcpta instance using the host name of the machine where the Cloud Connector is installed, on port 39815. In my example, the Cloud Connector is installed on my laptop, so I will refer to it as localhost.
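If you are wondering where 39815 comes from: the port follows the 3<instance number>15 pattern used for HANA SQL ports, so instance number 98 maps to 3-98-15. Here is a small Python sketch of that arithmetic (the function name hana_sql_port is my own, not an SAP API):

```python
def hana_sql_port(instance_number: int) -> int:
    """Return the port 3<NN>15 for a two-digit HANA instance number NN."""
    if not 0 <= instance_number <= 99:
        raise ValueError("HANA instance numbers range from 00 to 99")
    # 3<NN>15 is the same as 30000 + NN * 100 + 15
    return 30000 + instance_number * 100 + 15


if __name__ == "__main__":
    print(hana_sql_port(98))  # 39815, the port used in this example
```

If you pick a different local instance number in the service channel, the same calculation tells you which port to enter in the ODBC data source.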
Open the ODBC Data Source Administrator (search for “ODBC Data Sources” in your programs) and, under “System DSN”, add a new data source. In the screen below, I have created one named “HCP”.
In the connection details, I have entered localhost:39815. When I test the connection, it asks for the HANA DB user credentials and shows a success message if everything works. In my setup, the Cloud Connector, the ODBC data source and the PA software are all on the same laptop.
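Once the DSN works, any ODBC client on the laptop can reuse it, including Python. As a minimal sketch, assuming the third-party pyodbc package is installed and the DSN is named “HCP” as above, the helpers below build an ODBC connection string and run a trivial query against HANA's DUMMY table. Both helper names are hypothetical.

```python
def odbc_connection_string(dsn: str, user: str, password: str) -> str:
    """Build an ODBC connection string that reuses an existing System DSN."""
    def quote(value: str) -> str:
        # ODBC convention: wrap values containing ; { or } in braces,
        # doubling any closing brace inside the value.
        if any(ch in value for ch in ";{}"):
            return "{" + value.replace("}", "}}") + "}"
        return value

    return f"DSN={quote(dsn)};UID={quote(user)};PWD={quote(password)}"


def check_connection(conn_str: str) -> str:
    """Connect through the DSN and return the current HANA user (needs pyodbc)."""
    import pyodbc  # third-party: pip install pyodbc

    conn = pyodbc.connect(conn_str)
    try:
        # DUMMY is HANA's one-row system table, handy for connectivity checks.
        row = conn.cursor().execute("SELECT CURRENT_USER FROM DUMMY").fetchone()
        return row[0]
    finally:
        conn.close()
```

For example, check_connection(odbc_connection_string("HCP", user, password)) should return your HANA user name if the Cloud Connector tunnel is up. Read the password from a secure source rather than hard-coding it in scripts.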
For demonstration purposes, I have created a schema ADM_DEMO containing demo data on banking customers and their transactions. The transaction table has millions of records, which we can use to predict customer churn.
Next, I will show how to get started with Predictive Analytics (on-premise) and connect it to the HANA DB on SAP Cloud Platform. I am not a PA expert; I am just showing a very basic scenario that creates a table back in HANA. I recommend using Predictive Analytics version 3.2.
Launch the PA software on your laptop and click “Create Data Manipulation” under Data Manager.
In the Data sources, select the one created in ODBC and provide the HANA DB login credentials.
Select the table by browsing through the available schemas.
I have taken the Disposition table to begin with.
I can navigate to the Views tab and explore the data of this table.
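The same kind of exploration can be done in plain SQL from any client that uses the DSN. The sketch below only builds the statements; the table name DISPOSITION is an assumption, since the actual table names in ADM_DEMO are not listed here. Quoted HANA identifiers are case-sensitive, so the names must match the catalog exactly.

```python
def quote_ident(name: str) -> str:
    """Quote a HANA identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def preview_sql(schema: str, table: str, rows: int = 10) -> str:
    """SELECT the first `rows` rows of a table for a quick look at the data."""
    if rows < 1:
        raise ValueError("rows must be positive")
    return f"SELECT TOP {int(rows)} * FROM {quote_ident(schema)}.{quote_ident(table)}"


def row_count_sql(schema: str, table: str) -> str:
    """Count all rows, e.g. to confirm how large the transaction table is."""
    return f"SELECT COUNT(*) FROM {quote_ident(schema)}.{quote_ident(table)}"


if __name__ == "__main__":
    print(preview_sql("ADM_DEMO", "DISPOSITION"))
```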
Next, I use the Merge option to join the Accounts table.
Account_ID is the key that links the two tables. I repeat the same steps to add the Client table.
Once I have added all the tables and linked the keys, I can view the SQL which the system has generated.
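Conceptually, each Merge step adds a join on a key column. The sketch below is a hypothetical helper that builds such a query, expressed as inner joins for simplicity; the SQL that PA actually generates may use different join types, aliases and column lists. The table names and the CLIENT_ID key for the Client table are assumptions (only Account_ID is named above).

```python
def quote_ident(name: str) -> str:
    """Quote a HANA identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def merge_sql(schema: str, base: str, merges: list[tuple[str, str]]) -> str:
    """Build a SELECT that inner-joins `base` with each (table, key_column) pair.

    Each merged table is joined on a column with the same name in `base`,
    e.g. Disposition -> Accounts on ACCOUNT_ID, then -> Client on CLIENT_ID.
    """
    def qualified(table: str) -> str:
        return f"{quote_ident(schema)}.{quote_ident(table)}"

    sql = f"SELECT * FROM {qualified(base)} AS T0"
    for i, (table, key) in enumerate(merges, start=1):
        k = quote_ident(key)
        sql += f" INNER JOIN {qualified(table)} AS T{i} ON T0.{k} = T{i}.{k}"
    return sql


if __name__ == "__main__":
    print(merge_sql("ADM_DEMO", "DISPOSITION",
                    [("ACCOUNTS", "ACCOUNT_ID"), ("CLIENT", "CLIENT_ID")]))
```

Selecting * across joined tables repeats the key columns, so a real query would list only the columns it needs.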
I can also view the data returned by this dynamic SQL within PA.
I can apply filters, for example to consider only clients whose Client Type is “Owner”.
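To make the effect of a filter plus an aggregation concrete without a HANA connection, here is a sketch using Python's built-in sqlite3 module on a few made-up rows. The table and column names (joined, client_id, account_id, client_type) and the choice of aggregation (distinct accounts per client) are purely illustrative, and the stored value in your data may be spelled differently from “Owner”.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create an in-memory table standing in for the merged PA dataset."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE joined (client_id INTEGER, account_id INTEGER, client_type TEXT);
        -- Made-up rows for illustration only.
        INSERT INTO joined VALUES
            (1, 10, 'Owner'),
            (2, 10, 'Disponent'),
            (3, 11, 'Owner'),
            (3, 12, 'Owner');
        """
    )
    return conn


def accounts_per_client(conn: sqlite3.Connection, wanted_type: str = "Owner") -> list:
    """Filter on the client type, then count distinct accounts per client."""
    sql = """
        SELECT client_id, COUNT(DISTINCT account_id) AS n_accounts
        FROM joined
        WHERE client_type = ?   -- the filter, bound as a parameter
        GROUP BY client_id      -- the aggregation
        ORDER BY client_id
    """
    return conn.execute(sql, (wanted_type,)).fetchall()


if __name__ == "__main__":
    print(accounts_per_client(build_demo()))  # [(1, 1), (3, 2)]
```

The filter runs before the aggregation, so client 2 (a non-owner) disappears entirely rather than showing up with a count of zero.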
Once I have applied the relevant filters and set the required aggregations, I can save the data back into HANA. In the example below, I have named the output PA_CHURN.
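In SQL terms, persisting the result corresponds to a CREATE ... AS statement. The following sketch builds either a HANA column table or a view from a SELECT. It illustrates the idea only and is not necessarily what PA executes internally; the helper name persist_sql and the CLIENT table in the usage line are hypothetical.

```python
def quote_ident(name: str) -> str:
    """Quote a HANA identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def persist_sql(schema: str, name: str, select_sql: str, as_view: bool = False) -> str:
    """Build a statement that stores the result of `select_sql` in HANA.

    as_view=False -> CREATE COLUMN TABLE ... AS (...) WITH DATA (copies the rows)
    as_view=True  -> CREATE VIEW ... AS ... (stores only the query definition)
    """
    target = f"{quote_ident(schema)}.{quote_ident(name)}"
    if as_view:
        return f"CREATE VIEW {target} AS {select_sql}"
    return f"CREATE COLUMN TABLE {target} AS ({select_sql}) WITH DATA"


if __name__ == "__main__":
    print(persist_sql("ADM_DEMO", "PA_CHURN", 'SELECT * FROM "ADM_DEMO"."CLIENT"'))
```

A table is a snapshot that must be rebuilt when the source data changes, while a view always reflects the current data but recomputes the joins on every read.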
After execution of this step, I can now view the processed table available in HANA database.
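You can also confirm this from SQL by checking HANA's catalog views SYS.TABLES and SYS.VIEWS. A minimal sketch, assuming a DB-API cursor that uses ? placeholders (as pyodbc does); the function name find_object is hypothetical.

```python
# Look the name up among both tables and views in the HANA catalog.
EXISTS_SQL = """
    SELECT 'TABLE' FROM SYS.TABLES WHERE SCHEMA_NAME = ? AND TABLE_NAME = ?
    UNION ALL
    SELECT 'VIEW' FROM SYS.VIEWS WHERE SCHEMA_NAME = ? AND VIEW_NAME = ?
"""


def find_object(cursor, schema: str, name: str):
    """Return 'TABLE' or 'VIEW' if schema.name exists in the catalog, else None.

    `cursor` is any DB-API cursor with qmark (?) parameters, e.g. from pyodbc.
    Names are compared exactly, so pass them as stored (usually upper case).
    """
    cursor.execute(EXISTS_SQL, (schema, name, schema, name))
    row = cursor.fetchone()
    return row[0] if row else None
```

For example, find_object(cursor, "ADM_DEMO", "PA_CHURN") should return 'TABLE' after the step above if the output was saved as a table.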
In the next blog, I will show you how to connect a Python program to the database to do data science. This has also been a frequent question, especially from customers who are implementing Predictive Analytics.