Data being one of the most important assets for any enterprise, its exploration and analysis become crucial.

SAP Data Intelligence is a very powerful tool that lets you run this kind of complex processing on your data.

What is SAP Data Intelligence and how does it relate to Data Hub? - link

In this blog you will connect a HANA database as a service with SAP Data Intelligence, explore the data via the Metadata Explorer, and apply a Random Forest Classifier to it.

For this you will need a HANA database as a service running on SAP Cloud Platform (Cloud Foundry) and a running instance of SAP Data Intelligence.

If you are new to this platform, I highly recommend reading the blog by Andreas Forster.

 

So let's get started.

 

Open the SAP Cloud Platform cockpit, navigate to the global account, then to the subaccount, and finally to the space where your HANA instance is running, and open the HANA dashboard.

 


 

Click on Edit and then allow all IP addresses. This makes sure that your SAP Data Intelligence instance can access the HANA instance.

 


 

It's time to log in to your SAP Data Intelligence tenant, navigate to Connection Management and create a connection of type HANA_DB.


 

User, Password - the username and password for logging in to the HANA database

Host, Port - the direct SQL connectivity host and port, which can be found on the HANA DB dashboard from the step above
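
Optionally, before creating the connection in Data Intelligence, you can verify the direct SQL connectivity with a small hdbcli snippet from any Python environment. This is only a sanity-check sketch; the host, port, user and password below are placeholders for your own instance.

from hdbcli import dbapi

# placeholder values - use the direct SQL host/port from the HANA dashboard and your DB user
conn = dbapi.connect(
    address='<direct-sql-host>',
    port=12345,  # replace with the direct SQL port from the dashboard
    user='<db-user>',
    password='<db-password>',
    encrypt='true',
    sslValidateCertificate='false'
)
cursor = conn.cursor()
cursor.execute("SELECT CURRENT_USER FROM DUMMY")  # simple round trip to confirm connectivity
print(cursor.fetchone())
conn.close()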


 

Now we are going to create a Jupyter notebook.

For the analysis, my database table (loaded from a file) looks like this:

User ID | Gender | Age | Salary | Purchased
1       | Male   | 19  | 19000  | 0
2       | Male   | 25  | 24000  | 1
3       | Male   | 36  | 25000  | 0
4       | Female | 37  | 87000  | 1
5       | Female | 29  | 89000  | 0
6       | Female | 27  | 90000  | 1
 

For the analysis I will use only the Age and Salary columns to predict the Purchased column.

Now open a Jupyter notebook from the ML Scenario Manager and install these libraries one by one:

 
pip install scikit-learn
pip install hdbcli
pip install matplotlib
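
To confirm that the packages resolve inside the notebook kernel, a quick check like this can help (a minimal sketch using pkg_resources, which ships with pip/setuptools):

import pkg_resources

# print the installed version of each library used below
for pkg in ("scikit-learn", "hdbcli", "matplotlib"):
    print(pkg, pkg_resources.get_distribution(pkg).version)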

 

Code for the Jupyter notebook (note: if any library is missing, install it using the step above).

2 things to configure:

  1. The HANA connection ID - the id_ argument passed to get_datahub_connection

  2. The table name (Schema.TableName) - the path variable


# Read the connection details maintained in Connection Management
import notebook_hana_connector.notebook_hana_connector
di_connection = notebook_hana_connector.notebook_hana_connector.get_datahub_connection(id_="hana")  # enter the id of your connection

# Open a connection to the HANA database
from hdbcli import dbapi
conn = dbapi.connect(
    address=di_connection["contentData"]['host'],
    port=di_connection["contentData"]['port'],
    user=di_connection["contentData"]['user'],
    password=di_connection["contentData"]["password"],
    encrypt='true',
    sslValidateCertificate='false'
)

# Read the table and split the rows into features (Age, Salary) and label (Purchased)
path = "ML_TEST.PURCHASE"  # enter your table name (Schema.TableName)
sql = 'SELECT * FROM ' + path
cursor = conn.cursor()
cursor.execute(sql)

X = []
y = []
for row in cursor:
    # I am using a dataset with the columns User ID, Gender, Age, Salary, Purchased
    X.append([row[2], row[3]])  # Age, Salary
    y.append(row[4])            # Purchased

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Fitting Random Forest Classification to the Training set
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Visualising the training set on the decision surface of the classifier
arrx = np.array(X_train)
y_set = np.array(y_train)

from matplotlib.colors import ListedColormap
X1, X2 = np.meshgrid(np.arange(start = arrx[:, 0].min() - 1, stop = arrx[:, 0].max() + 1, step = 0.1),
                     np.arange(start = arrx[:, 1].min() - 1, stop = arrx[:, 1].max() + 1, step = 1000))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(arrx[y_set == j, 0], arrx[y_set == j, 1],
                c = ListedColormap(('red', 'green'))(i), label = j)
plt.xlabel('Age')
plt.ylabel('Salary')
plt.legend()
plt.show()

 

You should be able to view the results in a graph.
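
If you also want a single headline number next to the confusion matrix, accuracy can be computed in a follow-on cell (this assumes y_test and y_pred from the notebook code above):

from sklearn.metrics import accuracy_score

# share of correct predictions on the test set
print("Accuracy:", accuracy_score(y_test, y_pred))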


 

(Please refer to the blog on how to create a pipeline and deploy it as well.)

Now let us create a pipeline from the ML Scenario Manager for creating the model.

 

First, let us create a pipeline from the Python Producer template.

(Some changes to the components are needed to get the data from HANA.)


 

  1. Constant Generator - to feed in the SQL query; please see the configuration below. In this case the query is
    SELECT * FROM ML_TEST.PURCHASE


  2. HANA Client (to connect with HANA) - things to note: Connection and Table name; if you scroll down, set ColumnHeader to None

  3. JS Operator - to extract only the body of the message, i.e. the rows:
    $.setPortCallback("input", onInput);

    function isByteArray(data) {
        switch (Object.prototype.toString.call(data)) {
            case "[object Int8Array]":
            case "[object Uint8Array]":
                return true;
            case "[object Array]":
            case "[object GoArray]":
                return data.length > 0 && typeof data[0] === 'number';
        }
        return false;
    }

    function onInput(ctx, s) {
        var msg = {};

        var inbody = s.Body;
        var inattributes = s.Attributes;

        // convert the body into a string if it is bytes
        if (isByteArray(inbody)) {
            inbody = String.fromCharCode.apply(null, inbody);
        }

        msg.Attributes = {};
        msg.Body = inbody;

        // forward only the body (the rows) to the output port
        $.output(msg.Body);
    }


  4. To String Converter - use the inInterface port to send the data from the JS Operator to the Python file


 

Python file for training the model and saving it:
# Example Python script to perform training on input data & generate Metrics & Model Blob
def on_input(data):

    import pandas as pd
    import io
    from io import BytesIO
    import os
    import numpy as np
    import json

    # the incoming message body is a JSON array of rows
    dataset = json.loads(data)

    # split the rows into features (Age, Salary) and label (Purchased)
    X = []
    y = []
    for j in dataset:
        x_temp = []
        x_temp.append(j["AGE"])
        x_temp.append(j["SALARY"])
        y.append(j["PURCHASED"])
        X.append(x_temp)

    # Splitting the dataset into the Training set and Test set
    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

    # Fitting Random Forest Classification to the Training set
    from sklearn.ensemble import RandomForestClassifier
    classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
    classifier.fit(X_train, y_train)

    # Predicting the Test set results
    y_pred = classifier.predict(X_test)

    # Making the Confusion Matrix
    from sklearn.metrics import confusion_matrix
    cm = confusion_matrix(y_test, y_pred)

    # to send metrics to the Submit Metrics operator, create a Python dictionary of key-value pairs
    metrics_dict = {"confusion matrix": str(cm)}

    # send the metrics to the output port - Submit Metrics operator will use this to persist the metrics
    api.send("metrics", api.Message(metrics_dict))

    # create & send the model blob to the output port - Artifact Producer operator will use this to persist the model and create an artifact ID
    import pickle
    model_blob = pickle.dumps(classifier)
    api.send("modelBlob", model_blob)

api.set_port_callback("input", on_input)
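
For reference, here is a hypothetical example of the payload the JS Operator forwards to the Python operator: a JSON array of row objects keyed by column name, which is the shape the script above assumes when it reads j["AGE"], j["SALARY"] and j["PURCHASED"] (the column names are taken from the table shown earlier; use a wiretap to confirm the exact format for your table):

import json

# two sample rows in the format the training script expects (assumed column names)
sample_body = '[{"USER_ID": 1, "GENDER": "Male", "AGE": 19, "SALARY": 19000, "PURCHASED": 0}, {"USER_ID": 4, "GENDER": "Female", "AGE": 37, "SALARY": 87000, "PURCHASED": 1}]'

rows = json.loads(sample_body)
print(rows[0]["AGE"], rows[0]["SALARY"], rows[0]["PURCHASED"])  # 19 19000 0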


 

Wiretaps have been used to check the output; you may skip those blocks.

For running the pipeline, you may need a Dockerfile - blog

Content of the Dockerfile:
FROM python:3.6.4-slim-stretch

RUN pip install tornado==5.0.2
RUN python3.6 -m pip install numpy==1.16.4
RUN python3.6 -m pip install pandas==0.24.0
RUN python3.6 -m pip install scikit-learn

RUN groupadd -g 1972 vflow && useradd -g 1972 -u 1972 -m vflow
USER 1972:1972
WORKDIR /home/vflow
ENV HOME=/home/vflow

 

Now create tags for the Dockerfile (here a custom tag blogFile is created) and tag your Python file with the same tag. Then build the Dockerfile.



 

 

Now we can run the pipeline and store the artifact (please provide a name).


 

Now we have to create another pipeline to expose an API so that the model can be consumed. For this case, use the Python Consumer template.


As done in the step above, tag the Python file and update the script:
import json
import io
import numpy as np
import pickle

# Global vars to keep track of model status
model = None
model_ready = False

# Validate input data is JSON
def is_json(data):
    try:
        json_object = json.loads(data)
    except ValueError as e:
        return False
    return True

# When Model Blob reaches the input port
def on_model(model_blob):
    global model
    global model_ready

    model = pickle.loads(model_blob)
    model_ready = True

# Client POST request received
def on_input(msg):
    error_message = ""
    success = False
    try:
        attr = msg.attributes
        request_id = attr['message.request.id']

        api.logger.info("POST request received from Client - checking if model is ready")
        if model_ready:
            api.logger.info("Model Ready")
            api.logger.info("Received data from client - validating json input")

            user_data = msg.body.decode('utf-8')
            # Received message from client, verify json data is valid
            if is_json(user_data):
                api.logger.info("Received valid json data from client - ready to use")

                # obtain your results
                feed = json.loads(user_data)
                data_to_predict = np.array(feed['data'])
                api.logger.info(str(data_to_predict))

                # apply the model to the provided data
                prediction = model.predict(data_to_predict)
                prediction = (prediction > 0)

                success = True
            else:
                api.logger.info("Invalid JSON received from client - cannot apply model.")
                error_message = "Invalid JSON provided in request: " + user_data
                success = False
        else:
            api.logger.info("Model has not yet reached the input port - try again.")
            error_message = "Model has not yet reached the input port - try again."
            success = False
    except Exception as e:
        api.logger.error(e)
        error_message = "An error occurred: " + str(e)

    if success:
        # apply carried out successfully, send a response to the user
        result = json.dumps({'Results': str(prediction)})
    else:
        result = json.dumps({'Error': error_message})

    request_id = msg.attributes['message.request.id']
    response = api.Message(attributes={'message.request.id': request_id}, body=result)
    api.send('output', response)


api.set_port_callback("model", on_model)
api.set_port_callback("input", on_input)

 

Now you can deploy the pipeline. Once it is done, you will get a URL which you can use to test your model; make sure to append /v1/uploadjson/ to the URL.

Deployment of the pipeline can take a while.

By POSTing data to this URL you can test the model.

Headers of the call; Authorization is Basic with your username:

 
[{"key":"X-Requested-With","value":"XMLHttpRequest","description":""},{"key":"Authorization","value":"Add your authentication here":""},{"key":"Content-Type","value":"application/json","description":""}]

 

Body of the request, containing Age and Salary:
{
  "data": [[47, 25000]]
}
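
For example, a hypothetical call to the deployed endpoint from Python (the URL, tenant and credentials are placeholders; SAP Data Intelligence typically expects basic authentication in the form tenant\username):

import requests

url = "https://<your-deployment-url>/v1/uploadjson/"  # deployment URL with /v1/uploadjson/ appended
headers = {
    "X-Requested-With": "XMLHttpRequest",
    "Content-Type": "application/json",
}
payload = {"data": [[47, 25000]]}  # Age, Salary

# basic authentication - replace the placeholders with your tenant, user and password
response = requests.post(url, json=payload, headers=headers,
                         auth=("<tenant>\\<username>", "<password>"))
print(response.status_code, response.text)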



 

!!!!! Congratulations !!!!!

You have successfully created and deployed a model using HANA DB as a data source.

 

Some blogs related to SAP Data Intelligence:

https://blogs.sap.com/2020/03/20/sap-data-intelligence-development-news-for-3.0/

https://blogs.sap.com/2020/03/20/sap-data-intelligence-next-evolution-of-sap-data-hub/

https://blogs.sap.com/2019/07/17/sap-data-hub-and-sap-data-intelligence-streamlining-data-driven-int...

 