SAP Data Intelligence to Train, Export, Serve & In...

sureshkumar_raju · ‎04-07-2020

In this blog post, you will learn how to train, validate, export, serve, and run inference on a simple machine learning model using SAP Data Intelligence. The primary objective is to experience the various features of the SAP Data Intelligence product, not to build the best possible model, so we use a very simple machine learning use case.

The use case is to predict the gender of a person from a few basic features: age, height (cm), and weight (kg). For simplicity, we use a small dataset of just over 500 rows.

1. Manage DataSet

In this section you will learn how to upload and manage a dataset that will be used in the training phase later.

Dataset for this tutorial can be downloaded here

Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_mb9ros7k

2. Train Model

In this section, you will learn how to create a Machine Learning Scenario and spin up a Jupyter Notebook instance to train any machine learning model.

Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_yo8l3imc

Install Required Python Libraries

!pip install scikit-learn==0.22.2

!pip install seaborn==0.10.0

Imports

import os

import re

import csv

import random

from sklearn.svm import SVC

from sklearn.tree import DecisionTreeClassifier

from sklearn.ensemble import RandomForestClassifier

from sklearn.ensemble import AdaBoostClassifier

from sklearn.metrics import accuracy_score

from sklearn.metrics import confusion_matrix

import joblib

from sklearn.model_selection import train_test_split

import pandas as pd

import sapdi

import shutil

import matplotlib.pyplot as plt

import seaborn as sns

import requests

import json

from sapdi import tracking



%matplotlib inline

Load Data from DataLake via Data Manager

ws = sapdi.get_workspace(name='suresh-ws')

dc = ws.get_datacollection(name='gender-collection')

with dc.open('gender.csv').get_reader() as reader:

    df = pd.read_csv(reader)
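Before going further, it can help to confirm that the file actually contains the columns the rest of the notebook relies on. The following is a minimal standard-library sketch, not part of the SAP Data Intelligence SDK: `missing_columns` is a hypothetical helper, the three required column names are taken from the code in this post, and the sample row (including the `age` column name) is made up for illustration.

```python
import csv
import io

# Column names used later in this post; the name of the age column is not
# shown in the original code, so it is not checked here.
REQUIRED_COLUMNS = {"male(1-male 0-female)", "weight(kg)", "height(cm)"}


def missing_columns(csv_text):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted(REQUIRED_COLUMNS - set(reader.fieldnames or []))


# Made-up sample content, purely for illustration.
sample = "age,height(cm),weight(kg),male(1-male 0-female)\n37,167,75,1\n"
print(missing_columns(sample))
```

In the notebook, the same check could be applied to `df.columns` right after `pd.read_csv`.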

Split Male and Female Data for Visualizing

is_male = df['male(1-male 0-female)']==1

is_female = df['male(1-male 0-female)']==0



male = df[is_male]

female = df[is_female]

female = female.head(-13)  # drop the last 13 female rows so both groups have the same size for plotting



print(male.sample(3))

print(male.shape)

print(female.sample(3))

print(female.shape)

male_weight = male[['weight(kg)']]

male_height = male[['height(cm)']]

print("{} {}".format(male_weight.shape, male_height.shape))



female_weight = female[['weight(kg)']]

female_height = female[['height(cm)']]

print("{} {}".format(female_weight.shape, female_height.shape))

Visualize Male and Female Weights(Kg)

plt.figure(figsize=(18, 6))

x_range = range(len(male_weight))  # 247 rows per group after trimming the female rows above

plt.scatter(x_range, male_weight, color='r', alpha=0.5, s=125)

plt.scatter(x_range, female_weight, color='g', alpha=0.5, s=125)

plt.xlabel('Range')

plt.ylabel('Weight')

plt.show()

Visualize Male & Female Height(Cm)

plt.figure(figsize=(18, 6))

x_range = range(len(male_height))  # 247 rows per group after trimming the female rows above

plt.scatter(x_range, male_height, color='r', alpha=0.5, s=125)

plt.scatter(x_range, female_height, color='g', alpha=0.5, s=125)

plt.xlabel('Range')

plt.ylabel('Height')

plt.show()

Extract Features and Target

y = df.pop('male(1-male 0-female)')

print(y.sample(5))

X = df

print(X.sample(5))

Visualize Feature Correlation

sns.heatmap(X.corr(), annot=True)

plt.show()
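Each cell of the heatmap is a pairwise Pearson correlation coefficient, which is what `DataFrame.corr()` computes by default. As a rough illustration of that number, here is a small standard-library sketch; `pearson` is a hypothetical helper and the height and weight values are made up (and deliberately exactly linear), not taken from the gender dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only; weight grows linearly with height here.
heights = [150.0, 160.0, 170.0, 180.0]
weights = [50.0, 58.0, 66.0, 74.0]
print(round(pearson(heights, weights), 6))
```

Values close to 1 or -1 indicate strongly related features; values near 0 indicate little linear relationship.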

Split Train & Test Data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
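To make the split concrete, here is a minimal standard-library sketch of what an 80/20 shuffled split with a fixed seed does. It is an illustration only, not scikit-learn's implementation: `simple_train_test_split` is a hypothetical helper, and the rows it selects will differ from those chosen by `train_test_split`.

```python
import random


def simple_train_test_split(rows, test_size=0.20, seed=42):
    """Shuffle a copy of rows with a fixed seed, then cut off the test share."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # same seed gives the same order every run
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(500))  # stand-in for row indices of a 500-row dataset
train, test = simple_train_test_split(rows)
print(len(train), len(test))
```

Fixing the seed (`random_state=42` in the notebook) keeps the split reproducible, so accuracy numbers from different algorithms are compared on the same test rows.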

Training - SVM

model = SVC()

model = model.fit(X_train, y_train)

train_accuracy = model.score(X_train, y_train) * 100

test_accuracy = model.score(X_test, y_test) * 100

print('Accuracy of Training Set: {:.2f}'.format(train_accuracy))

print('Accuracy of Test Set: {:.2f}'.format(test_accuracy))



y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)



metrics = {

    "training_accuracy": train_accuracy,

    "test_accuracy": test_accuracy

}

run = tracking.start_run(run_collection_name="gender")

tracking.log_metrics(metrics)

tracking.set_tags({"algo": "SVC"})

tracking.end_run()
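The confusion matrix `cm` is computed in each training block but never displayed. As a reading aid, the sketch below shows how a 2x2 confusion matrix is laid out using scikit-learn's convention (rows are true labels, columns are predicted labels). `confusion_matrix_2x2` is a hypothetical helper and the labels are made up; they are not results from the gender dataset.

```python
def confusion_matrix_2x2(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for labels 0 (female) and 1 (male)."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1  # row = true label, column = prediction
    return matrix


# Made-up labels purely for illustration.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]
cm = confusion_matrix_2x2(y_true, y_pred)
(tn, fp), (fn, tp) = cm
accuracy = (tn + tp) / len(y_true)
print(cm, accuracy)
```

In the notebook, printing `cm` after `confusion_matrix(y_test, y_pred)` shows which class the model confuses more often, which a single accuracy number hides.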

Training - DecisionTreeClassifier

model = DecisionTreeClassifier()

model = model.fit(X_train, y_train)

train_accuracy = model.score(X_train, y_train) * 100

test_accuracy = model.score(X_test, y_test) * 100

print('Accuracy of Training Set: {:.2f}'.format(train_accuracy))

print('Accuracy of Test Set: {:.2f}'.format(test_accuracy))



y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)



metrics = {

    "training_accuracy": train_accuracy,

    "test_accuracy": test_accuracy

}

run = tracking.start_run(run_collection_name="gender")

tracking.log_metrics(metrics)

tracking.set_tags({"algo": "DecisionTreeClassifier"})

tracking.end_run()

Training - RandomForestClassifier

model = RandomForestClassifier()

model = model.fit(X_train, y_train)

train_accuracy = model.score(X_train, y_train) * 100

test_accuracy = model.score(X_test, y_test) * 100

print('Accuracy of Training Set: {:.2f}'.format(train_accuracy))

print('Accuracy of Test Set: {:.2f}'.format(test_accuracy))



y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)



metrics = {

    "training_accuracy": train_accuracy,

    "test_accuracy": test_accuracy

}

run = tracking.start_run(run_collection_name="gender")

tracking.log_metrics(metrics)

tracking.set_tags({"algo": "RandomForestClassifier"})

tracking.end_run()

Training - AdaBoostClassifier

model = AdaBoostClassifier()

model = model.fit(X_train, y_train)

train_accuracy = model.score(X_train, y_train) * 100

test_accuracy = model.score(X_test, y_test) * 100

print('Accuracy of Training Set: {:.2f}'.format(train_accuracy))

print('Accuracy of Test Set: {:.2f}'.format(test_accuracy))



y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)



metrics = {

    "training_accuracy": train_accuracy,

    "test_accuracy": test_accuracy

}

run = tracking.start_run(run_collection_name="gender")

tracking.log_metrics(metrics)

tracking.set_tags({"algo": "AdaBoostClassifier"})

tracking.end_run()
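The four training blocks above differ only in the estimator, so the shared part could be factored into one function. The sketch below is one possible shape under stated assumptions: `train_and_score` and `MajorityClassifier` are hypothetical names, the stand-in classifier replaces scikit-learn so the example runs anywhere, and the toy data is made up.

```python
def train_and_score(model, X_train, y_train, X_test, y_test):
    """Fit any estimator exposing fit/score and return accuracy percentages."""
    model.fit(X_train, y_train)
    return {
        "training_accuracy": model.score(X_train, y_train) * 100,
        "test_accuracy": model.score(X_test, y_test) * 100,
    }


class MajorityClassifier:
    """Stand-in estimator: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=list(y).count)
        return self

    def score(self, X, y):
        return sum(1 for label in y if label == self.label_) / len(y)


# Made-up toy data purely for illustration.
X_train, y_train = [[30], [41], [25], [38]], [1, 1, 0, 1]
X_test, y_test = [[29], [33]], [1, 0]
results = {
    "MajorityClassifier": train_and_score(
        MajorityClassifier(), X_train, y_train, X_test, y_test
    )
}
print(results)
```

In the notebook, the same function could be called in a loop over `SVC()`, `DecisionTreeClassifier()`, `RandomForestClassifier()` and `AdaBoostClassifier()`, with the `tracking.start_run`, `tracking.log_metrics`, `tracking.set_tags` and `tracking.end_run` calls from above placed inside the loop.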

Explore Training & Test Accuracy Captured via Tracking APIs

sc = sapdi.get_current_scenario()

run_data = tracking.get_runs(scenario=sc, notebook=sapdi.scenario.Notebook.get(notebook_id="gender.ipynb"))

lst = list()

for r in run_data:

    lst_data = list()

    lst_data.append(r.tags.get("algo"))

    for m in r.metrics:

        lst_data.append(m.get("value"))

    lst.append(lst_data)



mdf = pd.DataFrame(lst, columns =['algo', 'train_accuracy', 'test_accuracy']) 

mdf
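The loop above relies on the metrics arriving in the same order as the DataFrame columns. A more defensive variant looks each metric up by name. The sketch below uses plain dictionaries as stand-ins for run objects; that each metric carries a `"name"` key next to `"value"` is an assumption that should be checked against the actual SDK objects, and `runs_to_rows` and the accuracy values are hypothetical.

```python
def runs_to_rows(runs):
    """Build one row per run, looking metrics up by name instead of position."""
    rows = []
    for run in runs:
        # Assumption: every metric entry has "name" and "value" keys.
        by_name = {m.get("name"): m.get("value") for m in run["metrics"]}
        rows.append({
            "algo": run["tags"].get("algo"),
            "train_accuracy": by_name.get("training_accuracy"),
            "test_accuracy": by_name.get("test_accuracy"),
        })
    return rows


# Made-up run data; note the metrics are deliberately listed in reverse order.
runs = [
    {
        "tags": {"algo": "SVC"},
        "metrics": [
            {"name": "test_accuracy", "value": 90.0},
            {"name": "training_accuracy", "value": 95.0},
        ],
    }
]
rows = runs_to_rows(runs)
print(rows)
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame(rows)`.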

Use Metrics Explorer to Visualize Training & Test Accuracy Captured via Tracking APIs

Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_f16ngmwg

3. Save & Export Model

In this section, you will learn how to save the trained model as a pickle file and export it as a ZIP archive, which serves as the artifact for deploying and serving the model in the next phase.

Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_9hhxp4uo

Prepare Required Directory for Save & Export

curr_dir = os.getcwd()

exporter_content = os.path.join(curr_dir, "exporter_content")

exported_content = os.path.join(curr_dir, "exported_content")

zip_file_path = os.path.join(curr_dir, "exported_content/gender_1.zip")

unzip_folder_path = os.path.join(curr_dir, "exported_unzip_content")

print(exporter_content)

print(exported_content)

print(zip_file_path)

print(unzip_folder_path)

Save Model

if os.path.exists(exporter_content) and os.path.isdir(exporter_content):

    shutil.rmtree(exporter_content)

os.makedirs(exporter_content)

joblib.dump(model, 'exporter_content/gender.pkl')  # model is the last classifier trained above (AdaBoostClassifier)
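The "remove the directory if it exists, then recreate it" pattern appears several times in this section. One way to keep it in a single place is a small helper; the sketch below is a suggestion, `reset_dir` is a hypothetical name, and the demonstration runs in a temporary directory so it does not touch the notebook's working folder.

```python
import shutil
import tempfile
from pathlib import Path


def reset_dir(path):
    """Delete path if it is an existing directory, then create it empty."""
    path = Path(path)
    if path.is_dir():
        shutil.rmtree(path)
    path.mkdir(parents=True)
    return path


# Demonstrate in a throwaway temp directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "exporter_content"
    reset_dir(target)
    (target / "stale.txt").write_text("old")  # simulate leftovers from an earlier run
    reset_dir(target)
    remaining = list(target.iterdir())
print(remaining)
```

Starting from an empty directory on every run avoids shipping stale files from an earlier export.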

Create Required Dependencies for Serving the Model

%%writefile exporter_content/pip_dependencies.txt

scikit-learn==0.22.2

joblib==0.14.1

Create Predictor Class for Serving the Model

%%writefile predictor.py

from sapdi.serving.pymodel.predictor import AbstractPyModelPredictor

import joblib

import json 



class GenderPredictor(AbstractPyModelPredictor):

    def initialize(self, asset_files_path):

        self.classifier = joblib.load(asset_files_path+ '/gender.pkl')

        

    def predict(self, input_dict):

        age = input_dict.get("age")

        weight = input_dict.get("weight")

        height = input_dict.get("height")

        real_value = [[float(age), float(weight), int(height)]]  # feature order must match the columns of X used in training

        predicted = self.classifier.predict(real_value)

        res = int(predicted[0])

        return {'result': {'gender': res}}
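The predictor above assumes that every request contains numeric `age`, `weight` and `height` fields. A deployed endpoint will eventually receive incomplete payloads, so it may be worth validating the input first. The sketch below is one possible approach, not part of the SAP SDK: `parse_features` and `FEATURE_ORDER` are hypothetical names, and unlike the original it converts all three values with `float`.

```python
FEATURE_ORDER = ("age", "weight", "height")  # same order as in predict() above


def parse_features(input_dict):
    """Return a single-row feature list or raise ValueError with a clear message."""
    missing = [key for key in FEATURE_ORDER if input_dict.get(key) is None]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    try:
        return [[float(input_dict[key]) for key in FEATURE_ORDER]]
    except (TypeError, ValueError) as exc:
        raise ValueError("non-numeric field in payload") from exc


print(parse_features({"age": 37, "weight": 75, "height": 167}))
```

Inside `predict`, the returned list could be passed to `self.classifier.predict` in place of `real_value`.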

Test the Predictor Class

from predictor import GenderPredictor



predictor = GenderPredictor()

predictor.initialize('exporter_content/')

payload = {"age": 37, "weight": 75, "height": 167}

predictor.predict(payload)

Export Model for Serving using SAP DI SDK

if os.path.exists(exported_content) and os.path.isdir(exported_content):

    shutil.rmtree(exported_content)

os.makedirs(exported_content)

    

from sapdi.serving.pymodel.exporter import PyExporter

from predictor import GenderPredictor

exporter = PyExporter()

exporter.save_model(

    name = "gender",

    model_dir_path = exported_content,

    func=GenderPredictor(),

    source_path_list=[os.path.join(curr_dir,"predictor.py")],

    asset_path_list=[os.path.join(curr_dir, "exporter_content/gender.pkl")],

    pip_dependency_file_path=os.path.join(curr_dir, "exporter_content/pip_dependencies.txt"))



if os.path.exists(unzip_folder_path) and os.path.isdir(unzip_folder_path):

    shutil.rmtree(unzip_folder_path)

os.makedirs(unzip_folder_path)

    

shutil.unpack_archive(zip_file_path, extract_dir=unzip_folder_path)
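Instead of unpacking the archive, its contents can also be listed directly, which is a quick way to confirm that the predictor source and the pickled model were packaged. The sketch below uses only the standard library; `list_archive` is a hypothetical helper, and the demo archive with its two entries is made up, since the actual layout of the exported ZIP is not described in this post.

```python
import tempfile
import zipfile
from pathlib import Path


def list_archive(zip_path):
    """Return the sorted entry names of a ZIP archive without extracting it."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())


# Build a tiny demo archive; the entries are placeholders, not the real export.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("predictor.py", "# placeholder")
        archive.writestr("gender.pkl", b"placeholder")
    names = list_archive(demo_zip)
print(names)
```

In the notebook, calling `list_archive(zip_file_path)` would show what the export actually produced.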

Create Artifact

from sapdi.artifact.artifact import Artifact, ArtifactKind, ArtifactFileType

artifact = sapdi.create_artifact(

    file_type=ArtifactFileType.FILE,

    artifact_kind=ArtifactKind.MODEL,

    description="Gender Model",

    artifact_name="gender",

    file_name=os.path.basename(zip_file_path),

    upload_content=zip_file_path

)

print('Model artifact id {}, file {} registered successfully at {} \n'.format(artifact.artifact_id, zip_file_path,artifact.get_uri()))

4. Serve Model

In this section, you will learn how to deploy the exported model artifact to the SAP Data Intelligence platform, which exposes a REST endpoint for real-time inference requests.

Here we use the Model Serving Operator in the graph, which is designed to serve complex machine learning models in a scalable way.

Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_s7g3cr1a

In the Model Serving Operator configuration, set the Model Runtime field to "mlserving-1.1" and leave the remaining fields at their defaults.

5. Inference Model

In this section, you will learn how to send inference requests to the deployed model through the exposed REST endpoint.

Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_0bf9kwki

Inference Model

Replace "<REST API URL>" with the REST endpoint URL that you obtained in the previous section after a successful deployment.

Replace "<XXXXX>" with valid Base64-encoded user credentials for calling the deployment. You can use the snippet below to encode the credentials; it prints the Base64-encoded result.

import base64

credential = "dummytenant\\dummyuser:dummypassword"

print(str(base64.b64encode(credential.encode("utf-8")), "utf-8"))
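The same encoding can be wrapped in a small function that returns the complete `Authorization` header value, so the token does not have to be pasted by hand. This is a minimal sketch: `basic_auth_header` is a hypothetical helper, and the `tenant\user:password` credential format is taken from the snippet above.

```python
import base64


def basic_auth_header(tenant, user, password):
    """Return the value for an HTTP Basic Authorization header."""
    credential = "{}\\{}:{}".format(tenant, user, password)
    token = base64.b64encode(credential.encode("utf-8")).decode("utf-8")
    return "Basic " + token


# Dummy credentials, as in the snippet above.
header = basic_auth_header("dummytenant", "dummyuser", "dummypassword")
print(header)
```

The result can be assigned to the `'Authorization'` entry of the request headers. Note that Base64 is an encoding, not encryption, so the value should be treated like a password.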

payload = {"age": 37, "weight": 75, "height": 167}



url = "<REST API URL>"



headers = {

  'Content-Type': 'application/json',

  'X-Requested-With': 'Fetch',

  'Authorization': 'Basic <XXXXX>'

}



response = requests.request("POST", url, headers=headers, data = json.dumps(payload))

interpret = response.json().get("result").get("gender")

print("MALE" if interpret==1 else "FEMALE")
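The last two lines above assume that the response always has the shape returned by the predictor. If the deployment is not ready or the credentials are wrong, the body will look different, so a slightly more defensive interpretation may help. The sketch below is one way to do it; `interpret_gender` is a hypothetical helper and the example bodies are made up.

```python
def interpret_gender(body):
    """Map a parsed response body to 'MALE' or 'FEMALE', or raise ValueError."""
    result = body.get("result") if isinstance(body, dict) else None
    if not isinstance(result, dict) or "gender" not in result:
        raise ValueError("unexpected response body: {!r}".format(body))
    return "MALE" if result["gender"] == 1 else "FEMALE"


# Body shaped like the predictor's return value.
print(interpret_gender({"result": {"gender": 1}}))
```

In the notebook, calling `response.raise_for_status()` before `interpret_gender(response.json())` surfaces HTTP errors such as a rejected login before the body is parsed.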

Conclusion:

We have built an end-to-end machine learning pipeline covering data upload, data visualization, model training, model metrics, model export, model serving, and model inference.