SAP Data Intelligence to Train, Export, Serve & Inference Machine Learning Models

In this blog post, you will learn how to train, validate, export, serve and run inference on a simple machine learning model using SAP Data Intelligence. The primary objective is to experience the various features of SAP Data Intelligence, not to build the best possible model, so we work with a very simple machine learning use case.

The use case is to predict the gender of a person from a few basic features: age, height (cm) and weight (kg). For simplicity, we use a small dataset of a little over 500 rows.

1. Manage DataSet

In this section you will learn how to upload and manage a dataset that will be used in the training phase later.

Dataset for this tutorial can be downloaded here

Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_mb9ros7k

2. Train Model

In this section, you will learn how to create a Machine Learning Scenario and spin up a Jupyter Notebook instance to train a machine learning model.

Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_yo8l3imc

Install Required Python Libraries

!pip install scikit-learn==0.22.2
!pip install seaborn==0.10.0

Imports

import os
import re
import csv
import random
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
import joblib
from sklearn.model_selection import train_test_split
import pandas as pd
import sapdi
import shutil
import matplotlib.pyplot as plt
import seaborn as sns
import requests
import json
from sapdi import tracking

%matplotlib inline

Load Data from DataLake via Data Manager

ws = sapdi.get_workspace(name='suresh-ws')
dc = ws.get_datacollection(name='gender-collection')
with dc.open('gender.csv').get_reader() as reader:
    df = pd.read_csv(reader)

Split Male and Female Data for Visualizing

is_male = df['male(1-male 0-female)']==1
is_female = df['male(1-male 0-female)']==0

male = df[is_male]
female = df[is_female]
# Drop the last 13 female rows; the plots below assume both groups have the same number of rows
female = female.head(-13)

print(male.sample(3))
print(male.shape)
print(female.sample(3))
print(female.shape)
male_weight = male[['weight(kg)']]
male_height = male[['height(cm)']]
print("{} {}".format(male_weight.shape, male_height.shape))

female_weight = female[['weight(kg)']]
female_height = female[['height(cm)']]
print("{} {}".format(female_weight.shape, female_height.shape))

Visualize Male and Female Weights(Kg)

plt.figure(figsize=(18, 6))
x_range = range(len(male_weight))  # one x position per row instead of a hard-coded 247
plt.scatter(x_range, male_weight, color='r', alpha=0.5, s=125)
plt.scatter(x_range, female_weight, color='g', alpha=0.5, s=125)
plt.xlabel('Range')
plt.ylabel('Weight')
plt.show()

Visualize Male & Female Height(Cm)

plt.figure(figsize=(18, 6))
x_range = range(len(male_height))  # one x position per row instead of a hard-coded 247
plt.scatter(x_range, male_height, color='r', alpha=0.5, s=125)
plt.scatter(x_range, female_height, color='g', alpha=0.5, s=125)
plt.xlabel('Range')
plt.ylabel('Height')
plt.show()

Extract Features and Target

y = df.pop('male(1-male 0-female)')
print(y.sample(5))
X = df
print(X.sample(5))

Visualize Feature Correlation

sns.heatmap(X.corr(), annot=True)
plt.show()

Split Train & Test Data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

Training – SVM

model = SVC()
model = model.fit(X_train, y_train)
train_accuracy = model.score(X_train, y_train) * 100
test_accuracy = model.score(X_test, y_test) * 100
print('Accuracy of Training Set: {:.2f}'.format(train_accuracy))
print('Accuracy of Test Set: {:.2f}'.format(test_accuracy))

y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

metrics = {
    "training_accuracy": train_accuracy,
    "test_accuracy": test_accuracy
}
run = tracking.start_run(run_collection_name="gender")
tracking.log_metrics(metrics)
tracking.set_tags({"algo": "SVC"})
tracking.end_run()
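The `cm` value computed in each training cell is never printed in the notebook. As a rough illustration of what it holds, the plain-Python sketch below builds the same kind of 2x2 table (rows are true labels, columns are predicted labels, which is the layout scikit-learn's `confusion_matrix` uses) and derives the accuracy from it. The helper names and the sample labels are made up for illustration and are not part of the original notebook.

```python
def confusion_counts(y_true, y_pred, labels=(0, 1)):
    """Return a nested list where entry [i][j] counts the samples whose
    true label is labels[i] and whose predicted label is labels[j]."""
    index = {label: pos for pos, label in enumerate(labels)}
    table = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        table[index[actual]][index[predicted]] += 1
    return table


def accuracy_from_counts(table):
    """Accuracy is the diagonal (correct predictions) divided by all samples."""
    correct = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    return correct / total


# Hypothetical labels: 1 = male, 0 = female (same encoding as the dataset).
example_true = [1, 0, 1, 1, 0, 0]
example_pred = [1, 0, 0, 1, 0, 1]
example_cm = confusion_counts(example_true, example_pred)
example_accuracy = accuracy_from_counts(example_cm)
print(example_cm, round(example_accuracy * 100, 2))
```

In the notebook itself, simply adding `print(cm)` after the `confusion_matrix` call would show the corresponding table for the trained model.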

Training – DecisionTreeClassifier

model = DecisionTreeClassifier()
model = model.fit(X_train, y_train)
train_accuracy = model.score(X_train, y_train) * 100
test_accuracy = model.score(X_test, y_test) * 100
print('Accuracy of Training Set: {:.2f}'.format(train_accuracy))
print('Accuracy of Test Set: {:.2f}'.format(test_accuracy))

y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

metrics = {
    "training_accuracy": train_accuracy,
    "test_accuracy": test_accuracy
}
run = tracking.start_run(run_collection_name="gender")
tracking.log_metrics(metrics)
tracking.set_tags({"algo": "DecisionTreeClassifier"})
tracking.end_run()

Training – RandomForestClassifier

model = RandomForestClassifier()
model = model.fit(X_train, y_train)
train_accuracy = model.score(X_train, y_train) * 100
test_accuracy = model.score(X_test, y_test) * 100
print('Accuracy of Training Set: {:.2f}'.format(train_accuracy))
print('Accuracy of Test Set: {:.2f}'.format(test_accuracy))

y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

metrics = {
    "training_accuracy": train_accuracy,
    "test_accuracy": test_accuracy
}
run = tracking.start_run(run_collection_name="gender")
tracking.log_metrics(metrics)
tracking.set_tags({"algo": "RandomForestClassifier"})
tracking.end_run()

Training – AdaBoostClassifier

model = AdaBoostClassifier()
model = model.fit(X_train, y_train)
train_accuracy = model.score(X_train, y_train) * 100
test_accuracy = model.score(X_test, y_test) * 100
print('Accuracy of Training Set: {:.2f}'.format(train_accuracy))
print('Accuracy of Test Set: {:.2f}'.format(test_accuracy))

y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

metrics = {
    "training_accuracy": train_accuracy,
    "test_accuracy": test_accuracy
}
run = tracking.start_run(run_collection_name="gender")
tracking.log_metrics(metrics)
tracking.set_tags({"algo": "AdaBoostClassifier"})
tracking.end_run()
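The four training cells above differ only in the classifier and the tag value, so they could be folded into one loop. The sketch below shows one possible shape for that loop. The function name, the `log_run` callback and the `MajorityClassifier` stand-in are assumptions made for illustration; the stand-in exists only so the example runs without scikit-learn or the SAP DI SDK.

```python
def train_and_score(models, X_train, y_train, X_test, y_test, log_run=None):
    """Fit each model and collect train/test accuracy in percent.

    `models` maps a display name to any object with scikit-learn style
    fit(X, y) and score(X, y) methods. `log_run` is an optional,
    hypothetical callback that receives the name and the metrics dict.
    """
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        metrics = {
            "training_accuracy": model.score(X_train, y_train) * 100,
            "test_accuracy": model.score(X_test, y_test) * 100,
        }
        if log_run is not None:
            log_run(name, metrics)
        results[name] = metrics
    return results


class MajorityClassifier:
    """Stand-in model so the sketch runs without scikit-learn:
    it always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=list(y).count)
        return self

    def score(self, X, y):
        return sum(1 for value in y if value == self.label_) / len(y)


logged = []
demo_results = train_and_score(
    {"Majority": MajorityClassifier()},
    X_train=[[30, 170, 70], [25, 160, 55], [40, 180, 85]],
    y_train=[1, 0, 1],
    X_test=[[35, 175, 80], [28, 158, 52]],
    y_test=[1, 0],
    log_run=lambda name, metrics: logged.append((name, metrics)),
)
print(demo_results)
```

In the notebook, `models` could be a dict such as `{"SVC": SVC(), "DecisionTreeClassifier": DecisionTreeClassifier()}`, and `log_run` a small function that wraps the same `tracking.start_run`, `tracking.log_metrics`, `tracking.set_tags` and `tracking.end_run` calls used in the cells above.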

Explore Training & Test Accuracy Captured via Tracking APIs

sc = sapdi.get_current_scenario()
run_data = tracking.get_runs(scenario=sc, notebook=sapdi.scenario.Notebook.get(notebook_id="gender.ipynb"))
lst = list()
for r in run_data:
    lst_data = list()
    lst_data.append(r.tags.get("algo"))
    for m in r.metrics:
        lst_data.append(m.get("value"))
    lst.append(lst_data)

mdf = pd.DataFrame(lst, columns =['algo', 'train_accuracy', 'test_accuracy']) 
mdf
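The loop above relies on the metrics of each run coming back in the same order as the column names. A slightly more defensive variant looks each value up by metric name. The sketch below works on plain dictionaries and assumes that every metric entry carries a "name" key next to "value"; the original code only reads "value", so that key is an assumption to verify against the actual run objects.

```python
def metrics_by_name(metric_entries):
    """Turn [{"name": ..., "value": ...}, ...] into {name: value}."""
    return {entry.get("name"): entry.get("value") for entry in metric_entries}


def run_to_row(algo, metric_entries,
               columns=("training_accuracy", "test_accuracy")):
    """Build one table row: the algorithm tag followed by the metric
    values in a fixed column order (None where a metric is missing)."""
    values = metrics_by_name(metric_entries)
    return [algo] + [values.get(column) for column in columns]


# Hypothetical metric entries, deliberately listed in "wrong" order.
sample_metrics = [
    {"name": "test_accuracy", "value": 91.0},
    {"name": "training_accuracy", "value": 97.5},
]
sample_row = run_to_row("SVC", sample_metrics)
print(sample_row)
```

Rows built this way can be passed to `pd.DataFrame` exactly as `lst` is above.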

Use Metrics Explorer to Visualize Training & Test Accuracy Captured via Tracking APIs

Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_f16ngmwg

3. Save & Export Model

In this section, you will learn how to save the trained model as a pickle file and export it as a ZIP archive, which is used as the artifact to deploy and serve the model in the next phase.

Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_9hhxp4uo

Prepare Required Directory for Save & Export

curr_dir = os.getcwd()
exporter_content = os.path.join(curr_dir, "exporter_content")
exported_content = os.path.join(curr_dir, "exported_content")
zip_file_path = os.path.join(curr_dir, "exported_content/gender_1.zip")
unzip_folder_path = os.path.join(curr_dir, "exported_unzip_content")
print(exporter_content)
print(exported_content)
print(zip_file_path)
print(unzip_folder_path)

Save Model

if os.path.exists(exporter_content) and os.path.isdir(exporter_content):
    shutil.rmtree(exporter_content)
os.makedirs(exporter_content)
joblib.dump(model, 'exporter_content/gender.pkl')
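The delete-then-recreate pattern used for `exporter_content` is repeated for the other working folders in this section. A small helper keeps the three places consistent. This is a minimal sketch; `reset_dir` is a hypothetical name, and the demonstration runs in a temporary folder so it does not touch the notebook's directories.

```python
import os
import shutil
import tempfile


def reset_dir(path):
    """Delete `path` if it is an existing directory, then create it empty."""
    if os.path.isdir(path):
        shutil.rmtree(path)
    os.makedirs(path)
    return path


# Demonstration in a throwaway location.
with tempfile.TemporaryDirectory() as scratch:
    target = os.path.join(scratch, "exporter_content")
    reset_dir(target)
    # Leave a stale file behind, then reset again.
    open(os.path.join(target, "stale.txt"), "w").close()
    reset_dir(target)
    leftover = os.listdir(target)

print(leftover)
```

With such a helper, each of the three `if os.path.exists(...)` blocks in this section reduces to a single call, for example `reset_dir(exporter_content)`.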

Create Required Dependencies for Serving the Model

%%writefile exporter_content/pip_dependencies.txt
scikit-learn==0.22.2
joblib==0.14.1

Create Predictor Class for Serving the Model

%%writefile predictor.py
from sapdi.serving.pymodel.predictor import AbstractPyModelPredictor
import joblib
import json 

class GenderPredictor(AbstractPyModelPredictor):
    def initialize(self, asset_files_path):
        self.classifier = joblib.load(asset_files_path+ '/gender.pkl')
        
    def predict(self, input_dict):
        age = input_dict.get("age")
        weight = input_dict.get("weight")
        height = input_dict.get("height")
        real_value = list([[float(age), float(weight), int(height)]])
        predicted = self.classifier.predict(real_value)
        res = int(predicted[0])
        return {'result': {'gender': res}}

Test the Predictor Class

from predictor import GenderPredictor

predictor = GenderPredictor()
predictor.initialize('exporter_content/')
payload = {"age": 37, "weight": 75, "height": 167}
predictor.predict(payload)
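The `predict` method above passes `None` to `float()` when a field is missing from the request, which fails with a rather unhelpful error. The sketch below shows one way the input handling could be separated and validated. `to_feature_row` and `FEATURE_ORDER` are hypothetical names; the feature order copies the one in the original `predict` method and should match the column order of the training data. Unlike the original, the sketch converts all three values with `float()`.

```python
# Order copied from the original predict(); it must match the training columns.
FEATURE_ORDER = ("age", "weight", "height")


def to_feature_row(input_dict, feature_order=FEATURE_ORDER):
    """Validate the request payload and return a 2-D list suitable for
    a scikit-learn style classifier.predict() call."""
    missing = [name for name in feature_order if input_dict.get(name) is None]
    if missing:
        raise ValueError("missing input field(s): " + ", ".join(missing))
    try:
        return [[float(input_dict[name]) for name in feature_order]]
    except (TypeError, ValueError) as exc:
        raise ValueError("input fields must be numeric") from exc


row = to_feature_row({"age": 37, "weight": 75, "height": 167})
print(row)
```

Inside `GenderPredictor.predict`, the result could be handed to `self.classifier.predict(...)` in place of `real_value`.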

Export Model for Serving using SAP DI SDK

if os.path.exists(exported_content) and os.path.isdir(exported_content):
    shutil.rmtree(exported_content)
os.makedirs(exported_content)
    
from sapdi.serving.pymodel.exporter import PyExporter
from predictor import GenderPredictor
exporter = PyExporter()
exporter.save_model(
    name = "gender",
    model_dir_path = exported_content,
    func=GenderPredictor(),
    source_path_list=[os.path.join(curr_dir,"predictor.py")],
    asset_path_list=[os.path.join(curr_dir, "exporter_content/gender.pkl")],
    pip_dependency_file_path=os.path.join(curr_dir, "exporter_content/pip_dependencies.txt"))

if os.path.exists(unzip_folder_path) and os.path.isdir(unzip_folder_path):
    shutil.rmtree(unzip_folder_path)
os.makedirs(unzip_folder_path)
    
shutil.unpack_archive(zip_file_path, extract_dir=unzip_folder_path)
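Before registering the archive as an artifact, it can be useful to check what it actually contains without unpacking it. The sketch below lists the entries of a ZIP file with the standard `zipfile` module. The helper name and the file names in the demonstration archive are made up; they do not claim to reflect the layout that `PyExporter` produces.

```python
import os
import tempfile
import zipfile


def list_archive(zip_path):
    """Return the sorted entry names stored in a ZIP archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())


# Demonstration with a small throwaway archive (hypothetical contents).
with tempfile.TemporaryDirectory() as scratch:
    demo_zip = os.path.join(scratch, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("predictor.py", "# placeholder")
        archive.writestr("assets/gender.pkl", b"placeholder")
    demo_names = list_archive(demo_zip)

print(demo_names)
```

In the notebook, calling `list_archive(zip_file_path)` would show the entries of the exported model archive.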

Create Artifact

from sapdi.artifact.artifact import Artifact, ArtifactKind, ArtifactFileType
artifact = sapdi.create_artifact(
    file_type=ArtifactFileType.FILE,
    artifact_kind=ArtifactKind.MODEL,
    description="Gender Model",
    artifact_name="gender",
    file_name=os.path.basename(zip_file_path),
    upload_content=zip_file_path
)
print('Model artifact id {}, file {} registered successfully at {} \n'.format(artifact.artifact_id, zip_file_path,artifact.get_uri()))

 

4. Serve Model

In this section, you will learn how to deploy the exported model artifact to the SAP Data Intelligence platform, which exposes a REST endpoint for real-time inference requests.

Here we use the Model Serving operator in the graph, which is specifically designed to serve even complex machine learning models in a scalable way.

Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_s7g3cr1a

  • In the Model Serving operator configuration, set the Model Runtime field to “mlserving-1.1” and leave the remaining fields at their defaults.

 

5. Inference Model

In this section, you will learn how to run inference against the deployed model through the exposed REST endpoint.

Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_0bf9kwki

Inference Model

  • Replace “<REST API URL>” with the REST endpoint URL obtained from the previous section after a successful deployment.
  • Replace “<XXXXX>” with the Base64-encoded user credentials used to call the deployment. The following snippet encodes the credentials and prints the Base64 result.

import base64
credential = "dummytenant\\dummyuser:dummypassword"
print(str(base64.b64encode(credential.encode("utf-8")), "utf-8"))

payload = {"age": 37, "weight": 75, "height": 167}

url = "<REST API URL>"

headers = {
  'Content-Type': 'application/json',
  'X-Requested-With': 'Fetch',
  'Authorization': 'Basic <XXXXX>'
}

response = requests.request("POST", url, headers=headers, data = json.dumps(payload))
interpret = response.json().get("result").get("gender")
print("MALE" if interpret==1 else "FEMALE")
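The credential encoding and the response handling above can also be wrapped in small helpers, so the Base64 value does not have to be pasted into the header by hand and an unexpected response body gives a clear error. This is a minimal sketch that makes no network call; the function names are hypothetical, and the header names and the `result.gender` response shape are taken from the snippets above.

```python
import base64
import json


def basic_auth_value(tenant, user, password):
    """Build the Authorization header value from tenant\\user:password."""
    credential = "{}\\{}:{}".format(tenant, user, password)
    token = base64.b64encode(credential.encode("utf-8")).decode("utf-8")
    return "Basic " + token


def build_headers(tenant, user, password):
    """Same headers as in the request above, with the credential filled in."""
    return {
        "Content-Type": "application/json",
        "X-Requested-With": "Fetch",
        "Authorization": basic_auth_value(tenant, user, password),
    }


def interpret_gender(response_body):
    """Map the parsed JSON response to a label, failing loudly when the
    expected result.gender field is absent."""
    result = response_body.get("result") or {}
    gender = result.get("gender")
    if gender is None:
        raise ValueError("no result.gender in response: {!r}".format(response_body))
    return "MALE" if gender == 1 else "FEMALE"


demo_headers = build_headers("dummytenant", "dummyuser", "dummypassword")
demo_body = json.loads('{"result": {"gender": 1}}')  # hypothetical response
print(interpret_gender(demo_body))
```

With these helpers, the request becomes `requests.post(url, headers=build_headers(...), data=json.dumps(payload))`, and `interpret_gender(response.json())` replaces the last two lines of the original snippet.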

 

Conclusion:

We have built an end-to-end machine learning pipeline covering data upload, data visualization, model training, model metrics, model export, model serving and model inference.

6 Comments
  • Great end-to-end description of the whole Data Science workflow in SAP DI.

    I assume people would benefit even more if the videos came with some additional text that gives more insight into the underlying process.

  • Hi Suresh,

    Excellent blog!

    I’m using the CAL trial version of SAP DI 3.0. As I’m new to DI and still exploring it, it would be great if you could guide us on the questions below.

    • Once my trial version has expired, will I still be able to access my model artifact in a new instance or account?

    • Could you please help me find out where this current directory resides? ‘/vhome/dsp/scenarios/green/2f66e3ea-ccbb-4815-8682-79add81a7976/notebooks’

    • How to export the model artifact zip file to my local machine?