Technology Blogs by SAP
sureshkumar_raju
In this blog post, you will learn how to train, validate, export, serve, and run inference on a simple machine learning model using SAP Data Intelligence. The primary objective is to experience the various features of SAP Data Intelligence, not to build the best possible model, so we use a very simple machine learning use case.

The use case is to predict the gender of a person from a few basic features: age, height (cm), and weight (kg). For simplicity, we use a small dataset of a little over 500 rows.

1. Manage DataSet


In this section you will learn how to upload and manage a dataset that will be used in the training phase later.

Dataset for this tutorial can be downloaded here



Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_mb9ros7k

2. Train Model


In this section, you will learn how to create a Machine Learning Scenario and spin up a Jupyter Notebook instance to train a machine learning model.



Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_yo8l3imc

Install Required Python Libraries


!pip install scikit-learn==0.22.2
!pip install seaborn==0.10.0

Imports


import os
import re
import csv
import random
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
import joblib
from sklearn.model_selection import train_test_split
import pandas as pd
import sapdi
import shutil
import matplotlib.pyplot as plt
import seaborn as sns
import requests
import json
from sapdi import tracking

%matplotlib inline

Load Data from DataLake via Data Manager


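# Look up the workspace and data collection created in the Manage DataSet step.
ws = sapdi.get_workspace(name='suresh-ws')
dc = ws.get_datacollection(name='gender-collection')
# Stream the CSV from the data lake straight into a DataFrame.
with dc.open('gender.csv').get_reader() as reader:
    df = pd.read_csv(reader)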
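
Before going further, it can help to confirm that the file has the columns the later cells index by name. The following is a minimal sketch using only the standard library. The helper name `missing_columns` and the sample header are hypothetical, and the name `age` for the age column is an assumption, because the exact header of that column is not shown in this post.

```python
import csv
import io

# Column names that the notebook cells in this post index by name.
REQUIRED_COLUMNS = ["height(cm)", "weight(kg)", "male(1-male 0-female)"]

def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader)]
    return [name for name in required if name not in header]

# Hypothetical header line; the real file comes from the data collection.
sample = "age,height(cm),weight(kg),male(1-male 0-female)\n"
print(missing_columns(sample))  # an empty list means every column is present
```

With the DataFrame loaded above, the same check can be written as `set(REQUIRED_COLUMNS) - set(df.columns)`.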

Split Male and Female Data for Visualizing


is_male = df['male(1-male 0-female)']==1
is_female = df['male(1-male 0-female)']==0

male = df[is_male]
female = df[is_female]
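# Drop the last 13 female rows. This appears intended to leave 247 rows per group,
# matching the hard-coded x_range used in the scatter plots below.
female = female.head(-13)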

print(male.sample(3))
print(male.shape)
print(female.sample(3))
print(female.shape)

male_weight = male[['weight(kg)']]
male_height = male[['height(cm)']]
print("{} {}".format(male_weight.shape, male_height.shape))

female_weight = female[['weight(kg)']]
female_height = female[['height(cm)']]
print("{} {}".format(female_weight.shape, female_height.shape))

Visualize Male and Female Weights(Kg)


plt.figure(figsize=(18, 6))
x_range = [range(0, 247)]
plt.scatter(x_range, male_weight, color='r', alpha=0.5, s=125)
plt.scatter(x_range, female_weight, color='g', alpha=0.5, s=125)
plt.xlabel('Range')
plt.ylabel('Weight')
plt.show()

Visualize Male & Female Height(Cm)


plt.figure(figsize=(18, 6))
x_range = [range(0, 247)]
plt.scatter(x_range, male_height, color='r', alpha=0.5, s=125)
plt.scatter(x_range, female_height, color='g', alpha=0.5, s=125)
plt.xlabel('Range')
plt.ylabel('Height')
plt.show()

Extract Features and Target


y = df.pop('male(1-male 0-female)')
print(y.sample(5))
X = df
print(X.sample(5))

Visualize Feature Correlation


sns.heatmap(X.corr(), annot=True)
plt.show()

Split Train & Test Data


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
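
In the call above, `test_size=0.20` holds out 20% of the rows for testing, and `random_state=42` fixes the shuffle so that the split is the same on every run. The sketch below illustrates that idea in plain Python. It is not scikit-learn's implementation and will not assign the same rows; the helper name `seeded_split` is hypothetical.

```python
import math
import random

def seeded_split(rows, test_size=0.20, seed=42):
    """Shuffle a copy of rows with a fixed seed and split off a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_size)
    # The first n_test shuffled rows become the test set, the rest the train set.
    return shuffled[n_test:], shuffled[:n_test]

train_rows, test_rows = seeded_split(range(10))
print(len(train_rows), len(test_rows))  # 8 2
```

Because the seed is fixed, calling `seeded_split` again on the same input returns the same two lists, which is what makes accuracy numbers comparable across the training runs that follow.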

Training - SVM


model = SVC()
model = model.fit(X_train, y_train)
train_accuracy = model.score(X_train, y_train) * 100
test_accuracy = model.score(X_test, y_test) * 100
print('Accuracy of Training Set: {:.2f}'.format(train_accuracy))
print('Accuracy of Test Set: {:.2f}'.format(test_accuracy))

y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

metrics = {
    "training_accuracy": train_accuracy,
    "test_accuracy": test_accuracy
}
run = tracking.start_run(run_collection_name="gender")
tracking.log_metrics(metrics)
tracking.set_tags({"algo": "SVC"})
tracking.end_run()
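
The training cells compute a confusion matrix in `cm` but never display it. For two classes, scikit-learn's `confusion_matrix` puts the actual class on the rows and the predicted class on the columns, in sorted label order, which gives `[[TN, FP], [FN, TP]]` for labels 0 and 1. The following is a minimal pure-Python sketch of that layout; the helper name and the sample labels are made up for illustration and are not results from this tutorial.

```python
def binary_confusion_matrix(y_true, y_pred):
    """Count outcomes as [[TN, FP], [FN, TP]] for labels 0 and 1."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        # Row index is the actual label, column index is the predicted label.
        matrix[actual][predicted] += 1
    return matrix

# Made-up labels purely for illustration (1 = male, 0 = female).
actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
print(binary_confusion_matrix(actual, predicted))  # [[2, 1], [1, 2]]
```

In the notebook, adding `print(cm)` after each `confusion_matrix` call shows the same kind of table for the real test set.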

Training - DecisionTreeClassifier


model = DecisionTreeClassifier()
model = model.fit(X_train, y_train)
train_accuracy = model.score(X_train, y_train) * 100
test_accuracy = model.score(X_test, y_test) * 100
print('Accuracy of Training Set: {:.2f}'.format(train_accuracy))
print('Accuracy of Test Set: {:.2f}'.format(test_accuracy))

y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

metrics = {
    "training_accuracy": train_accuracy,
    "test_accuracy": test_accuracy
}
run = tracking.start_run(run_collection_name="gender")
tracking.log_metrics(metrics)
tracking.set_tags({"algo": "DecisionTreeClassifier"})
tracking.end_run()

Training - RandomForestClassifier


model = RandomForestClassifier()
model = model.fit(X_train, y_train)
train_accuracy = model.score(X_train, y_train) * 100
test_accuracy = model.score(X_test, y_test) * 100
print('Accuracy of Training Set: {:.2f}'.format(train_accuracy))
print('Accuracy of Test Set: {:.2f}'.format(test_accuracy))

y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

metrics = {
    "training_accuracy": train_accuracy,
    "test_accuracy": test_accuracy
}
run = tracking.start_run(run_collection_name="gender")
tracking.log_metrics(metrics)
tracking.set_tags({"algo": "RandomForestClassifier"})
tracking.end_run()

Training - AdaBoostClassifier


model = AdaBoostClassifier()
model = model.fit(X_train, y_train)
train_accuracy = model.score(X_train, y_train) * 100
test_accuracy = model.score(X_test, y_test) * 100
print('Accuracy of Training Set: {:.2f}'.format(train_accuracy))
print('Accuracy of Test Set: {:.2f}'.format(test_accuracy))

y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)

metrics = {
    "training_accuracy": train_accuracy,
    "test_accuracy": test_accuracy
}
run = tracking.start_run(run_collection_name="gender")
tracking.log_metrics(metrics)
tracking.set_tags({"algo": "AdaBoostClassifier"})
tracking.end_run()
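
The four training cells above repeat the same fit-and-score steps and differ only in the classifier. One way to reduce the repetition is a small helper that accepts any object with `fit` and `score` methods. The sketch below is an assumption about how this could be organized, not part of the original notebook; `train_and_score` and the stand-in `MajorityClassifier` are hypothetical names, and the stand-in exists only so the example runs without scikit-learn.

```python
def train_and_score(model, X_train, y_train, X_test, y_test):
    """Fit the model and return train/test accuracy as percentages."""
    model.fit(X_train, y_train)
    return {
        "training_accuracy": model.score(X_train, y_train) * 100,
        "test_accuracy": model.score(X_test, y_test) * 100,
    }

class MajorityClassifier:
    """Hypothetical stand-in exposing fit/score, for demonstration only."""

    def fit(self, X, y):
        # Remember the most frequent label seen during training.
        self.label_ = max(set(y), key=list(y).count)
        return self

    def score(self, X, y):
        # Fraction of rows whose label equals the remembered majority label.
        return sum(1 for value in y if value == self.label_) / len(y)

demo_metrics = train_and_score(
    MajorityClassifier(), [[0], [1], [2]], [1, 1, 0], [[3]], [1]
)
print(demo_metrics)
```

In the notebook, each cell could then call, for example, `train_and_score(SVC(), X_train, y_train, X_test, y_test)` and pass the returned dictionary to `tracking.log_metrics`.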

Explore Training & Test Accuracy Captured via Tracking APIs


sc = sapdi.get_current_scenario()
run_data = tracking.get_runs(scenario=sc, notebook=sapdi.scenario.Notebook.get(notebook_id="gender.ipynb"))
lst = list()
for r in run_data:
    # One row per run: the algorithm tag followed by the logged metric values.
    lst_data = list()
    lst_data.append(r.tags.get("algo"))
    for m in r.metrics:
        lst_data.append(m.get("value"))
    lst.append(lst_data)

mdf = pd.DataFrame(lst, columns=['algo', 'train_accuracy', 'test_accuracy'])
mdf
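
Once the runs are in tabular form, a natural next step is to pick the algorithm to export. The sketch below ranks rows shaped like those in `lst` by test accuracy and breaks ties by the smaller gap between training and test accuracy. The helper name is hypothetical, and the numbers are placeholders for illustration, not results from this tutorial.

```python
def best_by_test_accuracy(rows):
    """Return the row with the highest test accuracy.

    Each row is [algo, train_accuracy, test_accuracy]. Ties on test accuracy
    go to the row with the smaller train/test gap.
    """
    return max(rows, key=lambda row: (row[2], -(row[1] - row[2])))

# Placeholder numbers only; real values come from the tracking runs.
rows = [
    ["SVC", 80.0, 78.0],
    ["DecisionTreeClassifier", 100.0, 78.0],
    ["RandomForestClassifier", 99.0, 82.0],
]
print(best_by_test_accuracy(rows)[0])  # RandomForestClassifier
```

With the DataFrame built above, `mdf.sort_values('test_accuracy', ascending=False)` gives the same ranking by test accuracy.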

Use Metrics Explorer to Visualize Training & Test Accuracy Captured via Tracking APIs




Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_f16ngmwg

3. Save & Export Model


In this section, you will learn how to save the trained model as a pickle file and export it as a ZIP archive, which is used as the artifact to deploy and serve the model in the next phase.



Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_9hhxp4uo

Prepare Required Directory for Save & Export


curr_dir = os.getcwd()
exporter_content = os.path.join(curr_dir, "exporter_content")
exported_content = os.path.join(curr_dir, "exported_content")
zip_file_path = os.path.join(curr_dir, "exported_content/gender_1.zip")
unzip_folder_path = os.path.join(curr_dir, "exported_unzip_content")
print(exporter_content)
print(exported_content)
print(zip_file_path)
print(unzip_folder_path)

Save Model


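# Start from an empty exporter_content directory on every run.
if os.path.exists(exporter_content) and os.path.isdir(exporter_content):
    shutil.rmtree(exporter_content)
os.makedirs(exporter_content)
# "model" is whichever classifier was trained last in the notebook.
joblib.dump(model, 'exporter_content/gender.pkl')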

Create Required Dependencies for Serving the Model


%%writefile exporter_content/pip_dependencies.txt
scikit-learn==0.22.2
joblib==0.14.1

Create Predictor Class for Serving the Model


%%writefile predictor.py
from sapdi.serving.pymodel.predictor import AbstractPyModelPredictor
import joblib
import json

class GenderPredictor(AbstractPyModelPredictor):
    def initialize(self, asset_files_path):
        # Load the pickled classifier shipped as an asset with the exported model.
        self.classifier = joblib.load(asset_files_path + '/gender.pkl')

    def predict(self, input_dict):
        age = input_dict.get("age")
        weight = input_dict.get("weight")
        height = input_dict.get("height")
        # The feature order here must match the column order of X used in training.
        real_value = list([[float(age), float(weight), int(height)]])
        predicted = self.classifier.predict(real_value)
        res = int(predicted[0])
        return {'result': {'gender': res}}
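
If a request omits one of the three keys, `predict` above ends up calling `float(None)`, which raises a `TypeError` that tells the caller little. A possible hardening is to validate the payload first. The sketch below is an assumption, not part of the original predictor: `to_feature_row` is a hypothetical helper, it keeps the age, weight, height order used in `predict`, and it converts all three values to `float` rather than using `int` for height.

```python
REQUIRED_FIELDS = ("age", "weight", "height")

def to_feature_row(input_dict):
    """Validate the payload and return [[age, weight, height]] as floats."""
    missing = [name for name in REQUIRED_FIELDS if input_dict.get(name) is None]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    # Keep the same feature order as the predict method.
    return [[float(input_dict[name]) for name in REQUIRED_FIELDS]]

print(to_feature_row({"age": 37, "weight": 75, "height": 167}))
# [[37.0, 75.0, 167.0]]
```

Inside `predict`, the returned list could be passed directly to `self.classifier.predict`.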

Test the Predictor Class


from predictor import GenderPredictor

predictor = GenderPredictor()
predictor.initialize('exporter_content/')
payload = {"age": 37, "weight": 75, "height": 167}
predictor.predict(payload)

Export Model for Serving using SAP DI SDK


if os.path.exists(exported_content) and os.path.isdir(exported_content):
    shutil.rmtree(exported_content)
os.makedirs(exported_content)

from sapdi.serving.pymodel.exporter import PyExporter
from predictor import GenderPredictor
exporter = PyExporter()
exporter.save_model(
    name="gender",
    model_dir_path=exported_content,
    func=GenderPredictor(),
    source_path_list=[os.path.join(curr_dir, "predictor.py")],
    asset_path_list=[os.path.join(curr_dir, "exporter_content/gender.pkl")],
    pip_dependency_file_path=os.path.join(curr_dir, "exporter_content/pip_dependencies.txt"))

if os.path.exists(unzip_folder_path) and os.path.isdir(unzip_folder_path):
    shutil.rmtree(unzip_folder_path)
os.makedirs(unzip_folder_path)

shutil.unpack_archive(zip_file_path, extract_dir=unzip_folder_path)
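
The archive is unpacked above so that its contents can be inspected. An alternative that avoids writing files to disk is to list the archive members directly with the standard `zipfile` module. The sketch below uses a throwaway archive with placeholder members, because the actual layout that the exporter produces is not shown in this post; `list_archive` is a hypothetical helper name.

```python
import os
import tempfile
import zipfile

def list_archive(zip_path):
    """Return the sorted member names stored in a ZIP archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())

# Demonstration with a throwaway archive holding placeholder members.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_zip = os.path.join(tmp_dir, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("predictor.py", "# placeholder")
        archive.writestr("gender.pkl", b"placeholder")
    demo_members = list_archive(demo_zip)

print(demo_members)  # ['gender.pkl', 'predictor.py']
```

In the notebook, `list_archive(zip_file_path)` would show what the exporter actually packaged.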

Create Artifact


from sapdi.artifact.artifact import Artifact, ArtifactKind, ArtifactFileType
artifact = sapdi.create_artifact(
    file_type=ArtifactFileType.FILE,
    artifact_kind=ArtifactKind.MODEL,
    description="Gender Model",
    artifact_name="gender",
    file_name=os.path.basename(zip_file_path),
    upload_content=zip_file_path
)
print('Model artifact id {}, file {} registered successfully at {} \n'.format(artifact.artifact_id, zip_file_path,artifact.get_uri()))

 

4. Serve Model


In this section, you will learn how to deploy the exported model artifact to the SAP Data Intelligence platform, which exposes a REST endpoint for real-time inference requests.

Here we use the Model Serving Operator in the graph, which is designed to serve complex machine learning models in a scalable way.



Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_s7g3cr1a

  • In the Model Serving Operator configuration, set the Model Runtime field to "mlserving-1.1" and leave the remaining fields at their defaults.


 

5. Inference Model


In this section, you will learn how to run inference against the deployed model through the exposed REST endpoint.



Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_0bf9kwki

Inference Model



  • Replace "<REST API URL>" with the REST endpoint URL obtained in the previous section after a successful deployment.

  • Replace "<XXXXX>" with valid Base64-encoded user credentials for calling the deployment. The snippet below encodes the credentials and prints the Base64-encoded result.


import base64
credential = "dummytenant\\dummyuser:dummypassword"
print(str(base64.b64encode(credential.encode("utf-8")), "utf-8"))
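
To avoid copying the encoded string into the header by hand, the encoding step can be wrapped in a function that returns the complete `Authorization` value. This is a minimal sketch under the same `tenant\user:password` convention as the snippet above; the helper name `basic_auth_header` is hypothetical, and the credentials are the same dummy values.

```python
import base64

def basic_auth_header(tenant, user, password):
    """Build a Basic Authorization header value from tenant, user and password."""
    credential = "{}\\{}:{}".format(tenant, user, password)
    token = base64.b64encode(credential.encode("utf-8")).decode("utf-8")
    return "Basic " + token

header_value = basic_auth_header("dummytenant", "dummyuser", "dummypassword")

# Decode the token again to confirm that it round-trips to the original string.
decoded = base64.b64decode(header_value.split(" ", 1)[1]).decode("utf-8")
print(decoded)  # dummytenant\dummyuser:dummypassword
```

The returned value can be used as the `Authorization` entry in the request headers.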

payload = {"age": 37, "weight": 75, "height": 167}

url = "<REST API URL>"

headers = {
    'Content-Type': 'application/json',
    'X-Requested-With': 'Fetch',
    'Authorization': 'Basic <XXXXX>'
}

response = requests.request("POST", url, headers=headers, data = json.dumps(payload))
interpret = response.json().get("result").get("gender")
print("MALE" if interpret==1 else "FEMALE")
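
The chained `.get("result").get("gender")` call above raises an `AttributeError` when the response body has no `result` key, for example when the request is rejected. The sketch below shows one way to make that case explicit. The helper name `interpret_gender` is hypothetical, and the sketch makes no assumption about what an error body from the service looks like.

```python
def interpret_gender(body):
    """Map a serving response body to a label, or raise if its shape is unexpected."""
    result = body.get("result") if isinstance(body, dict) else None
    if not isinstance(result, dict) or "gender" not in result:
        raise ValueError("unexpected response body: {!r}".format(body))
    # Same mapping as in the request cell: 1 means male, anything else female.
    return "MALE" if result["gender"] == 1 else "FEMALE"

print(interpret_gender({"result": {"gender": 1}}))  # MALE
```

In the request cell, calling `response.raise_for_status()` before `interpret_gender(response.json())` also surfaces HTTP errors before the body is parsed.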

 

Conclusion:


We built an end-to-end machine learning pipeline covering data upload, data visualization, model training, model metrics, model export, model serving, and model inference.