SAP Data Intelligence to Train, Export, Serve & Inference Machine Learning Models
In this blog post, you will learn how to train, validate, export, serve, and run inference on a simple machine learning model using SAP Data Intelligence. The primary objective is to explore the features of SAP Data Intelligence, not to build the best possible model, so the use case is deliberately simple.
The use case is to predict a person's gender from three basic features: age, height (cm), and weight (kg). For simplicity, the dataset has only a little over 500 rows.
1. Manage DataSet
In this section you will learn how to upload and manage a dataset that will be used in the training phase later.
Dataset for this tutorial can be downloaded here
Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_mb9ros7k
2. Train Model
In this section, you will learn how to create a Machine Learning Scenario and spin up a Jupyter Notebook instance to train any machine learning model.
Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_yo8l3imc
Install Required Python Libraries
!pip install scikit-learn==0.22.2
!pip install seaborn==0.10.0
Imports
import os
import re
import csv
import random
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
import joblib
from sklearn.model_selection import train_test_split
import pandas as pd
import sapdi
import shutil
import matplotlib.pyplot as plt
import seaborn as sns
import requests
import json
from sapdi import tracking
%matplotlib inline
Load Data from DataLake via Data Manager
ws = sapdi.get_workspace(name='suresh-ws')
dc = ws.get_datacollection(name='gender-collection')
with dc.open('gender.csv').get_reader() as reader:
    df = pd.read_csv(reader)
Split Male and Female Data for Visualizing
is_male = df['male(1-male 0-female)']==1
is_female = df['male(1-male 0-female)']==0
male = df[is_male]
female = df[is_female]
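# Drop the last 13 female rows so both groups have the same row count
# (the scatter plots below share an x range of 247 points).
female = female.head(-13)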
print(male.sample(3))
print(male.shape)
print(female.sample(3))
print(female.shape)
male_weight = male[['weight(kg)']]
male_height = male[['height(cm)']]
print("{} {}".format(male_weight.shape, male_height.shape))
female_weight = female[['weight(kg)']]
female_height = female[['height(cm)']]
print("{} {}".format(female_weight.shape, female_height.shape))
Visualize Male and Female Weights(Kg)
plt.figure(figsize=(18, 6))
x_range = [range(0, 247)]
plt.scatter(x_range, male_weight, color='r', alpha=0.5, s=125)
plt.scatter(x_range, female_weight, color='g', alpha=0.5, s=125)
plt.xlabel('Range')
plt.ylabel('Weight')
plt.show()
Visualize Male & Female Height(Cm)
plt.figure(figsize=(18, 6))
x_range = [range(0, 247)]
plt.scatter(x_range, male_height, color='r', alpha=0.5, s=125)
plt.scatter(x_range, female_height, color='g', alpha=0.5, s=125)
plt.xlabel('Range')
plt.ylabel('Height')
plt.show()
Extract Features and Target
y = df.pop('male(1-male 0-female)')
print(y.sample(5))
X = df
print(X.sample(5))
Visualize Feature Correlation
sns.heatmap(X.corr(), annot=True)
plt.show()
Split Train & Test Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
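To make the split concrete, here is a minimal pure-Python sketch of a seeded 80/20 shuffle split. It is not scikit-learn's implementation and will not reproduce the row assignment of `train_test_split`; the helper name `shuffle_split` is hypothetical and used only for illustration.

```python
import math
import random


def shuffle_split(rows, test_size=0.20, seed=42):
    """Shuffle rows with a fixed seed and split them into (train, test)."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # same seed -> same split every run
    n_test = math.ceil(len(rows) * test_size)  # round the test share up
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = list(range(500))  # stand-in for roughly 500 dataset rows
train, test = shuffle_split(rows)
print(len(train), len(test))
```

Fixing the seed is what makes the accuracy numbers of the four classifiers below comparable, because every model sees exactly the same training and test rows.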
Training – SVM
model = SVC()
model = model.fit(X_train, y_train)
train_accuracy = model.score(X_train, y_train) * 100
test_accuracy = model.score(X_test, y_test) * 100
print('Accuracy of Training Set: {:.2f}'.format(train_accuracy))
print('Accuracy of Test Set: {:.2f}'.format(test_accuracy))
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
metrics = {
    "training_accuracy": train_accuracy,
    "test_accuracy": test_accuracy
}
run = tracking.start_run(run_collection_name="gender")
tracking.log_metrics(metrics)
tracking.set_tags({"algo": "SVC"})
tracking.end_run()
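The confusion matrix `cm` is computed above but never inspected. As a minimal sketch, the helper below derives accuracy, precision, and recall for the male class (label 1) from a 2x2 matrix laid out the way scikit-learn returns it for labels 0 and 1: rows are true labels, columns are predictions. The function name `binary_report` and the example counts are hypothetical, made up for illustration.

```python
def binary_report(cm):
    """Return accuracy, precision and recall for class 1 from a 2x2 matrix.

    Expected layout: [[tn, fp], [fn, tp]].
    """
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up counts, not results from the gender dataset.
example_cm = [[40, 10], [5, 45]]
print(binary_report(example_cm))
```

In the notebook, the same helper could be called as `binary_report(cm.tolist())` after any of the training cells.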
Training – DecisionTreeClassifier
model = DecisionTreeClassifier()
model = model.fit(X_train, y_train)
train_accuracy = model.score(X_train, y_train) * 100
test_accuracy = model.score(X_test, y_test) * 100
print('Accuracy of Training Set: {:.2f}'.format(train_accuracy))
print('Accuracy of Test Set: {:.2f}'.format(test_accuracy))
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
metrics = {
    "training_accuracy": train_accuracy,
    "test_accuracy": test_accuracy
}
run = tracking.start_run(run_collection_name="gender")
tracking.log_metrics(metrics)
tracking.set_tags({"algo": "DecisionTreeClassifier"})
tracking.end_run()
Training – RandomForestClassifier
model = RandomForestClassifier()
model = model.fit(X_train, y_train)
train_accuracy = model.score(X_train, y_train) * 100
test_accuracy = model.score(X_test, y_test) * 100
print('Accuracy of Training Set: {:.2f}'.format(train_accuracy))
print('Accuracy of Test Set: {:.2f}'.format(test_accuracy))
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
metrics = {
    "training_accuracy": train_accuracy,
    "test_accuracy": test_accuracy
}
run = tracking.start_run(run_collection_name="gender")
tracking.log_metrics(metrics)
tracking.set_tags({"algo": "RandomForestClassifier"})
tracking.end_run()
Training – AdaBoostClassifier
model = AdaBoostClassifier()
model = model.fit(X_train, y_train)
train_accuracy = model.score(X_train, y_train) * 100
test_accuracy = model.score(X_test, y_test) * 100
print('Accuracy of Training Set: {:.2f}'.format(train_accuracy))
print('Accuracy of Test Set: {:.2f}'.format(test_accuracy))
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
metrics = {
    "training_accuracy": train_accuracy,
    "test_accuracy": test_accuracy
}
run = tracking.start_run(run_collection_name="gender")
tracking.log_metrics(metrics)
tracking.set_tags({"algo": "AdaBoostClassifier"})
tracking.end_run()
Explore Training & Test Accuracy Captured via Tracking APIs
sc = sapdi.get_current_scenario()
run_data = tracking.get_runs(scenario = sc,notebook = sapdi.scenario.Notebook.get(notebook_id="gender.ipynb"))
lst = list()
for r in run_data:
    lst_data = list()
    lst_data.append(r.tags.get("algo"))
    for m in r.metrics:
        lst_data.append(m.get("value"))
    lst.append(lst_data)
mdf = pd.DataFrame(lst, columns =['algo', 'train_accuracy', 'test_accuracy'])
mdf
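The loop above relies on the metrics arriving in a fixed order (training accuracy first, then test accuracy). A more defensive variant keys each value by its metric name. The sketch below uses plain dictionaries as stand-ins for the SDK run objects, and it assumes each metric record carries a "name" key next to "value"; the original code only confirms "value", so check the actual record shape before relying on this. The helper name `runs_to_rows` and the sample values are hypothetical.

```python
def runs_to_rows(runs):
    """Flatten run records into one dict per run, keyed by metric name."""
    rows = []
    for run in runs:
        row = {"algo": run["tags"].get("algo")}
        for metric in run["metrics"]:
            # Assumed record shape: {"name": ..., "value": ...}
            row[metric["name"]] = metric["value"]
        rows.append(row)
    return rows


# Hypothetical stand-in records with made-up values.
sample_runs = [
    {"tags": {"algo": "SVC"},
     "metrics": [{"name": "test_accuracy", "value": 80.0},
                 {"name": "training_accuracy", "value": 82.5}]},
]
print(runs_to_rows(sample_runs))
```

A list of dictionaries like this can be passed straight to `pd.DataFrame`, which picks up the column names from the keys.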
Use Metrics Explorer to Visualize Training & Test Accuracy Captured via Tracking APIs
Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_f16ngmwg
3. Save & Export Model
In this section, you will learn how to save the trained model as a pickle file and export it as a ZIP archive, which serves as the artifact used to deploy and serve the model in the next phase.
Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_9hhxp4uo
Prepare Required Directory for Save & Export
curr_dir = os.getcwd()
exporter_content = os.path.join(curr_dir, "exporter_content")
exported_content = os.path.join(curr_dir, "exported_content")
zip_file_path = os.path.join(curr_dir, "exported_content/gender_1.zip")
unzip_folder_path = os.path.join(curr_dir, "exported_unzip_content")
print(exporter_content)
print(exported_content)
print(zip_file_path)
print(unzip_folder_path)
Save Model
if os.path.exists(exporter_content) and os.path.isdir(exporter_content):
    shutil.rmtree(exporter_content)
os.makedirs(exporter_content)
joblib.dump(model, 'exporter_content/gender.pkl')
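The "delete the directory if it exists, then recreate it" pattern is used three times in this section. As a minimal sketch, it could be wrapped in a small helper; the name `reset_dir` is hypothetical and not part of the SAP DI SDK.

```python
import os
import shutil
import tempfile


def reset_dir(path):
    """Delete path if it is an existing directory, then create it empty."""
    if os.path.isdir(path):
        shutil.rmtree(path)
    os.makedirs(path)
    return path


# Demonstrate inside a throwaway temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "exporter_content")
    reset_dir(target)
    open(os.path.join(target, "stale.txt"), "w").close()
    reset_dir(target)  # the stale file is removed along with the directory
    print(os.listdir(target))
```

Starting from an empty directory on every run keeps files from an earlier export out of the new archive.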
Create Required Dependencies for Serving the Model
%%writefile exporter_content/pip_dependencies.txt
scikit-learn==0.22.2
joblib==0.14.1
Create Predictor Class for Serving the Model
%%writefile predictor.py
from sapdi.serving.pymodel.predictor import AbstractPyModelPredictor
import joblib
import json
class GenderPredictor(AbstractPyModelPredictor):
    def initialize(self, asset_files_path):
        # Load the pickled classifier from the exported asset folder.
        self.classifier = joblib.load(asset_files_path+ '/gender.pkl')
    def predict(self, input_dict):
        age = input_dict.get("age")
        weight = input_dict.get("weight")
        height = input_dict.get("height")
        # The feature order must match the column order used during training.
        real_value = list([[float(age), float(weight), int(height)]])
        predicted = self.classifier.predict(real_value)
        res = int(predicted[0])
        return {'result': {'gender': res}}
Test the Predictor Class
from predictor import GenderPredictor
predictor = GenderPredictor()
predictor.initialize('exporter_content/')
payload = {"age": 37, "weight": 75, "height": 167}
predictor.predict(payload)
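If a field is missing from the payload, `predict` ends up calling `float(None)`, which raises a `TypeError` that does not name the missing field. A minimal sketch of a validation step is shown below. The helper name `to_feature_row` is hypothetical, the key order is copied from `predict` above, and unlike the original it converts all three values with `float`.

```python
FEATURE_KEYS = ("age", "weight", "height")  # order copied from predict()


def to_feature_row(input_dict):
    """Validate the payload and return a single-row 2D feature list."""
    missing = [key for key in FEATURE_KEYS if input_dict.get(key) is None]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return [[float(input_dict[key]) for key in FEATURE_KEYS]]


print(to_feature_row({"age": 37, "weight": 75, "height": 167}))
```

The returned list has the 2D shape that `classifier.predict` expects, so it could replace the manual `real_value` construction.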
Export Model for Serving using SAP DI SDK
if os.path.exists(exported_content) and os.path.isdir(exported_content):
    shutil.rmtree(exported_content)
os.makedirs(exported_content)
from sapdi.serving.pymodel.exporter import PyExporter
from predictor import GenderPredictor
exporter = PyExporter()
exporter.save_model(
    name="gender",
    model_dir_path=exported_content,
    func=GenderPredictor(),
    source_path_list=[os.path.join(curr_dir, "predictor.py")],
    asset_path_list=[os.path.join(curr_dir, "exporter_content/gender.pkl")],
    pip_dependency_file_path=os.path.join(curr_dir, "exporter_content/pip_dependencies.txt"))
if os.path.exists(unzip_folder_path) and os.path.isdir(unzip_folder_path):
    shutil.rmtree(unzip_folder_path)
os.makedirs(unzip_folder_path)
shutil.unpack_archive(zip_file_path, extract_dir=unzip_folder_path)
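Instead of unpacking the archive to disk, you can also list its contents directly to check what was exported. The sketch below uses only the standard library. The helper name `list_archive` is hypothetical, and the demo archive with its member names is made up; it does not show what the real exported ZIP contains.

```python
import os
import tempfile
import zipfile


def list_archive(zip_path):
    """Return the sorted member names of a ZIP archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(archive.namelist())


# Build a small throwaway archive to demonstrate; member names are made up.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("gender.pkl", b"placeholder")
        archive.writestr("predictor.py", "# placeholder\n")
    print(list_archive(demo_zip))
```

In the notebook, `list_archive(zip_file_path)` would list the members of the exported model archive.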
Create Artifact
from sapdi.artifact.artifact import Artifact, ArtifactKind, ArtifactFileType
artifact = sapdi.create_artifact(
    file_type=ArtifactFileType.FILE,
    artifact_kind=ArtifactKind.MODEL,
    description="Gender Model",
    artifact_name="gender",
    file_name=os.path.basename(zip_file_path),
    upload_content=zip_file_path
)
print('Model artifact id {}, file {} registered successfully at {} \n'.format(artifact.artifact_id, zip_file_path,artifact.get_uri()))
4. Serve Model
In this section, you will learn how to deploy the exported model artifact to the SAP Data Intelligence platform, which exposes a REST endpoint for real-time inference requests.
The graph uses the Model Serving Operator, which is designed to serve machine learning models in a scalable way.
Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_s7g3cr1a
- In the Model Serving Operator configuration, set the Model Runtime field to “mlserving-1.1” and leave the remaining fields at their defaults.
5. Inference Model
In this section, you will learn how to run inference against the deployed model through the exposed REST endpoint.
Video Link: https://sapvideoa35699dc5.hana.ondemand.com/?entry_id=0_0bf9kwki
Inference Model
- Replace “<REST API URL>” with the REST endpoint URL obtained in the previous section after a successful deployment.
- Replace “<XXXXX>” with valid Base64-encoded user credentials for the deployment. The snippet below encodes the credentials and prints the Base64 result.
import base64
credential = "dummytenant\\dummyuser:dummypassword"
print(str(base64.b64encode(credential.encode("utf-8")), "utf-8"))
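To avoid copying the encoded string by hand, the encoding step can be wrapped in a small function that returns the complete header value. This is a minimal sketch using the same `tenant\user:password` credential format as above; the function name `basic_auth_header` is hypothetical.

```python
import base64


def basic_auth_header(tenant, user, password):
    """Return the Authorization header value for 'tenant\\user:password'."""
    credential = "{}\\{}:{}".format(tenant, user, password)
    token = base64.b64encode(credential.encode("utf-8")).decode("utf-8")
    return "Basic " + token


header = basic_auth_header("dummytenant", "dummyuser", "dummypassword")
print(header)
```

The returned string can be used directly as the value of the `Authorization` header. Keep in mind that Base64 is an encoding, not encryption, so the credentials should only be sent over HTTPS.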
payload = {"age": 37, "weight": 75, "height": 167}
url = "<REST API URL>"
headers = {
    'Content-Type': 'application/json',
    'X-Requested-With': 'Fetch',
    'Authorization': 'Basic <XXXXX>'
}
response = requests.request("POST", url, headers=headers, data = json.dumps(payload))
interpret = response.json().get("result").get("gender")
print("MALE" if interpret==1 else "FEMALE")
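The chained `.get("result").get("gender")` call raises an `AttributeError` when the response body has no "result" key, for example when the endpoint returns an error message instead of a prediction. Below is a minimal sketch of a more defensive interpretation step. The function name `interpret_response` is hypothetical, and the sample body is hand-written in the shape returned by `GenderPredictor.predict`, not captured from a real deployment.

```python
import json


def interpret_response(body_text):
    """Map the JSON response body to 'MALE' or 'FEMALE'."""
    body = json.loads(body_text)
    gender = (body.get("result") or {}).get("gender")
    if gender not in (0, 1):
        raise ValueError("unexpected response: {!r}".format(body))
    return "MALE" if gender == 1 else "FEMALE"


# Hand-written sample body, not a real server response.
print(interpret_response('{"result": {"gender": 1}}'))
```

In the notebook this could be called as `interpret_response(response.text)`, ideally after checking `response.status_code`.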
Conclusion:
We have built an end-to-end machine learning pipeline covering data upload, data visualization, model training, model metrics, model export, model serving, and model inference.
Nice Walkthrough and great details. thank you!
Great end-to-end description of the whole Data Science workflow in SAP DI.
I assume people would benefit even more if the videos had some extra text that provides more insight into the underlying process.
Yes, I agree. Text annotations would have made it better; I will take that into account next time.
Is there a trial version of DI available?
Sorry, not available as of now.
Hi Suresh,
Excellent blog!
I'm using CAL trial version of SAP DI 3.0 . As I'm new to DI and still exploring it, it would be great if you can guide us on below queries