
Build your first SAP Data Intelligence ML Scenario with TensorFlow

Inspired by Andreas Forster’s excellent blog SAP Data Intelligence: Create your first ML Scenario and encouraged by Karim Mohraz’s incredibly helpful blog Train and Deploy a Tensorflow Pipeline in SAP Data Intelligence, in this blog I combine their approaches to demonstrate how to create an SAP Data Intelligence machine learning scenario with TensorFlow that is as plain vanilla as possible.

To start with, I create a Data Workspace and a corresponding Data Collection in the SAP Data Intelligence ML Data Manager and upload Andreas’s training data there:
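The upload can also be scripted from a notebook. A small sketch, assuming the sapdi Data Manager SDK offers a get_writer counterpart to the get_reader used below (treat this as an assumption rather than a documented call):

import pandas as pd
import sapdi

ws = sapdi.get_workspace(name='architectSAP')
dc = ws.get_datacollection(name='architectSAP')
df = pd.read_csv('RunningTimes.csv', sep=';')
# assumption: get_writer is the writing counterpart of the get_reader shown below
with dc.open('RunningTimes.csv').get_writer() as writer:
    df.to_csv(writer, sep=';', index=False)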

From my Jupyter Lab Data Manager, I navigate to my Data Collection and copy the code to load my training data:

import pandas as pd
import sapdi
ws = sapdi.get_workspace(name='architectSAP')
dc = ws.get_datacollection(name='architectSAP')
with dc.open('RunningTimes.csv').get_reader() as reader:
    df = pd.read_csv(reader, sep=';')
df.head()
	ID	HALFMARATHON_MINUTES	MARATHON_MINUTES
0	1	73	149
1	2	74	154
2	3	78	158
3	4	73	165
4	5	74	172

On that basis, I build my data set:

x = df[['HALFMARATHON_MINUTES']]
y_true = df[['MARATHON_MINUTES']]
import tensorflow as tf
dataset = tf.data.Dataset.from_tensor_slices((x.values, y_true.values))
dataset = dataset.batch(1)
for feat, targ in dataset.take(5):
    print('Features: {}, Target: {}'.format(feat, targ))
print(x.shape)
print(y_true.shape)
Features: [[73]], Target: [[149]]
Features: [[74]], Target: [[154]]
Features: [[78]], Target: [[158]]
Features: [[73]], Target: [[165]]
Features: [[74]], Target: [[172]]
(117, 1)
(117, 1)

To create, compile and train my model:

model = tf.keras.Sequential([tf.keras.layers.Dense(4, name='hidden', batch_size=1, input_shape=(1,)), tf.keras.layers.Dense(1, name='output')])
model.compile(tf.keras.optimizers.Adam(), tf.keras.losses.MeanSquaredError())
model.summary()
history = model.fit(dataset, epochs=8)
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
hidden (Dense)               (1, 4)                    8         
_________________________________________________________________
output (Dense)               (1, 1)                    5         
=================================================================
Total params: 13
Trainable params: 13
Non-trainable params: 0
_________________________________________________________________
Epoch 1/8
117/117 [==============================] - 0s 2ms/step - loss: 80740.5312
Epoch 2/8
117/117 [==============================] - 0s 3ms/step - loss: 57319.2930
Epoch 3/8
117/117 [==============================] - 0s 3ms/step - loss: 36689.4297
Epoch 4/8
117/117 [==============================] - 0s 3ms/step - loss: 18188.9629
Epoch 5/8
117/117 [==============================] - 0s 3ms/step - loss: 6463.0679
Epoch 6/8
117/117 [==============================] - 0s 3ms/step - loss: 1668.1143
Epoch 7/8
117/117 [==============================] - 0s 3ms/step - loss: 459.1681
Epoch 8/8
117/117 [==============================] - 0s 4ms/step - loss: 292.9158

The results are of course very similar to Andreas’s, but this time leveraging TensorFlow Keras; the plot shows the Keras fit (red) versus the MSE optimum (green):

import matplotlib.pyplot as plot
import numpy as np
m, b = np.polyfit(np.squeeze(x), y_true, 1)
plot.scatter(x, y_true);
plot.plot(x, model.predict(x), color = 'red');
plot.plot(x, m*x + b, color = 'green');
plot.xlabel("Actual Minutes Half-Marathon");
plot.ylabel("Actual Minutes Marathon");

And I check that there is no autocorrelation by scatter-plotting the residuals to verify their randomness:

plot.scatter(x, y_true - model.predict(x), color="orange");
plot.xlabel("Actual Minutes Half-Marathon");
plot.ylabel("Residuals");

To leverage these results, I add a Python Producer in the SAP Data Intelligence ML Scenario Manager to create, compile, train and store my model. Since I want to stay as plain vanilla as possible, I only add a few lines to the template and stick with its naming conventions:

import pandas as pd
import tensorflow as tf
import io
import json
import h5py

# Example Python script to perform training on input data & generate Metrics & Model Blob
def on_input(data):
    # to send metrics to the Submit Metrics operator, create a Python dictionary of key-value pairs
    df = pd.read_csv(io.StringIO(data), sep=';')
    x = df[['HALFMARATHON_MINUTES']]
    y_true = df[['MARATHON_MINUTES']]
    dataset = tf.data.Dataset.from_tensor_slices((x.values, y_true.values))
    dataset = dataset.batch(1)
    model = tf.keras.Sequential([tf.keras.layers.Dense(4, batch_size=1, input_shape=(1,)), tf.keras.layers.Dense(1)])
    model.compile(tf.keras.optimizers.Adam(), tf.keras.losses.MeanSquaredError())
    history = model.fit(dataset, epochs=8)
    # metrics_dict = {"kpi1": "1"}
    metrics_dict = json.dumps({'loss': str(history.history['loss'][-1])})

    # send the metrics to the output port - Submit Metrics operator will use this to persist the metrics
    api.send("metrics", api.Message(metrics_dict))

    # create & send the model blob to the output port - Artifact Producer operator will use this to persist the model and create an artifact ID
    f = h5py.File('blob', 'w', driver='core', backing_store=False)
    model.save(f)
    f.flush()
    # model_blob = bytes("example", 'utf-8')
    model_blob = f.id.get_file_image()
    api.send("modelBlob", model_blob)
    
api.set_port_callback("input", on_input)

Since I use the TensorFlow Python libraries, I need to add a Group with a tag to identify my Docker image:

FROM $com.sap.sles.ml.python
RUN python3.6 -m pip --no-cache-dir install --user --upgrade pip
RUN python3.6 -m pip --no-cache-dir install --user tensorflow

I then execute my Python Producer with these parameters after changing the Connection in the Read File Operator:

To obtain my Metrics, Models and Datasets:

To consume these, in the SAP Data Intelligence ML Scenario Manager, I add a Python Consumer. Since I still want to stay as plain vanilla as possible, I only add a few lines to the template to apply my model and obtain my results:

# apply your model
blob = io.BytesIO(model)
f = h5py.File(blob, 'r')
architectSAP = tf.keras.models.load_model(f)
blob.close() 
# obtain your results
prediction = architectSAP.predict([[json.loads(user_data)['half_marathon_minutes']]])
success = True

I also pass the successful response back to the user:

# apply carried out successfully, send a response to the user
# msg.body = json.dumps({'Results': 'Model applied to input data successfully.'})
msg.body = json.dumps({'marathon_minutes_prediction': str(prediction[0])})

I then deploy my Python Consumer after adding a Group for my Docker image again and passing in my model:

Once my Deployment is successful:

I retrieve my prediction e.g. with Postman:
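Instead of Postman, the deployed endpoint can also be called from Python. A minimal sketch with the requests library, assuming a placeholder deployment URL (copied from the deployment details in the ML Scenario Manager) and tenant\user basic authentication:

import requests

# placeholder: paste the deployment URL from the ML Scenario Manager here
url = '<deployment URL>'
# assumption: basic authentication as tenant\user, e.g. default\myuser
response = requests.post(url, json={'half_marathon_minutes': 75},
                         auth=('default\\myuser', 'mypassword'))
print(response.status_code, response.text)
# expected body along the lines of {"marathon_minutes_prediction": "[157.3]"}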

I tried to keep this blog as plain vanilla as possible, to help you understand the underlying basic concepts. However, there is of course nothing wrong with a bit more sophistication, such as building a custom TensorFlow Operator to make your Graphs more efficient and easier to read:
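Such a custom operator is beyond the plain vanilla scope of this blog, but its script could follow the same api pattern used above. A rough sketch with hypothetical port names ("modelBlob", "input", "output") that would have to match the operator’s configured ports:

import io
import json
import h5py
import tensorflow as tf

model = None

def on_model(model_blob):
    # load the Keras model once from the artifact blob and cache it
    global model
    with h5py.File(io.BytesIO(model_blob), 'r') as f:
        model = tf.keras.models.load_model(f)

def on_input(data):
    # apply the cached model to the incoming half-marathon minutes
    half_marathon_minutes = json.loads(data)['half_marathon_minutes']
    prediction = model.predict([[half_marathon_minutes]])
    api.send("output", json.dumps({'marathon_minutes_prediction': str(prediction[0])}))

api.set_port_callback("modelBlob", on_model)
api.set_port_callback("input", on_input)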
