Make predictions using the Python API of SAP Predictive Analytics from a Jupyter notebook

marc_daniau · ‎02-22-2018

This post concludes a series on the SAP Predictive Analytics Python API used inside a Jupyter notebook.

There are different cases where you need to apply a predictive model:

As a participant in a predictive modeling competition like Kaggle, you make predictions on a provided test dataset and submit your output file.

To assess the generalization error of a newly trained model, you apply it to a hold-out dataset and compare the predicted values to the actual values.

When a model is considered ready for production, you apply it to new data on a regular basis to predict the target outcome.
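For the hold-out case, comparing predicted and actual values can be sketched in plain Python. This is a minimal illustration, not part of the SAP Predictive Analytics API: the function names `confusion_counts` and `error_rate` and the sample labels are made up for the example.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary target."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def error_rate(actual, predicted):
    """Share of rows where the prediction differs from the actual value."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Made-up hold-out labels and predictions (1 = fraud, 0 = not fraud)
actual = [1, 0, 0, 1, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 0, 1, 0]

counts = confusion_counts(actual, predicted)
rate = error_rate(actual, predicted)
print(dict(counts), rate)
```

With a real hold-out file, the two lists would come from the actual target column and the predicted decision column of the apply output.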

In this article, you will see how to apply a model using the Python API of SAP Predictive Analytics from a Jupyter notebook.

We have loaded a classification model that detects fraudulent car insurance claims. We want to apply it to new claims to help detect fraud.

Applying the Model with Preset Settings

We specify where the input dataset is and where to write the output dataset.

data_folder = r"O:\MODULES_PA/PYTHON_API/MY_PREDICTIONS"

input_file = "AUTO_CLAIMS_NEW.csv"

output_file = "CLAIMS_PREDICTIONS.csv"

We open a first store for the input data and a second store for the output data.

input_store = model.openNewStore("Kxen.FileStore", data_folder, "", "")

model.newDataSet("ApplyIn", input_file, input_store)

output_store = model.openNewStore("Kxen.FileStore", data_folder, "", "")

model.newDataSet("ApplyOut", output_file, output_store)

For classification models, Automated Analytics provides preset apply settings such as Decision, Individual Contributions, and Quantiles. We choose Decision.

t = model.getTransformInProtocol("Default", 0)

t.getParameter("")

t.changeParameter("Parameters/ExtraMode", "Decision")

# t.changeParameter("Parameters/ExtraMode", "Individual Contributions")

# t.changeParameter("Parameters/ExtraMode", "Quantiles")



t.validateParameter()

We apply the model.

model.sendMode(aalib.Kxen_apply, 0)
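The steps above (open the stores, pick a preset, apply) can be grouped into one function if you score new files regularly. The sketch below is an assumption about how you might organize the notebook: `apply_with_preset` is a hypothetical helper name, and it only chains the calls already shown in this section. The apply mode is passed in as an argument so that the function itself does not depend on the `aalib` import.

```python
def apply_with_preset(model, apply_mode, folder, input_file, output_file,
                      extra_mode="Decision"):
    """Apply a loaded model with one of the preset output settings.

    Hypothetical helper: it repeats the calls from this section in order.
    """
    # Register the input and output datasets in file stores
    input_store = model.openNewStore("Kxen.FileStore", folder, "", "")
    model.newDataSet("ApplyIn", input_file, input_store)
    output_store = model.openNewStore("Kxen.FileStore", folder, "", "")
    model.newDataSet("ApplyOut", output_file, output_store)

    # Select the preset (for example "Decision" or "Quantiles")
    t = model.getTransformInProtocol("Default", 0)
    t.getParameter("")
    t.changeParameter("Parameters/ExtraMode", extra_mode)
    t.validateParameter()

    # Run the apply
    model.sendMode(apply_mode, 0)


# Intended call in the notebook, with the variables defined above:
# apply_with_preset(model, aalib.Kxen_apply, data_folder, input_file, output_file)
```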

Let’s check the content of the prediction file.

We load the file in a Pandas data frame.

import pandas as pd

df = pd.read_csv(data_folder + "\\" + output_file, header=0)
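The folder string mixes backslashes and forward slashes, and the file path is built by string concatenation. Windows accepts this, but a path-joining function gives a cleaner result. The sketch below uses the standard `ntpath` module, which applies Windows path rules on any operating system; in a notebook running on Windows, `os.path` behaves the same way. The variable name `output_path` is introduced for the example.

```python
import ntpath  # Windows path rules, importable on any operating system

data_folder = r"O:\MODULES_PA/PYTHON_API/MY_PREDICTIONS"
output_file = "CLAIMS_PREDICTIONS.csv"

# normpath turns the forward slashes into backslashes; join adds the separator
output_path = ntpath.join(ntpath.normpath(data_folder), output_file)
print(output_path)
```

The resulting `output_path` can be passed directly to `pd.read_csv`.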

We display the first seven rows.

df.head(7)

Because we declared the claim id as a key when we configured the model, the apply operation automatically writes it to the prediction file.

If none of the preset settings meets your needs, you can use the Advanced mode.
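Having the key in the prediction file makes it possible to link each prediction back to the original claim. The sketch below illustrates the idea with the standard `csv` module and in-memory data. The column names `claim_id`, `claim_amount` and `decision` and the sample rows are assumptions made for the example; the real names depend on your dataset and on the apply settings.

```python
import csv
import io

# Made-up file contents; real column names depend on the model and apply settings
claims_csv = "claim_id,claim_amount\nC001,1200\nC002,560\nC003,9800\n"
predictions_csv = "claim_id,decision\nC001,0\nC002,0\nC003,1\n"

# Index the input claims by their key
claims = {row["claim_id"]: row for row in csv.DictReader(io.StringIO(claims_csv))}

# Collect the original claim for every prediction flagged as fraud
flagged = []
for pred in csv.DictReader(io.StringIO(predictions_csv)):
    if pred["decision"] == "1":
        flagged.append(claims[pred["claim_id"]])

print(flagged)
```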

Applying the Model with Advanced Settings

We open the input and output stores and activate the Advanced Apply Settings mode.

data_folder = r"O:\MODULES_PA/PYTHON_API/MY_PREDICTIONS"

input_file = "AUTO_CLAIMS_NEW.csv"

output_file = "CLAIMS_PREDICTIONS_ADV.csv"



input_store = model.openNewStore("Kxen.FileStore", data_folder, "", "")

model.newDataSet("ApplyIn", input_file, input_store)

output_store = model.openNewStore("Kxen.FileStore", data_folder, "", "")

model.newDataSet("ApplyOut", output_file, output_store)



t = model.getTransformInProtocol("Default", 0)

t.getParameter("")

t.changeParameter("Parameters/ExtraMode", "Advanced Apply Settings")

t.validateParameter()

We request the decision and its probability.

target_col = "is_fraud"

d_path = "Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/ApplySettings/Supervised/%s" % target_col

model.getParameter("")

settings = model.getParameter(d_path)

# Decision

flag = settings.getSubParameter("PredictedRankCategories")

flag.removeAll()

flag.insert("1")

# Probability of the Decision

flag = settings.getSubParameter("PredictedRankProbabilities")

flag.removeAll()

flag.insert("1")
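The decision and its probability are requested with the same three calls: get the sub-parameter, clear it, insert a value. If you request several outputs, that pattern can be factored out. The sketch below assumes a hypothetical helper name, `keep_only`, and relies only on the `getSubParameter`, `removeAll` and `insert` calls shown above.

```python
def keep_only(settings, name, value):
    """Clear the sub-parameter `name` and insert a single value into it.

    Hypothetical helper wrapping the three calls used in this section.
    """
    flag = settings.getSubParameter(name)
    flag.removeAll()
    flag.insert(value)
    return flag


# Intended use in the notebook, equivalent to the two blocks above:
# keep_only(settings, "PredictedRankCategories", "1")     # Decision
# keep_only(settings, "PredictedRankProbabilities", "1")  # Probability of the Decision
```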

We also want the reason codes.

rc_num = "3"

rc_stat = "Mean"

rc_param = settings.getSubParameter("ReasonCodes")

# Below 

lSmartOutputParam = rc_param.insert("0")

lSmartOutputParam.setSubValue("ReasonCount", rc_num)

lSmartOutputParam.setSubValue("BaseLineMethod", rc_stat)

lSmartOutputParam.setSubValue("Direction", "Below")

# Above 

lSmartOutputParam = rc_param.insert("1")

lSmartOutputParam.setSubValue("ReasonCount", rc_num)

lSmartOutputParam.setSubValue("BaseLineMethod", rc_stat)

lSmartOutputParam.setSubValue("Direction", "Above")



model.validateParameter()
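The two reason-code entries differ only in their index and direction, so they can be written as a loop. The sketch below is one possible arrangement, not an API feature: `add_reason_codes` is a hypothetical helper name, and it uses only the `insert` and `setSubValue` calls shown above. It would be called before `model.validateParameter()`.

```python
def add_reason_codes(rc_param, count, baseline, directions=("Below", "Above")):
    """Insert one reason-code entry per direction under `rc_param`.

    Hypothetical helper: entries are indexed "0", "1", ... in the given order.
    """
    for index, direction in enumerate(directions):
        entry = rc_param.insert(str(index))
        entry.setSubValue("ReasonCount", count)
        entry.setSubValue("BaseLineMethod", baseline)
        entry.setSubValue("Direction", direction)


# Intended use in the notebook, with the variables defined above:
# add_reason_codes(rc_param, rc_num, rc_stat)
```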

We apply the model.

model.sendMode(aalib.Kxen_apply, 0)

We display the first ten rows of the output file.

df = pd.read_csv(data_folder + "\\" + output_file, header=0)

df.head(10)
