Technology Blogs by SAP
marc_daniau
Advisor
This post concludes a series on the SAP Predictive Analytics Python API used inside a Jupyter notebook.

There are different cases where you need to apply a predictive model:

  • As a participant in a predictive modeling competition like Kaggle, you make predictions on a provided test dataset and submit your output file.

  • To assess the generalization error of a newly trained model, you apply it to a hold-out dataset and compare the predicted values to the actual values.

  • When a model is considered ready for production, you apply it to new data on a regular basis to predict the target outcome.
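The hold-out case in the list above comes down to comparing two columns. Here is a minimal sketch of that comparison in pandas; the function name, the column names, and the six-row sample are made up for illustration and do not come from the claims dataset.

```python
import pandas as pd


def holdout_accuracy(df, actual_col, predicted_col):
    """Return the accuracy and a confusion table of predicted vs. actual values."""
    accuracy = (df[actual_col] == df[predicted_col]).mean()
    confusion = pd.crosstab(
        df[actual_col], df[predicted_col],
        rownames=["actual"], colnames=["predicted"],
    )
    return accuracy, confusion


# Made-up hold-out sample; "is_fraud" and "predicted_fraud" are hypothetical names.
holdout = pd.DataFrame({
    "is_fraud":        [0, 0, 1, 1, 0, 1],
    "predicted_fraud": [0, 1, 1, 0, 0, 1],
})
accuracy, confusion = holdout_accuracy(holdout, "is_fraud", "predicted_fraud")
print(accuracy)
print(confusion)
```

With rare events such as fraud, accuracy alone can be misleading, so the confusion table is usually the more informative of the two outputs.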


In this article, you will see how to apply a model using the Python API of SAP Predictive Analytics from a Jupyter notebook.

We have loaded a classification model that detects fraudulent car insurance claims. We want to apply it to new claims to help detect fraud.

 

Applying the Model with Preset Settings

We specify where the input dataset is and where to write the output dataset.
data_folder = r"O:\MODULES_PA/PYTHON_API/MY_PREDICTIONS"
input_file = "AUTO_CLAIMS_NEW.csv"
output_file = "CLAIMS_PREDICTIONS.csv"

We open a first store for the input data and a second store for the output data.
input_store = model.openNewStore("Kxen.FileStore", data_folder, "", "")
model.newDataSet("ApplyIn", input_file, input_store)
output_store = model.openNewStore("Kxen.FileStore", data_folder, "", "")
model.newDataSet("ApplyOut", output_file, output_store)
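The four calls above follow the same pattern for each role, so they can be grouped in a small helper. This is only a sketch: `bind_apply_datasets` is a hypothetical name, and the function does nothing beyond repeating the `openNewStore` and `newDataSet` calls shown above with the same arguments. It has not been run against the SAP library.

```python
def bind_apply_datasets(model, folder, input_file, output_file):
    """Attach the apply-in and apply-out datasets to `model`.

    Hypothetical helper: it repeats the openNewStore / newDataSet calls
    from the article, opening one file store per dataset role.
    """
    for role, file_name in (("ApplyIn", input_file), ("ApplyOut", output_file)):
        store = model.openNewStore("Kxen.FileStore", folder, "", "")
        model.newDataSet(role, file_name, store)
```

With the variables defined earlier, the call would be `bind_apply_datasets(model, data_folder, input_file, output_file)`.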

For classification models, Automated Analytics provides preset apply settings such as Decision, Individual Contributions, and Quantiles. We choose Decision.
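# Get the transform at index 0 of the "Default" protocol.
t = model.getTransformInProtocol("Default", 0)
t.getParameter("")
# Select the preset apply mode.
t.changeParameter("Parameters/ExtraMode", "Decision")
# Other presets mentioned above (use one of these instead of "Decision"):
# t.changeParameter("Parameters/ExtraMode", "Individual Contributions")
# t.changeParameter("Parameters/ExtraMode", "Quantiles")
t.validateParameter()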

We apply the model.
model.sendMode(aalib.Kxen_apply, 0)

Let’s check the content of the prediction file.

We load the file in a Pandas data frame.
import pandas as pd
df = pd.read_csv(data_folder + "\\" + output_file, header=0)

We display the first seven rows.
df.head(7)



Because we declared claim id as a key when we configured the model, the apply operation automatically puts it in the prediction file.
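The file-loading step above builds the path with a hard-coded backslash. As a sketch, the same read can go through `os.path.join`, which picks the separator for the current platform. `load_predictions` is a hypothetical helper name, and the demonstration writes a throwaway file with made-up columns so that the block runs without the real prediction file.

```python
import os
import tempfile

import pandas as pd


def load_predictions(folder, file_name):
    """Read a prediction file written by the apply step into a data frame."""
    return pd.read_csv(os.path.join(folder, file_name), header=0)


# Demonstration on a temporary file; "claim_id" and "decision" are made-up columns.
with tempfile.TemporaryDirectory() as tmp:
    sample = pd.DataFrame({"claim_id": [101, 102, 103], "decision": [0, 1, 0]})
    sample.to_csv(os.path.join(tmp, "CLAIMS_PREDICTIONS.csv"), index=False)
    demo = load_predictions(tmp, "CLAIMS_PREDICTIONS.csv")

print(demo.head(7))
```

In the notebook, the equivalent call would be `load_predictions(data_folder, output_file)`.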

If none of the preset settings meets your needs, you can use the Advanced mode.

 

Applying the Model with Advanced Settings

We open the input and output stores and activate the Advanced Apply Settings mode.
data_folder = r"O:\MODULES_PA/PYTHON_API/MY_PREDICTIONS"
input_file = "AUTO_CLAIMS_NEW.csv"
output_file = "CLAIMS_PREDICTIONS_ADV.csv"

input_store = model.openNewStore("Kxen.FileStore", data_folder, "", "")
model.newDataSet("ApplyIn", input_file, input_store)
output_store = model.openNewStore("Kxen.FileStore", data_folder, "", "")
model.newDataSet("ApplyOut", output_file, output_store)

t = model.getTransformInProtocol("Default", 0)
t.getParameter("")
t.changeParameter("Parameters/ExtraMode", "Advanced Apply Settings")
t.validateParameter()

We request the decision and its probability.
target_col = "is_fraud"
d_path = "Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/ApplySettings/Supervised/%s" % target_col
model.getParameter("")
settings = model.getParameter(d_path)
# Decision
flag = settings.getSubParameter("PredictedRankCategories")
flag.removeAll()
flag.insert("1")
# Probability of the Decision
flag = settings.getSubParameter("PredictedRankProbabilities")
flag.removeAll()
flag.insert("1")

We also want the reason codes.
rc_num = "3"
rc_stat = "Mean"
rc_param = settings.getSubParameter("ReasonCodes")
# Below
lSmartOutputParam = rc_param.insert("0")
lSmartOutputParam.setSubValue("ReasonCount", rc_num)
lSmartOutputParam.setSubValue("BaseLineMethod", rc_stat)
lSmartOutputParam.setSubValue("Direction", "Below")
# Above
lSmartOutputParam = rc_param.insert("1")
lSmartOutputParam.setSubValue("ReasonCount", rc_num)
lSmartOutputParam.setSubValue("BaseLineMethod", rc_stat)
lSmartOutputParam.setSubValue("Direction", "Above")

model.validateParameter()
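The two reason-code entries above differ only in their index and direction, so they can be written as a loop. The sketch below assumes nothing beyond the `insert` and `setSubValue` calls already used in this article; `request_reason_codes` is a hypothetical name, and the function has not been run against the SAP library.

```python
def request_reason_codes(rc_param, count, baseline, directions=("Below", "Above")):
    """Add one reason-code entry per direction, using the calls shown above.

    Entries are inserted under the keys "0", "1", ... in the order of `directions`.
    """
    for index, direction in enumerate(directions):
        entry = rc_param.insert(str(index))
        entry.setSubValue("ReasonCount", str(count))
        entry.setSubValue("BaseLineMethod", baseline)
        entry.setSubValue("Direction", direction)
```

Calling `request_reason_codes(rc_param, rc_num, rc_stat)` before `model.validateParameter()` would make the same requests as the explicit version.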

We apply the model.
model.sendMode(aalib.Kxen_apply, 0)

We display the first ten rows of the output file.
df = pd.read_csv(data_folder + "\\" + output_file, header=0)
df.head(10)
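Once the decision and its probability are in a data frame, a natural next step is to list the claims flagged as fraud, with the most confident predictions first. The sketch below uses made-up data and hypothetical column names; the actual column names in the output file are generated by Automated Analytics and should be read from `df.columns`.

```python
import pandas as pd


def flagged_claims(df, decision_col, probability_col, positive=1):
    """Return the rows predicted as `positive`, highest probability first."""
    flagged = df[df[decision_col] == positive]
    return flagged.sort_values(probability_col, ascending=False)


# Made-up predictions; "claim_id", "decision" and "probability" are hypothetical names.
predictions = pd.DataFrame({
    "claim_id":    [101, 102, 103, 104],
    "decision":    [0, 1, 1, 0],
    "probability": [0.91, 0.62, 0.88, 0.75],
})
top = flagged_claims(predictions, "decision", "probability")
print(top)
```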



 