Marc DANIAU

Train a model from a Jupyter notebook using the Python API of SAP Predictive Analytics

The Python API of SAP Predictive Analytics allows you to train and apply models programmatically. The user’s code can be executed either in batch mode, from a .py script, or interactively, from a Jupyter notebook.

In this article, you will see how to configure, train and save a model with the API.

The example presented below was done on a Windows machine with:

  • SAP Predictive Analytics 3.3 Desktop, which includes the Python API.
  • The WinPython distribution, which bundles data science libraries and the Jupyter Notebook App.

We will work with census data that comes with SAP Predictive Analytics.

 

The Training Dataset

To start, we read the CSV file and load its content into a pandas data frame.

import pandas as pd

data_file = "Census01.csv"
data_folder = r"C:\Program Files\SAP Predictive Analytics\Desktop\Automated\Samples\Census"
df = pd.read_csv(data_folder + "\\" + data_file, header=0)
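
Note that concatenating the folder and file with "\\" is Windows-specific; os.path.join is a more portable way to build the path. A minimal sketch:

import os

# Portable equivalent of the concatenation above
df = pd.read_csv(os.path.join(data_folder, data_file), header=0)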

What is the dataset size?

text = "Size of %s" % data_file
print('\x1b[1m' + text + '\x1b[0m')
num_rows, num_cols = df.shape
print("{} Rows".format(num_rows))
print("{} Columns".format(num_cols))

We display the first ten rows.

df.head(10)

The last column, class, contains 1 if the individual’s annual income is over 50K, 0 otherwise.
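
As a quick sanity check, we can confirm that the column holds only those two values:

df['class'].unique()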

We check the proportion of positive cases.

s1 = df['class'].value_counts()
s2 = df['class'].value_counts(normalize=True) * 100
dfc = pd.concat([s1.rename("Observations"), s2.rename("In %")], axis=1)
dfc['In %'] = dfc['In %'].round(2)
dfc.index.name = 'Class'
dfc

The percentage of class 1 cases is large enough to train a classifier on.

We can break down that percentage by a categorical variable such as relationship.

pd.crosstab(df['relationship'],df['class'],margins=True, normalize=True).round(4)*100
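
Another variant, normalizing by row instead, shows for each relationship value the share of class 0 versus class 1:

pd.crosstab(df['relationship'], df['class'], normalize='index').round(4)*100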

Class is the outcome we want to predict. To make predictions, we must first learn from our training dataset whose outcome is known. This is where the Automated Analytics library (aalib) comes into play.

What is our version of Python by the way?

print('\x1b[1m'+ 'Python Version' + '\x1b[0m')
import platform
platform.python_version()

We have the required version: 3.5. We can proceed with using aalib.
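
Since the aalib bindings shipped with a given release are built for one specific Python version, a fail-fast guard can avoid confusing import errors later. A minimal sketch, assuming 3.5 is the required version as stated above:

import platform

# Stop early if the interpreter does not match the aalib build (3.5 here)
major, minor, _ = platform.python_version_tuple()
assert (major, minor) == ('3', '5'), "aalib requires Python 3.5"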

 

Initialization

We provide the paths to the Python API, the C++ API and the SAP Predictive Analytics desktop directories.

import sys
sys.path.append(r"C:\Program Files\SAP Predictive Analytics\Desktop\Automated\EXE\Clients\Python35")
import os
# Prepend the C++ client directory to PATH rather than overwriting it entirely
os.environ['PATH'] = r"C:\Program Files\SAP Predictive Analytics\Desktop\Automated\EXE\Clients\CPP" + os.pathsep + os.environ['PATH']

AA_DIRECTORY = r"C:\Program Files\SAP Predictive Analytics\Desktop\Automated"

We import the Automated Analytics library and we specify the context and the configuration store.

import aalib

class DefaultContext(aalib.IKxenContext):
    def __init__(self):
        super().__init__()

    def userMessage(self, iSource, iMessage, iLevel):
        # Relay engine messages to the notebook output
        print(iMessage)
        return True

    def userConfirm(self, iSource, iPrompt):
        pass

    def userAskOne(self, iSource, iPrompt, iHidden):
        pass

    def stopCallBack(self, iSource):
        pass

frontend = aalib.KxFrontEnd([])
factory = frontend.getFactory()
context = DefaultContext()

factory.setConfiguration("DefaultMessages", "true")
config_store = factory.createStore("Kxen.FileStore")
config_store.setContext(context, 'en', 10, False)
config_store.openStore(AA_DIRECTORY + r"\EXE\Clients\CPP", "", "")
config_store.loadAdditionnalConfig("KxShell.cfg")

 

Creating the Model

We create a “regression” model that performs a classification if the specified target is nominal (e.g. class), or a regression if it is continuous (e.g. age).

model = factory.createModel("Kxen.SimpleModel")
model.setContext(context, 'en', 8, False)
model.pushTransformInProtocol("Default", "Kxen.RobustRegression")

With aalib one can work against a database table or a flat file, but not directly against a pandas data frame. In our case, we declare a training store against the census CSV file.

store = model.openNewStore("Kxen.FileStore", data_folder, "", "")
model.newDataSet("Training", data_file, store)
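
If your data only existed as a pandas data frame, you could first materialize it as a flat file and point the store at that instead. A minimal sketch, assuming a writable export folder (the folder and file names below are illustrative):

import os

export_folder = r"C:\Temp"                  # assumed writable location
export_file = "census_from_dataframe.csv"   # illustrative file name
df.to_csv(os.path.join(export_folder, export_file), index=False)

# The store and dataset would then be declared against the exported file:
# store = model.openNewStore("Kxen.FileStore", export_folder, "", "")
# model.newDataSet("Training", export_file, store)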

The API can guess the data description, or read it from a description file if one exists.

# model.guessSpaceDescription("Training")
metadata_file = "Desc_Census01.csv"
model.readSpaceDescription("Training", metadata_file, store)

We set the name of the target column. It can be hard-coded or derived from a rule, such as taking the last column name.

target_col = list(df)[-1]
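
Because this rule silently picks whatever column happens to be last, a quick check can catch a reordered file. A minimal sketch (the expected name class is specific to this dataset):

# Guard against a reordered or malformed file
assert target_col == 'class', "Unexpected target column: %s" % target_col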

We set the roles of the variables.

model.getParameter("")
variables = model.getParameter("Protocols/Default/Variables")
variables.setAllValues("Role", "input")
variables.setSubValue(target_col + "/Role", "target")
variables.setSubValue("KxIndex/Role", "skip")
model.validateParameter()

We choose the partitioning scheme. By default, three partitions are prepared: Estimation, Validation and Test; here we keep only two, Estimation and Validation.

model.getParameter("")
model.changeParameter("Parameters/CutTrainingPolicy", "random with no test")
model.validateParameter()

We can enable or disable the auto-selection of candidate predictors with a true/false parameter.

model.getParameter("")
model.changeParameter("Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/VariableSelection", "true")
model.validateParameter()

We can set the Polynomial Order, the default value being 1.

model.getParameter("")
model.changeParameter("Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/Order", "1")
model.validateParameter() 

Finally, we train the model.

model.sendMode(aalib.Kxen_learn, store) 

 

Saving the Model

Our model was successfully trained. Let’s save it. The method for that is described below.

help(model.saveModel)

We name our model and persist it for later use.

model_folder = r"O:\MODULES_PA/PYTHON_API/MY_MODELS"
model_file = "models_space"
model.setName("My Classification Model")
model_comment = "Generated with Python API from Jupyter Notebook"
model_store = model.openNewStore("Kxen.FileStore", model_folder, "", "")
model.saveModel(model_store, model_file, model_comment)
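
If the destination folder might not exist yet, you can create it beforehand. A minimal sketch, assuming the file store expects an existing directory:

import os

# Create the destination folder if it is missing
os.makedirs(model_folder, exist_ok=True)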

This model has the same format as if you had saved it from the desktop application, so a desktop user can load it if needed.

In a subsequent blog, we will debrief our census model inside a Jupyter notebook.

Comments
      Chris Gruber

      This is an excellent series combining Python and the data science process. Great job!

      Parinya Hiranpanthaporn

      Great article Marc, would it be possible to deploy the model using Predictive Factory?

      Thanks

      Marc DANIAU (Blog Post Author)

      Once saved, the model can be imported into Predictive Factory with the Import capability, provided that it meets the conditions described here:

      https://help.sap.com/viewer/41d1a6d4e7574e32b815f1cc87c00f42/3.3/en-US/6d11e4fc037b40609e86803179c85226.html

      But creating the model directly in Predictive Factory would be much simpler.

      Antoine CHABERT

      Hello Parinya, feel free to use the (internal) SAP JAM "Advanced & Augmented Analytics" to ask such questions in the future. We are monitoring forum questions there as well.

      ranveer singh

      This is a very interesting article; it helped me a lot to understand how I can use SAP data in a machine learning model with Python. But how can we filter null values and replace them with the mode in an SAP dataset using Python?