Do you have business processes that require predictions at high speed? Possibly without using SAP HANA at prediction time? For example, you might have IoT data flying in on Kafka or MQTT, to which you must respond with immediate and tailored predictions in real time. A new feature of the Automated Predictive Library (APL) now makes this possible.

The APL is a highly automated framework to train and score Machine Learning models in SAP HANA. It scales the use of Machine Learning for operational processes: no data extraction or duplication is required, the architecture stays lean, and the data remains securely in place.

Over time, the APL has of course evolved. Earlier versions were able to provide a scoring equation for obtaining predictions in different programming languages such as Java, C++ or SQL. Newer versions included various improvements (e.g. Gradient Boosting, SHAP values for global and local explainability, multi-class classification), but models trained with the latest framework could not be exported to such programming languages.

With the latest release of the APL this has changed. Models trained with the new Gradient Boosting APL framework can now be scored in pure JavaScript. The model can be scored wherever JavaScript can be executed, opening up many new deployment possibilities!

This blog was written by marc.daniau and andreas.forster with kudos and thanks to naiminh.quach, who has kindly provided an extremely helpful function to simplify this workflow.

Update 26 January 2022: The ability to create the scoring equation is now built into the hana_ml library; the export_apply_code() helper function shown below is not required anymore. You can find an example in the documentation. It's now just a single line.
json_export = model.export_apply_code('JSON')
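For instance, the exported equation can be written straight to a file. A minimal sketch, assuming the trained gbapl_model from the steps below and hana_ml 2.6 or higher:
json_export = gbapl_model.export_apply_code('JSON')
with open('bank_marketing_model.json', 'w') as f:
    f.write(json_export)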

 






Prerequisites


You must go through and implement the steps in the blog Hands-On Tutorial: Automated Predictive (APL) in SAP HANA Cloud. By the end of it, you will have used Python in Jupyter Notebooks to load data into SAP HANA Cloud and trained an APL classification model in SAP HANA. This model predicts whether a person is interested in purchasing a specific investment product from their bank.

To summarise, the model was trained as follows, with the help of the "Python Client API for Machine Learning Algorithms" (often called the "hana_ml wrapper").

The hana_ml library has been installed. At the time of writing, the latest version is 2.6.20110600.
import hana_ml
print(hana_ml.__version__)
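In case hana_ml is not installed yet, it can be added straight from the Notebook (assuming pip is available in your environment):
!pip install hana_ml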

 

You can connect to your SAP HANA system. This blog is using SAP HANA Cloud, but the whole scenario also works with on-premise SAP HANA.
import hana_ml.dataframe as dataframe
conn = dataframe.ConnectionContext(userkey = 'MYHANACLOUD',
                                   encrypt = 'true')
# Send basic SELECT statement and display the result
sql = 'SELECT 12345 FROM DUMMY'
df_remote = conn.sql(sql)
print(df_remote.collect())
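If you prefer not to use a key from the hdbuserstore, the connection can also be opened with explicit credentials. A sketch with placeholder host and login values:
# Connect with explicit credentials instead of a stored user key
conn = dataframe.ConnectionContext(address = '<your instance>.hanacloud.ondemand.com',
                                   port = 443,
                                   user = '<your user>',
                                   password = '<your password>',
                                   encrypt = 'true')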

 

Training data is stored in the table BANKMARKETING.
df_remote = conn.table(table = 'BANKMARKETING', schema = 'ML').sort('CUSTOMER_ID', desc = False)
df_remote.head(5).collect()
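Optionally, a quick sanity check on the data volume, using the hana_ml DataFrame's count() method:
print(df_remote.count())  # number of rows in the BANKMARKETING table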

 

Create and configure the GradientBoostingBinaryClassifier object. For details on the configuration please see the previous blog.
from hana_ml.algorithms.apl.gradient_boosting_classification import GradientBoostingBinaryClassifier
gbapl_model = GradientBoostingBinaryClassifier()
col_target = 'PURCHASE'
target_value = 'yes'
col_id = 'CUSTOMER_ID'
col_predictors = df_remote.columns
col_predictors.remove(col_target)
col_predictors.remove(col_id)
gbapl_model.set_params(eval_metric = 'AUC') # Metric used to evaluate the model performance
gbapl_model.set_params(cutting_strategy = 'random with no test') # Internal splitting strategy
gbapl_model.set_params(other_train_apl_aliases = {'APL/VariableAutoSelection': 'true',
                                                  'APL/Interactions': 'true',
                                                  'APL/InteractionsMaxKept': 10,
                                                  'APL/TargetKey': target_value})
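If you want to verify the resulting configuration at any point, a quick optional check using the wrapper's scikit-learn-style get_params() method:
print(gbapl_model.get_params())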

 

And the model has been trained successfully.
gbapl_model.fit(data = df_remote,
                key = col_id,
                features = col_predictors,
                label = col_target)
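Before exporting the model, you may want to confirm its quality with the indicators that APL computed during training. An optional check, assuming the hana_ml APL interface:
print(gbapl_model.get_performance_metrics())  # includes the AUC that was set as eval_metric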

 

Trained APL model to JSON


The APL model has been trained. Now export the trained model's logic into a JSON equation. This equation will be used in the next step to produce new predictions.

The hana_ml wrapper that was used to train the model does not provide a function to quickly obtain the JSON logic; it has to be obtained through SQL syntax. To simplify this step, naiminh.quach and marc.daniau created the following function, which takes care of everything that's needed.
def export_apply_code(model, **other_params):
    conn = model.conn_context.connection
    cursor = conn.cursor()

    # -- Header
    try:
        cursor.execute('drop table #FUNC_HEADER')
    except:
        pass
    cursor.execute('create local temporary table #FUNC_HEADER like "SAP_PA_APL"."sap.pa.apl.base::BASE.T.FUNCTION_HEADER"')

    # -- Export parameters
    try:
        cursor.execute('drop table #EXPORT_CONFIG')
    except:
        pass
    cursor.execute('create local temporary table #EXPORT_CONFIG like "SAP_PA_APL"."sap.pa.apl.base::BASE.T.OPERATION_CONFIG_EXTENDED"')
    cursor.execute('insert into #EXPORT_CONFIG values (?, ?, NULL)', ['APL/CodeType', 'JSON'])
    cursor.execute('insert into #EXPORT_CONFIG values (?, ?, NULL)', ['APL/ApplyExtraMode', 'Advanced Apply Settings'])

    # -- Output table
    try:
        cursor.execute('drop table #APPLY_CODE_OUTPUT')
    except:
        pass
    cursor.execute('create local temporary table #APPLY_CODE_OUTPUT like "SAP_PA_APL"."sap.pa.apl.base::BASE.T.RESULT"')

    # -- Call the APL SQL function
    sql = """
    DO (
        IN header "SAP_PA_APL"."sap.pa.apl.base::BASE.T.FUNCTION_HEADER" => #FUNC_HEADER,
        IN config "SAP_PA_APL"."sap.pa.apl.base::BASE.T.OPERATION_CONFIG_EXTENDED" => #EXPORT_CONFIG,
        IN model "SAP_PA_APL"."sap.pa.apl.base::BASE.T.MODEL_BIN_OID" => {model_table} )
    BEGIN
        "SAP_PA_APL"."sap.pa.apl.base::EXPORT_APPLY_CODE"(:header, :model, :config, out_code);
        EXEC 'insert into #APPLY_CODE_OUTPUT select * from :out_code' USING out_code;
    END;
    """
    model_table_name = model.model_table_.name  # the temporary table where the trained model is saved
    sql = sql.format(model_table=model_table_name)
    cursor.execute(sql)

    # -- Retrieve the generated code
    cursor.execute('select to_char(VALUE) value from #APPLY_CODE_OUTPUT')
    apply_code = cursor.fetchone()[0]
    return apply_code

 

Just pass the trained model into the function.
model_equation = export_apply_code(model=gbapl_model)

 

And you can write the JSON logic to file.
with open("./bank_marketing_model.json", "w") as text_file:
    text_file.write(model_equation)

 

The file is not meant to be human-readable, but of course you can have a look.
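For a quick peek from Python, a small sketch that prints the beginning of the generated equation:
import json
with open('./bank_marketing_model.json') as f:
    equation = json.load(f)
print(json.dumps(equation, indent = 2)[:500])  # first 500 characters only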


 

By the way, this blog is all about using Python to leverage the Machine Learning in SAP HANA. It is also possible, though, to obtain the above JSON representation of the model through pure SQL. See EXPORT_APPLY_CODE in the documentation.

 

JavaScript scoring


The above JSON logic is designed to be used by a JavaScript scoring runtime. Follow these two steps to get everything that is needed.

  1. Download the APL's JavaScript runtime, which ships with the APL download from version 2018.2 onwards. For this blog we downloaded APL version 2101 for SAP HANA 2.0 SPS03 and beyond (Linux on x86_64). In the samples folder of that download you will find the files autoRuntime.js and dateCoder.js. In the File Browser of your JupyterLab, create a folder called "lib" and copy these two files into it.

  2. Install the JavaScript package "amdefine" as explained in the readme.txt that is located in the same folder as the two .js files.


 

Install the package from the Notebook.
!npm install amdefine

 

Now copy and save this JavaScript code into a file called score_json_light.js. The file is kept very simple: it loads both the APL runtime and the JSON file that represents the trained model, and then scores a single new observation. Notice that only 3 predictors are passed, even though the model contains additional predictor variables. The scoring equation is robust, though, and can deal with such missing values.
var runtime = require("./lib/autoRuntime");

// Load the model in JSON format
const fs = require("fs");
let rawdata = fs.readFileSync("./bank_marketing_model.json");
let modelDefinition = JSON.parse(rawdata);

// Create scoring engine based on the model's JSON format
autoEngine = runtime.createEngine(modelDefinition);

// New observation to score
row = [
  { variable: "AGE", value: 40 },
  { variable: "JOB", value: "entrepreneur" },
  { variable: "MARITAL", value: "married" },
];

// Score new observation
var t0 = new Date().getTime(); // timer start
const prediction = autoEngine.getScore(row);
var t1 = new Date().getTime(); // timer end
console.log("Prediction: " + prediction["proba"]);
console.log("Inference took " + (t1 - t0) + " milliseconds.");

 

And execute the file.
!node score_json_light.js

 

The JavaScript needed just 22 milliseconds to create the prediction.


 

Even though the code was executed within a Jupyter Notebook, the prediction was carried out purely in JavaScript, without any need to be online or connected to any other infrastructure.

Try it out in your preferred JavaScript environment. Here is an example to score new predictions in Node.js. Copy the files bank_marketing_model.json, score_json_light.js and the lib folder into an empty folder anywhere on your computer.

Install the amdefine package.
npm install amdefine

 

And obtain the prediction.
node score_json_light.js


 

In pure Node.js, without the Jupyter Notebook framework, the same prediction took only 17 milliseconds.

Next steps


You might already have ideas on how this JavaScript scoring can be used in a business process. stojanm gives an excellent and very comprehensive example in his blog about a project that used this concept to improve B2C marketing communications: MLOps in practice: Applying and updating Machine Learning models in real-time and at scale

At TechEd 2020, stojanm and andreas.forster presented and demoed this project, showing how the JavaScript scoring can be embedded with SAP Data Intelligence and Kafka to personalise a website for more targeted marketing. The recording is available on YouTube.

Just let us know if you have any questions!
marc.daniau , andreas.forster