SAP BusinessObjects Predictive Analytics 3.1 enables you to generate the scoring formula of a predictive model as a user-defined function (UDF) in SAP HANA. In this blog, we will look closer at this new feature to see how it works.
Beforehand, we have trained a classification model against fraud data regarding automobile insurance claims. Now we take the following steps:
1 – In Automated Analytics, we open Using the Model — Save/Export — Generate Source Code:
Note that the above menu is available only for classification, regression and clustering models.
2- We choose the code type, UDF for SAP HANA. After which, we name the generated file and click Generate:
3 – We take a look at the file in a text editor:
Line #4 shows in comment how to use the UDF in a SELECT statement. However, before we can execute the UDF we must create it in the database, as follows.
4 – We copy the create function SQL statement (from line #8 to the end of the file), paste it in SAP HANA Studio, and run it:
The function created is a scalar UDF that appears in the database catalog:
5 – We try it in an SQL SELECT statement against a table with new claims:
select *, APL_SAMPLES.CLAIMS_FRAUD_SCORE ( CAST(DAYS_TO_REPORT AS INTEGER), CAST(BODILY_INJURY_AMOUNT AS INTEGER), PAYMENT_METHOD, CAST(AGE AS INTEGER), GENDER,INCOME_ESTIMATE, CAST(INCOME_CATEGORY AS INTEGER) ) AS IS_FRAUD_SCORE from APL_SAMPLES.AUTO_CLAIMS_NEW order by IS_FRAUD_SCORE desc
In our query, we sorted the claims on their scores in descending order. Claims with the highest risk of being fraudulent appear first on the result list:
6 – We want to use the scoring UDF in a calculation view. The database has an existing graphical calculation view, defined against a table containing new claims:
We create a scripted calculation view to augment the graphical view with prediction information like the score:
We could define another column to turn the score into a Yes/No decision using a case statement:
CASE WHEN APL_SAMPLES.CLAIMS_FRAUD_SCORE( CAST(DAYS_TO_REPORT AS INTEGER), CAST(BODILY_INJURY_AMOUNT AS INTEGER), PAYMENT_METHOD, CAST(AGE AS INTEGER), GENDER,INCOME_ESTIMATE, CAST(INCOME_CATEGORY AS INTEGER) ) > 0.605 THEN 1 ELSE 0 END AS IS_FRAUD_DECISION
The threshold (0.605) used in the expression comes from the confusion matrix of the classification model.
Below is the permission granted to the user _SYS_REPO so that the UDF can be invoked in a calculation view:
GRANT EXECUTE ON CLAIMS_FRAUD_SCORE TO _SYS_REPO WITH GRANT OPTION;
Note that invoking the UDF in a calculation view is just one way of using it. You can also invoke the UDF from ABAP code or from the HANA analysis process (HAP).
This new feature allows to compute scores with BW-on-HANA and BW/4HANA.
Now you are ready to try the UDF yourself.