Skip to Content
Author's profile photo Marc DANIAU

SAP HANA Automated Predictive Library (APL) – Model Advanced Apply Settings

When applying a classification model, the user of the Automated Analytics module can choose the prediction information he needs.

To begin, the user goes to the section, Generation Options:

The user may select the option “Decision”, in which case the apply output dataset will include the decision (i.e. the yes/no prediction) and its probability, in addition to the usual columns like the key and the score.

If he selects “Approximated Quantile”, the apply output dataset will be segmented into 10 buckets. By selecting “Individual Contributions”, he will obtain the contribution value for each predictor present in the trained model.

The SAP HANA Automated Predictive Library (APL) provides the same options in the form of alias parameters that a SQL developer can use in its script:

insert into APPLY_CONFIG values ('APL/ApplyExtraMode','Decision',null);
insert into APPLY_CONFIG values ('APL/ApplyExtraMode','Quantiles',null); 
insert into APPLY_CONFIG values ('APL/ApplyExtraMode','Individual Contributions',null);

Sometimes a custom output is required. This is when the user will select the Advanced Apply Settings option. Let’s look at a few use cases of Advanced Apply Settings, and see how it works using APL.


Use Case A

We are interested in the quantile segmentation, but instead of 10 buckets, we would like 100 buckets, or only 4 buckets.

To specify a custom quantile segmentation, we must use, in the SQL script, a full path parameter with the desired number of buckets:

insert into APPLY_CONFIG values 
('Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/ApplySettings/Supervised/IS_FRAUD/PredictedQuantile', '4', null);

Note that the path contains the name of the target variable; in the above example: IS_FRAUD.


Use Case B

We want the decision information together with the individual contributions.

If we put in our script the two alias parameters (we saw earlier) for decision and contributions, only the first one will be taken into account at run time. This is because they are mutually exclusive preset settings. Again, we need to use the full path of the parameters.

However, in order for the apply model command to succeed, we must make sure that we create the proper output table that corresponds to the chosen options. We need to know in advance the number of columns that will be produced, their order, their names and data types. The APL function get_apply_table_type is here for that purpose.

The following is a script with that APL function, after we trained a classification model against historical automobile insurance claims:

drop table FUNC_HEADER;
create table FUNC_HEADER like "SAP_PA_APL"."";;
insert into FUNC_HEADER values ('Oid', '#42');
insert into FUNC_HEADER values ('LogLevel', '8');
drop table APPLY_CONFIG;  
create table APPLY_CONFIG like "SAP_PA_APL"."";
insert into APPLY_CONFIG values   -- decision_rr_IS_FRAUD
insert into APPLY_CONFIG values   -- proba_decision_rr_IS_FRAUD
insert into APPLY_CONFIG values   -- Individual contribution of each predictor
('Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/ApplySettings/Supervised/IS_FRAUD/Contribution', 'all',null); 
drop table SCHEMA_OUT;
create table SCHEMA_OUT like "SAP_PA_APL"."";
drop table SCHEMA_LOG;
create table SCHEMA_LOG like "SAP_PA_APL"."";
call "SAP_PA_APL"."" (
) with overview;

By running that script, we populate the schema_out table with the expected output structure. One can read the content of that table and write the create table SQL statement accordingly, or generate the DDL script automatically using this sample query:

SELECT  'drop table "CLAIMS_SCORES";' as "/*  DDL to Run Before Model Apply  */" FROM DUMMY 
SELECT  'create column table "CLAIMS_SCORES" ('  FROM DUMMY 
  Case When POSITION = 0 Then ' ' Else ', ' End || '"' || NAME || '" ' || 
  Case KIND When 'String' Then 'nvarchar(80)' When 'BigInt' then 'Integer' Else KIND End || ' '

Here are the results, ready for copy/paste, in SAP HANA Studio:


Use Case C

We want the decision information, as well as the reason codes and the outlier flag.

That one is straightforward. It consists of using the three following APL alias parameters.

insert into APPLY_CONFIG values ('APL/ApplyExtraMode','Decision',null);
insert into APPLY_CONFIG values ('APL/ApplyReasonCode','3;Mean;Below',null);
insert into APPLY_CONFIG values ('APL/ApplyOutlierFlag','true',null);

To know the resulting prediction columns, we use the function get_apply_table_type, the same way we did in Use Case B.  Below is the automatically generated script for creating the output table, which stores the requested prediction information, still based on the fraud detection scenario for automobile insurance claims:

/*  DDL to Run Before Model Apply  */
drop table "CLAIMS_SCORES";
create column table "CLAIMS_SCORES" (
 "CLAIM_ID" nvarchar(80) 
, "IS_FRAUD" nvarchar(80) 
, "rr_IS_FRAUD" Double 
, "decision_rr_IS_FRAUD" nvarchar(80) 
, "proba_decision_rr_IS_FRAUD" Double 
, "outlier_rr_IS_FRAUD" Integer 
, "RCN_B_Mean_1_rr_IS_FRAUD" nvarchar(80) 
, "RCN_B_Mean_2_rr_IS_FRAUD" nvarchar(80) 
, "RCN_B_Mean_3_rr_IS_FRAUD" nvarchar(80) 
, "RCV_B_Mean_1_rr_IS_FRAUD" nvarchar(80) 
, "RCV_B_Mean_2_rr_IS_FRAUD" nvarchar(80) 
, "RCV_B_Mean_3_rr_IS_FRAUD" nvarchar(80) 


You can find the documentation of the APL Apply function here

Assigned Tags

      Be the first to leave a comment
      You must be Logged on to comment or reply to a post.