Technology Blogs by SAP
Learn how to extend and personalize SAP applications. Follow the SAP technology blog for insights into SAP BTP, ABAP, SAP Analytics Cloud, SAP HANA, and more.
cancel
Showing results for 
Search instead for 
Did you mean: 
marc_daniau
Advisor
Advisor
When applying a classification model, the user of the Automated Analytics module can choose the prediction information he needs.

To begin, the user goes to the section, Generation Options:



The user may select the option “Decision”, in which case the apply output dataset will include the decision (i.e. the yes/no prediction) and its probability, in addition to the usual columns like the key and the score.

If he selects “Approximated Quantile”, the apply output dataset will be segmented into 10 buckets. By selecting “Individual Contributions”, he will obtain the contribution value for each predictor present in the trained model.

The SAP HANA Automated Predictive Library (APL) provides the same options in the form of alias parameters that a SQL developer can use in its script:
insert into APPLY_CONFIG values ('APL/ApplyExtraMode','Decision',null);
insert into APPLY_CONFIG values ('APL/ApplyExtraMode','Quantiles',null);
insert into APPLY_CONFIG values ('APL/ApplyExtraMode','Individual Contributions',null);

Sometimes a custom output is required. This is when the user will select the Advanced Apply Settings option. Let’s look at a few use cases of Advanced Apply Settings, and see how it works using APL.

 

Use Case A

We are interested in the quantile segmentation, but instead of 10 buckets, we would like 100 buckets, or only 4 buckets.

To specify a custom quantile segmentation, we must use, in the SQL script, a full path parameter with the desired number of buckets:
insert into APPLY_CONFIG values 
('Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/ApplySettings/Supervised/IS_FRAUD/PredictedQuantile', '4', null);

Note that the path contains the name of the target variable; in the above example: IS_FRAUD.

 

Use Case B

We want the decision information together with the individual contributions.

If we put in our script the two alias parameters (we saw earlier) for decision and contributions, only the first one will be taken into account at run time. This is because they are mutually exclusive preset settings. Again, we need to use the full path of the parameters.

However, in order for the apply model command to succeed, we must make sure that we create the proper output table that corresponds to the chosen options. We need to know in advance the number of columns that will be produced, their order, their names and data types. The APL function get_apply_table_type is here for that purpose.

The following is a script with that APL function, after we trained a classification model against historical automobile insurance claims:
/*  PREPARE INPUT TABLES  */
drop table FUNC_HEADER;
create table FUNC_HEADER like "SAP_PA_APL"."sap.pa.apl.base::BASE.T.FUNCTION_HEADER";;
insert into FUNC_HEADER values ('Oid', '#42');
insert into FUNC_HEADER values ('LogLevel', '8');
drop table APPLY_CONFIG;
create table APPLY_CONFIG like "SAP_PA_APL"."sap.pa.apl.base::BASE.T.OPERATION_CONFIG_DETAILED";
insert into APPLY_CONFIG values -- decision_rr_IS_FRAUD
('Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/ApplySettings/Supervised/IS_FRAUD/PredictedRankCategories/1','',null);
insert into APPLY_CONFIG values -- proba_decision_rr_IS_FRAUD
('Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/ApplySettings/Supervised/IS_FRAUD/PredictedRankProbabilities/1','',null);
insert into APPLY_CONFIG values -- Individual contribution of each predictor
('Protocols/Default/Transforms/Kxen.RobustRegression/Parameters/ApplySettings/Supervised/IS_FRAUD/Contribution', 'all',null);
/* PREPARE OUTPUT TABLES */
drop table SCHEMA_OUT;
create table SCHEMA_OUT like "SAP_PA_APL"."sap.pa.apl.base::BASE.T.TABLE_TYPE";
drop table SCHEMA_LOG;
create table SCHEMA_LOG like "SAP_PA_APL"."sap.pa.apl.base::BASE.T.OPERATION_LOG";
/* RUN THE FUNCTION */
call "SAP_PA_APL"."sap.pa.apl.base::GET_TABLE_TYPE_FOR_APPLY" (
FUNC_HEADER, MODEL_TRAIN_BIN, APPLY_CONFIG,
'APL_SAMPLES','AUTO_CLAIMS_NEW',
SCHEMA_OUT , SCHEMA_LOG
) with overview;

By running that script, we populate the schema_out table with the expected output structure. One can read the content of that table and write the create table SQL statement accordingly, or generate the DDL script automatically using this sample query:
SELECT  'drop table "CLAIMS_SCORES";' as "/*  DDL to Run Before Model Apply  */" FROM DUMMY 
UNION
SELECT 'create column table "CLAIMS_SCORES" (' FROM DUMMY
UNION
(
SELECT
Case When POSITION = 0 Then ' ' Else ', ' End || '"' || NAME || '" ' ||
Case KIND When 'String' Then 'nvarchar(80)' When 'BigInt' then 'Integer' Else KIND End || ' '
FROM "USER_APL"."SCHEMA_OUT" ORDER BY POSITION
)
UNION
SELECT ');' FROM DUMMY;

Here are the results, ready for copy/paste, in SAP HANA Studio:



 

Use Case C

We want the decision information, as well as the reason codes and the outlier flag.

That one is straightforward. It consists of using the three following APL alias parameters.
insert into APPLY_CONFIG values ('APL/ApplyExtraMode','Decision',null);
insert into APPLY_CONFIG values ('APL/ApplyReasonCode','3;Mean;Below',null);
insert into APPLY_CONFIG values ('APL/ApplyOutlierFlag','true',null);

To know the resulting prediction columns, we use the function get_apply_table_type, the same way we did in Use Case B.  Below is the automatically generated script for creating the output table, which stores the requested prediction information, still based on the fraud detection scenario for automobile insurance claims:
/*  DDL to Run Before Model Apply  */
drop table "CLAIMS_SCORES";
create column table "CLAIMS_SCORES" (
"CLAIM_ID" nvarchar(80)
, "IS_FRAUD" nvarchar(80)
, "rr_IS_FRAUD" Double
, "decision_rr_IS_FRAUD" nvarchar(80)
, "proba_decision_rr_IS_FRAUD" Double
, "outlier_rr_IS_FRAUD" Integer
, "RCN_B_Mean_1_rr_IS_FRAUD" nvarchar(80)
, "RCN_B_Mean_2_rr_IS_FRAUD" nvarchar(80)
, "RCN_B_Mean_3_rr_IS_FRAUD" nvarchar(80)
, "RCV_B_Mean_1_rr_IS_FRAUD" nvarchar(80)
, "RCV_B_Mean_2_rr_IS_FRAUD" nvarchar(80)
, "RCV_B_Mean_3_rr_IS_FRAUD" nvarchar(80)
);

 

You can find the documentation of the APL Apply function here