SHAP Interaction values with Automated Predictive ...

marc_daniau · ‎06-23-2023

We already covered SHAP-explained models for classification and regression scenarios in a previous APL blog post, where we briefly discussed the main effect of a predictor and its interaction effect with the other predictors of the model. With HANA ML 2.17, you can now visualize the interactions between variables in a heatmap. This new visualization complements the Variable Importance bar chart in providing a global explanation of a classification or regression model. The feature requires APL 2311 or later.

This blog will walk you through an example using the Census dataset that comes with APL.

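from hana_ml import dataframe as hd

# Connect with a key stored in the HANA secure user store
conn = hd.ConnectionContext(userkey='MLMDA_KEY')

# Expose the Census sample table as a HANA dataframe
sql_cmd = 'SELECT * FROM "APL_SAMPLES"."CENSUS" ORDER BY 1'
hdf_train = hd.DataFrame(conn, sql_cmd)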

First, we train a gradient boosting classification model with the interactions parameter set to True:

from hana_ml.algorithms.apl.gradient_boosting_classification import GradientBoostingBinaryClassifier

apl_model = GradientBoostingBinaryClassifier(variable_auto_selection=True,
                                             interactions=True)
apl_model.fit(hdf_train, label='class', key='id')

When the model training is completed, we ask for the report:

from hana_ml.visualizers.unified_report import UnifiedReport

UnifiedReport(apl_model).build().display()

You may want to generate the report as an HTML file:

apl_model.generate_html_report('APL_Census')

The usual "Variable Importance" tab provides a global explanation of the predictive model.

But because we explicitly requested the interactions when setting the model parameters, a new tab "Interaction Matrix" appears at the end:

The diagonal shows the main effect of each variable. The interaction matrix presents only the variables with the strongest interactions, and by default it is limited to a size of 6x6. For a larger matrix, 9x9 for example, we must specify the maximum number of interactions to keep as follows:

apl_model = GradientBoostingBinaryClassifier(variable_auto_selection=True,
                                             interactions=True,
                                             # 9 variables form 9 * 8 / 2 = 36 pairs
                                             interactions_max_kept=36)
apl_model.fit(hdf_train, label='class', key='id')

The larger the matrix, the longer it takes to fit the model.
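As a rough guide to choosing the maximum, the small sketch below counts the distinct variable pairs in an n x n matrix. It assumes that interactions_max_kept counts variable pairs, which is consistent with the default 6x6 matrix (15 pairs), but it is worth checking against the APL documentation for your version. The helper name pairs_for_matrix is hypothetical and is not part of hana_ml.

```python
def pairs_for_matrix(n_variables: int) -> int:
    """Number of distinct off-diagonal variable pairs in an n x n interaction matrix."""
    if n_variables < 0:
        raise ValueError("n_variables must be non-negative")
    return n_variables * (n_variables - 1) // 2


# Assumption: the maximum kept is expressed as a number of variable pairs.
for size in (6, 9):
    print(f"{size}x{size} matrix -> {pairs_for_matrix(size)} pairs")
# 6x6 matrix -> 15 pairs
# 9x9 matrix -> 36 pairs
```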

If needed, one can obtain the interaction values in a pandas dataframe:

df = apl_model.get_debrief_report('ClassificationRegression_InteractionMatrix').deselect('Oid').collect()
df.style.hide(axis='index')

These figures are computed using the Shapley Taylor index.
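To give an intuition for what such an index measures, here is a minimal, brute-force sketch of the order-2 Shapley Taylor interaction index on a toy value function. It is not how APL computes its figures: the function shapley_taylor_order2, the three variable names, and the weights are all hypothetical, and enumerating every subset is only feasible for a handful of variables. Diagonal entries hold the main effect of each variable taken alone, and off-diagonal entries hold the pairwise interaction.

```python
from itertools import combinations
from math import comb


def shapley_taylor_order2(players, value):
    """Order-2 Shapley Taylor indices for a set function `value` over frozensets."""
    n = len(players)
    empty = frozenset()
    result = {}
    # Main effects: contribution of each variable on its own.
    for i in players:
        result[(i, i)] = value(frozenset([i])) - value(empty)
    # Pairwise terms: weighted average of the discrete second derivative.
    for i, j in combinations(players, 2):
        others = [p for p in players if p not in (i, j)]
        total = 0.0
        for size in range(len(others) + 1):
            for subset in combinations(others, size):
                t = frozenset(subset)
                delta = value(t | {i, j}) - value(t | {i}) - value(t | {j}) + value(t)
                total += delta / comb(n - 1, size)
        result[(i, j)] = 2.0 / n * total
    return result


# Hypothetical toy model: additive weights plus one pairwise bonus.
WEIGHTS = {"age": 1.0, "education": 2.0, "hours": 0.5}
BONUS = {frozenset(["age", "education"]): 0.5}


def toy_value(subset):
    score = sum(WEIGHTS[p] for p in subset)
    score += sum(b for pair, b in BONUS.items() if pair <= subset)
    return score


players = list(WEIGHTS)
indices = shapley_taylor_order2(players, toy_value)
for pair, val in indices.items():
    print(pair, round(val, 6))
```

In this toy case the index recovers the pairwise bonus for age and education, gives zero for the other pairs, and all entries together sum to the value of the full set minus the value of the empty set.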

To know more about APL
