SHAP Interaction values with Automated Predictive (APL)
We already covered SHAP-explained models for classification and regression in a previous APL blog post, where we briefly discussed the main effect of a predictor and its interaction effects with the other predictors in the model. With HANA ML 2.17, you can now visualize the interactions between variables as a heatmap. This new visualization complements the Variable Importance bar chart in giving a global explanation of a classification or regression model. The feature requires APL 2311 or later.
This post walks through an example using the Census sample dataset that ships with APL. We start by loading it into a HANA DataFrame:
from hana_ml import dataframe as hd

# Connect using a key stored in the HANA secure user store
conn = hd.ConnectionContext(userkey='MLMDA_KEY')
# Census sample table shipped with APL
sql_cmd = 'SELECT * FROM "APL_SAMPLES"."CENSUS" ORDER BY 1'
hdf_train = hd.DataFrame(conn, sql_cmd)
First, we train a gradient boosting classification model with the interactions parameter set to True:
from hana_ml.algorithms.apl.gradient_boosting_classification import GradientBoostingBinaryClassifier

# interactions=True asks APL to compute the variable interaction values
apl_model = GradientBoostingBinaryClassifier(variable_auto_selection=True, interactions=True)
apl_model.fit(hdf_train, label='class', key='id')
When training is complete, we request the report:
from hana_ml.visualizers.unified_report import UnifiedReport

# Build the model report and display it in the notebook
UnifiedReport(apl_model).build().display()
You may want to generate the report as an HTML file:
The usual “Variable Importance” tab provides a global explanation of the predictive model.
Because we explicitly requested interactions in the model parameters, a new "Interaction Matrix" tab also appears at the end:
The diagonal shows the main effect of each variable, and the off-diagonal cells show the interaction between each pair of variables. The matrix includes only the variables with the strongest interactions, and by default it is limited to 6×6. To get a larger matrix, 9×9 for example, set the interactions_max_kept parameter:
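To make the layout concrete, here is a small pandas sketch that assembles such a matrix from main effects and pairwise values. The variable names and numbers are made up for illustration. The rule for choosing which variables to keep (rank each variable by the largest absolute interaction it takes part in) is our own assumption, because the post does not say how APL selects them.

```python
import pandas as pd


def interaction_matrix(main, pairs, k):
    """Square matrix with main effects on the diagonal and pairwise
    interaction values off the diagonal, restricted to k variables.

    Selection rule (our assumption, not necessarily APL's): rank each
    variable by the largest absolute interaction it takes part in.
    """
    score = {name: 0.0 for name in main}
    for (a, b), value in pairs.items():
        score[a] = max(score[a], abs(value))
        score[b] = max(score[b], abs(value))
    # sorted() is stable, so ties keep their original order
    kept = sorted(score, key=score.get, reverse=True)[:k]

    matrix = pd.DataFrame(0.0, index=kept, columns=kept)
    for name in kept:
        matrix.loc[name, name] = main[name]
    for (a, b), value in pairs.items():
        if a in kept and b in kept:
            # Mirror each pair so the matrix is symmetric
            matrix.loc[a, b] = value
            matrix.loc[b, a] = value
    return matrix


# Made-up values for illustration only (not Census results)
main = {"age": 0.30, "education": 0.25, "hours": 0.10, "sex": 0.05}
pairs = {
    ("age", "education"): 0.08,
    ("age", "hours"): 0.02,
    ("education", "sex"): 0.01,
    ("hours", "sex"): 0.005,
}
print(interaction_matrix(main, pairs, k=3))
```

With k=3, the made-up "sex" variable drops out because its interactions are the weakest. Pairs with no recorded value are shown as 0.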
# Keep more interactions to get a larger matrix (9×9 here)
apl_model = GradientBoostingBinaryClassifier(variable_auto_selection=True, interactions=True, interactions_max_kept=8)
apl_model.fit(hdf_train, label='class', key='id')
The larger the matrix, the longer it takes to fit the model.
If needed, you can retrieve the interaction values as a pandas DataFrame:
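# Fetch the interaction matrix debrief table, drop the Oid column and collect it locally
df = apl_model.get_debrief_report('ClassificationRegression_InteractionMatrix').deselect('Oid').collect()
# Display without the pandas row index
df.style.hide(axis='index')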
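Once the values are in pandas, you may prefer to list the strongest pairs instead of reading them off the heatmap. The sketch below assumes a square, symmetric DataFrame with variable names as both index and columns. The post does not show the layout of the debrief table, so check df.columns first and reshape it if needed (for example, by moving the variable-name column into the index). The demo values are invented.

```python
import numpy as np
import pandas as pd


def strongest_pairs(matrix, n=5):
    """List the n off-diagonal pairs with the largest absolute value from a
    square, symmetric DataFrame indexed and labeled by variable name."""
    values = matrix.to_numpy()
    rows, cols = np.triu_indices_from(values, k=1)  # upper triangle, diagonal excluded
    pairs = pd.DataFrame({
        "variable_1": matrix.index[rows],
        "variable_2": matrix.columns[cols],
        "interaction": values[rows, cols],
    })
    # Rank by magnitude, since interaction values can be negative
    order = pairs["interaction"].abs().sort_values(ascending=False, kind="stable").index
    return pairs.loc[order].head(n).reset_index(drop=True)


# Invented demo values; the real df layout may differ
names = ["age", "education", "hours"]
demo = pd.DataFrame(
    [[0.30, 0.08, -0.03],
     [0.08, 0.25, 0.01],
     [-0.03, 0.01, 0.10]],
    index=names,
    columns=names,
)
print(strongest_pairs(demo, n=2))
```

Only the upper triangle is scanned, so each symmetric pair is reported once and the diagonal main effects are left out.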
These values are computed with the Shapley-Taylor interaction index.
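To build intuition for what is being computed, here is a brute-force sketch of the order-2 Shapley-Taylor index on a toy game with three hypothetical players. The main effect of i is v({i}) − v(∅). The pairwise index for {i, j} averages the discrete second derivative v(S∪{i,j}) − v(S∪{i}) − v(S∪{j}) + v(S) over the subsets S of the remaining players, with each term weighted by 2 / (n·C(n−1, |S|)). In the model setting, v(S) would be the expected prediction when only the variables in S are known. How APL estimates that, and any approximations it uses, are not covered here, so treat this as an illustration of the definition and not of APL's implementation.

```python
import math
from itertools import combinations


def discrete_derivative(v, S, T):
    """delta_T v(S): sum over W subset of T of (-1)**(|T| - |W|) * v(S | W)."""
    total = 0.0
    for r in range(len(T) + 1):
        for W in combinations(T, r):
            total += (-1) ** (len(T) - r) * v(frozenset(S) | frozenset(W))
    return total


def shapley_taylor_order2(v, players):
    """Exact order-2 Shapley-Taylor indices by enumerating all subsets.

    main[i]      = v({i}) - v({})
    pairs[(i,j)] = (2/n) * sum over S in N minus {i,j} of delta_ij v(S) / C(n-1, |S|)
    """
    players = list(players)
    n = len(players)
    main = {i: discrete_derivative(v, (), (i,)) for i in players}
    pairs = {}
    for i, j in combinations(players, 2):
        rest = [p for p in players if p not in (i, j)]
        acc = 0.0
        for size in range(len(rest) + 1):
            for S in combinations(rest, size):
                acc += discrete_derivative(v, S, (i, j)) / math.comb(n - 1, size)
        pairs[(i, j)] = 2.0 * acc / n
    return main, pairs


def toy_value(S):
    """Hypothetical game: two additive terms plus one age/education synergy."""
    value = 0.0
    if "age" in S:
        value += 1.0
    if "hours" in S:
        value += 0.5
    if "age" in S and "education" in S:
        value += 2.0
    return value


players = ["age", "education", "hours"]
main, pairs = shapley_taylor_order2(toy_value, players)
print(main)
print(pairs)
```

The additive terms appear only as main effects (1.0 for age, 0.5 for hours), and the synergy appears only in the age/education pair (2.0). Main effects plus pairwise values add up to v(N) − v(∅), which is 3.5 here. This is the efficiency property of the index.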