
# SHAP Interaction Values with Automated Predictive (APL)

We already covered SHAP-explained models for classification and regression in a previous APL blog post. There we briefly discussed the main effect of a predictor and its interaction effects with the other predictors of the model. With HANA ML 2.17 you can now visualize the interactions between variables as a heatmap. This new visualization complements the *Variable Importance* bar chart as a global explanation of the classification or regression model. The feature requires APL 2311 or later.
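For reference, one standard way to split a single prediction into the main effect of each predictor and its pairwise interactions is the order-2 Shapley-Taylor interaction index, which is the method named at the end of this post. In the definition below, $N$ is the set of $n$ predictors and $v(S)$ is the model output when only the predictors in $S$ take their actual values. The sketch does not cover how APL defines $v$ or how it aggregates per-row indices into its global matrix, so treat those details as assumptions.

```latex
\delta_{ij} v(T) = v(T \cup \{i,j\}) - v(T \cup \{i\}) - v(T \cup \{j\}) + v(T)

I_i = v(\{i\}) - v(\emptyset), \qquad
I_{ij} = \frac{2}{n} \sum_{T \subseteq N \setminus \{i,j\}} \frac{\delta_{ij} v(T)}{\binom{n-1}{|T|}}

v(N) - v(\emptyset) = \sum_{i \in N} I_i + \sum_{\{i,j\} \subseteq N} I_{ij}
```

The $I_i$ terms correspond to the diagonal of an interaction matrix and the $I_{ij}$ terms to its off-diagonal cells. The last line is the efficiency property: main effects and pairwise interactions together account for the full difference between the prediction and the baseline output.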

This blog will walk you through an example using the Census dataset that comes with APL.

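```
from hana_ml import dataframe as hd

# Connect to SAP HANA with a key stored in the secure user store (hdbuserstore)
conn = hd.ConnectionContext(userkey='MLMDA_KEY')

# Census sample table shipped with APL, used as the training set
sql_cmd = 'SELECT * FROM "APL_SAMPLES"."CENSUS" ORDER BY 1'
hdf_train = hd.DataFrame(conn, sql_cmd)
```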

Next, we train a gradient boosting classification model with the `interactions` parameter set to `True`:

```
from hana_ml.algorithms.apl.gradient_boosting_classification import GradientBoostingBinaryClassifier

# interactions=True asks APL to compute the interaction matrix during training
apl_model = GradientBoostingBinaryClassifier(variable_auto_selection=True,
                                             interactions=True)
apl_model.fit(hdf_train, label='class', key='id')
```

Once training is complete, we request the report:

```
from hana_ml.visualizers.unified_report import UnifiedReport
UnifiedReport(apl_model).build().display()
```

You can also save the report as an HTML file:

```
apl_model.generate_html_report('APL_Census')
```

The usual “Variable Importance” tab provides a global explanation of the predictive model.

Because we explicitly requested interactions in the model parameters, a new “Interaction Matrix” tab also appears at the end.

The diagonal shows the main effect of each variable, and the off-diagonal cells show pairwise interactions. The matrix includes only the variables with the strongest interactions, and by default it is limited to 6×6. To get a larger matrix, 9×9 for example, set the `interactions_max_kept` parameter:

```
# Raise the maximum number of interactions kept to enlarge the matrix
apl_model = GradientBoostingBinaryClassifier(variable_auto_selection=True,
                                             interactions=True,
                                             interactions_max_kept=8)
apl_model.fit(hdf_train, label='class', key='id')
```

The larger the matrix, the longer it takes to fit the model.
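This is simple counting and says nothing about APL's internals. A symmetric $k \times k$ interaction matrix (an order-2 Shapley-Taylor index does not depend on the order of the pair) holds $k$ main effects and $\binom{k}{2}$ distinct pairs. Going from the default 6×6 to 9×9 therefore more than doubles the number of pairs to estimate:

```latex
\binom{k}{2} = \frac{k(k-1)}{2}, \qquad \binom{6}{2} = 15, \qquad \binom{9}{2} = 36
```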

If needed, you can retrieve the interaction values as a pandas DataFrame:

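```
# Fetch the interaction-matrix debrief report, drop the Oid column,
# and download the result as a pandas DataFrame
df = apl_model.get_debrief_report('ClassificationRegression_InteractionMatrix').deselect('Oid').collect()
# Display without the pandas row index (Styler.hide requires pandas 1.4 or later)
df.style.hide(axis='index')
```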
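A small helper can read the strongest pairs off the matrix if you want a ranked list instead of a heatmap. This is a minimal sketch, and it rests on several assumptions. It expects the values already arranged as a square DataFrame with variable names as both index and columns, main effects on the diagonal, and a symmetric layout. The actual layout returned by `get_debrief_report` may differ. `top_interactions` is a hypothetical helper, not part of hana-ml, and the toy matrix is made up for illustration (it is not Census output).

```python
import numpy as np
import pandas as pd


def top_interactions(matrix: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Return the n strongest pairwise interactions of a square interaction matrix.

    Assumes variable names label both rows and columns, main effects sit on the
    diagonal, and the matrix is symmetric, so only the upper triangle is read.
    """
    values = matrix.to_numpy(dtype=float)
    # Row/column positions of the upper triangle, excluding the diagonal (k=1)
    rows, cols = np.triu_indices_from(values, k=1)
    pairs = pd.DataFrame({
        "variable_1": matrix.index[rows],
        "variable_2": matrix.columns[cols],
        "interaction": values[rows, cols],
    })
    # Rank by magnitude so strong negative interactions are not pushed to the bottom
    order = pairs["interaction"].abs().sort_values(ascending=False, kind="stable").index
    return pairs.loc[order].head(n).reset_index(drop=True)


# Hypothetical toy matrix: one main effect (x0) and one interaction (x1, x2)
names = ["x0", "x1", "x2"]
toy = pd.DataFrame([[2.0, 0.0, 0.0],
                    [0.0, 0.0, 3.0],
                    [0.0, 3.0, 0.0]], index=names, columns=names)
print(top_interactions(toy, n=1))
```

Reading only the upper triangle avoids listing each pair twice. Sorting by absolute value is a design choice, and you could rank by signed value instead if your matrix stores only magnitudes.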

These values are computed with the Shapley-Taylor interaction index.
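The sketch below computes order-2 Shapley-Taylor indices by brute force for a single prediction. It shows what the diagonal and off-diagonal cells mean:

- A main effect is the change in output when one feature is switched from a baseline value to its actual value.
- A pairwise index is a weighted average of the extra effect of switching two features together, beyond their separate effects.

The sketch assumes a single baseline point stands in for "absent" features and uses a hypothetical `toy_model`. It does not reflect APL's own choice of baseline or how APL aggregates per-row values into the global matrix. It needs on the order of 2**n model calls, so it only suits a handful of features.

```python
from itertools import combinations
from math import comb
from typing import Callable, Dict, FrozenSet, Sequence, Tuple


def shapley_taylor_order2(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> Tuple[Dict[int, float], Dict[Tuple[int, int], float]]:
    """Brute-force order-2 Shapley-Taylor indices for a single prediction.

    A coalition S is scored by running the model on a point whose features in S
    come from x and whose other features come from baseline.
    """
    n = len(x)

    def v(coalition: FrozenSet[int]) -> float:
        point = [x[k] if k in coalition else baseline[k] for k in range(n)]
        return model(point)

    empty: FrozenSet[int] = frozenset()
    # Main effects (the diagonal): effect of adding feature i to the empty coalition
    main = {i: v(frozenset({i})) - v(empty) for i in range(n)}

    pairs: Dict[Tuple[int, int], float] = {}
    for i, j in combinations(range(n), 2):
        others = [k for k in range(n) if k not in (i, j)]
        total = 0.0
        for size in range(len(others) + 1):
            for subset in combinations(others, size):
                t = frozenset(subset)
                # Second-order discrete derivative of v with respect to {i, j} at t
                delta = v(t | {i, j}) - v(t | {i}) - v(t | {j}) + v(t)
                total += delta / comb(n - 1, size)
        pairs[(i, j)] = 2 * total / n
    return main, pairs


def toy_model(p: Sequence[float]) -> float:
    # Hypothetical model: additive term in feature 0, interaction between features 1 and 2
    return 2 * p[0] + 3 * p[1] * p[2]


x, baseline = [1, 1, 1], [0, 0, 0]
main, pairs = shapley_taylor_order2(toy_model, x, baseline)
print(main)   # {0: 2, 1: 0, 2: 0}
print(pairs)  # {(0, 1): 0.0, (0, 2): 0.0, (1, 2): 3.0}
# Efficiency: main effects plus pairs add up to f(x) - f(baseline)
print(sum(main.values()) + sum(pairs.values()), toy_model(x) - toy_model(baseline))  # 5.0 5
```

In the toy model, feature 0 has a main effect only, and the whole product term shows up as the (1, 2) interaction. This is the pattern you would read off the diagonal and off-diagonal cells of the heatmap.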
