Important
  • SHAP values break down a prediction to show the impact of each feature

  • We can

    • Explain Individual Predictions
    • Aggregate Model Level Insights
  • For example, we compute

    • the impact of feature f taking value v, compared with the impact of f taking its baseline value
  • Where can we use such explanations?

    • A bank's model rejects someone's loan application, and the bank is legally required to explain the basis of each rejection
    • A healthcare provider needs to identify which factors drive each patient's risk of a disease, so that it can address each of them with a targeted intervention
  • sum(SHAP values for all features) = pred_for_team - pred_for_baseline_values
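
The additivity property in the last bullet can be checked on a toy case. The sketch below computes exact Shapley values by brute force, enumerating every subset of the other features. It is a simplification and not how the shap library works internally: `toy_model`, `x`, and `baseline` are hypothetical, and "missing" features are replaced by a single baseline point rather than averaged over a background dataset.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, relative to a single baseline point."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their value from x, the rest from baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            for s in combinations(others, k):
                # Shapley weight for a coalition of size k.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


def toy_model(z):
    # Hypothetical model: one additive term plus one interaction term.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

# Additivity: the values sum to prediction minus baseline prediction.
gap = toy_model(x) - toy_model(baseline)
print(phi, sum(phi), gap)
```

Brute force needs a number of model evaluations that grows exponentially with the feature count, so it is only practical for a handful of features.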

Imports

import sklearn
sklearn.__version__
'1.5.0'
from aiking.data.external import *
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.inspection import PartialDependenceDisplay
import seaborn as sns
import matplotlib.pyplot as plt
import pdpbox
import graphviz
import panel as pn
from ipywidgets import interact
import shap
path = get_ds('fifa-2018-match-statistics'); path.ls()[0]
Path('/Users/rahul1.saraf/rahuketu/programming/AIKING_HOME/data/fifa-2018-match-statistics/FIFA 2018 Statistics.csv')
df = pd.read_csv(path/"FIFA 2018 Statistics.csv"); df
y = (df['Man of the Match'] == "Yes"); y
X = df.select_dtypes(np.int64); X
df_train, df_val, y_train, y_val = train_test_split(X, y, random_state=1)

df_train.shape, df_val.shape, y_train.shape, y_val.shape
((96, 18), (32, 18), (96,), (32,))
model_rf = RandomForestClassifier(random_state=0).fit(df_train, y_train); model_rf
RandomForestClassifier(random_state=0)
row_to_show = 5
data_for_prediction = df_val.iloc[row_to_show]
data_for_prediction_frame = df_val.iloc[[row_to_show]]; data_for_prediction_frame.shape  # one-row DataFrame keeps the feature names
(1, 18)
model_rf.predict_proba(data_for_prediction_frame)
array([[0.29, 0.71]])
explainer = shap.TreeExplainer(model=model_rf); explainer
<shap.explainers._tree.TreeExplainer at 0x2a42d3f10>
shap_values = explainer.shap_values(data_for_prediction); shap_values
array([[-0.10282092,  0.10282092],
       [ 0.04740467, -0.04740467],
       [-0.02983219,  0.02983219],
       [-0.02277977,  0.02277977],
       [-0.00642731,  0.00642731],
       [-0.01258714,  0.01258714],
       [-0.02910577,  0.02910577],
       [ 0.00766886, -0.00766886],
       [-0.00792221,  0.00792221],
       [-0.01031725,  0.01031725],
       [ 0.00500036, -0.00500036],
       [ 0.00094579, -0.00094579],
       [ 0.02061101, -0.02061101],
       [-0.04846459,  0.04846459],
       [-0.00601652,  0.00601652],
       [-0.00042073,  0.00042073],
       [-0.0008261 ,  0.0008261 ],
       [-0.01286019,  0.01286019]])
explainer.expected_value[1], shap_values[:,1].shape, data_for_prediction.shape
(0.5012500000000001, (18,), (18,))
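
As a sanity check, the printed numbers can be plugged into the additivity rule: the base value plus the class-1 SHAP values should reproduce the predicted probability of 0.71. The sketch below hard-codes the values from the outputs above so it runs without the model. The ranking at the end uses column positions in `X`, because the column names are not shown in these notes.

```python
import numpy as np

# Class-1 SHAP values, copied from the second column of the output above.
shap_class1 = np.array([
    0.10282092, -0.04740467, 0.02983219, 0.02277977, 0.00642731,
    0.01258714, 0.02910577, -0.00766886, 0.00792221, 0.01031725,
    -0.00500036, -0.00094579, -0.02061101, 0.04846459, 0.00601652,
    0.00042073, 0.0008261, 0.01286019,
])
base_value = 0.5012500000000001  # explainer.expected_value[1]
predicted = 0.71                 # class-1 probability from predict_proba

# Base value plus all contributions should reproduce the prediction.
reconstructed = base_value + shap_class1.sum()
print(round(reconstructed, 4))

# Column positions of the three largest contributions by absolute size.
order = np.argsort(-np.abs(shap_class1))
top3 = order[:3].tolist()
print(top3)
```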
shap.initjs()
shap.force_plot(explainer.expected_value[1], shap_values[:,1], data_for_prediction)

Other Explainers

  • shap.DeepExplainer works with Deep Learning models.
  • shap.KernelExplainer works with all models, though it is slower than the other explainers and returns an approximation rather than exact SHAP values.
k_explainer = shap.KernelExplainer(model_rf.predict_proba, df_train)
k_shap_values = k_explainer.shap_values(data_for_prediction); k_shap_values.shape
shap.force_plot(k_explainer.expected_value[1], k_shap_values[:,1], data_for_prediction)

The example above uses KernelExplainer to get similar results. They aren't identical to the TreeExplainer output because KernelExplainer gives an approximation, but they tell the same story.
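
To see why a sampled estimate is only approximate, here is a minimal sketch that estimates Shapley values by averaging over random feature orderings. This is permutation sampling, a different technique from the weighted regression that KernelExplainer uses, so it only illustrates the general idea. `toy_model`, `x`, and `baseline` are hypothetical. For this toy function the exact values are 2, 3, and 3 (feature 0 is purely additive, and features 1 and 2 share their product of 6 equally).

```python
import random


def sampled_shapley(f, x, baseline, n_samples, seed=0):
    """Estimate Shapley values by averaging over random feature orderings."""
    rng = random.Random(seed)
    n = len(x)
    totals = [0.0] * n
    for _ in range(n_samples):
        order = list(range(n))
        rng.shuffle(order)
        z = list(baseline)      # start from the baseline point
        prev = f(z)
        for i in order:
            z[i] = x[i]         # switch feature i to its actual value
            cur = f(z)
            totals[i] += cur - prev  # marginal contribution in this ordering
            prev = cur
    return [t / n_samples for t in totals]


def toy_model(z):
    # Hypothetical model: one additive term plus one interaction term.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
approx = sampled_shapley(toy_model, x, baseline, n_samples=2000)
print(approx)
```

The estimates for features 1 and 2 vary with the seed and the sample count, while their total stays fixed: within each ordering the contributions telescope to the prediction minus the baseline prediction.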