Important
  • SHAP values break down a prediction to show the impact of each feature

  • We can

    • Explain Individual Predictions
    • Aggregate Model Level Insights
  • For example, we compute

    • The impact of feature f having value v, compared with the impact of feature f having its baseline value
  • Where can we use such explanations?

    • A bank’s model rejects someone’s loan application -> the bank is legally required to explain the basis of each rejection
    • A healthcare provider needs to identify which factors are driving each patient’s risk of some disease, so that it can address each of them with a targeted intervention
  • sum(SHAP values for all features) = pred_for_team - pred_for_baseline_values
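
The additivity property in the last bullet can be checked by hand on a toy model. The sketch below is an illustration under stated assumptions, not the algorithm the shap library uses: `toy_model`, the feature names, and the instance and baseline values are all hypothetical, and a "missing" feature is simulated by substituting its baseline value. It computes exact Shapley values by enumerating every subset of features.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy model: a score built from three match statistics.
def toy_model(x):
    return 0.2 * x["goals"] + 0.01 * x["attempts"] + 0.05 * x["goals"] * x["on_target"]

# Hypothetical row to explain and hypothetical baseline (reference) values.
instance = {"goals": 2, "attempts": 13, "on_target": 7}
baseline = {"goals": 1, "attempts": 10, "on_target": 4}
features = list(instance)

def value(subset):
    """Model output when features in `subset` take the instance's values
    and every other feature stays at its baseline value."""
    x = {f: (instance[f] if f in subset else baseline[f]) for f in features}
    return toy_model(x)

def shapley(feature):
    """Exact Shapley value: weighted average of the feature's marginal
    contribution over all subsets of the remaining features."""
    others = [f for f in features if f != feature]
    n = len(features)
    total = 0.0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            with_f = value(set(subset) | {feature})
            without_f = value(set(subset))
            total += weight * (with_f - without_f)
    return total

shap_by_feature = {f: shapley(f) for f in features}
prediction = toy_model(instance)
base_prediction = toy_model(baseline)

# Additivity: the Shapley values sum to prediction minus baseline prediction.
assert abs(sum(shap_by_feature.values()) - (prediction - base_prediction)) < 1e-9
print(shap_by_feature)
```

Because `attempts` enters the toy model only through a linear term, its value is simply its coefficient times the change from baseline. The two features that share the interaction term split that term's effect between them.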

Imports

import sklearn
sklearn.__version__
'1.5.0'
from aiking.data.external import *
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.inspection import PartialDependenceDisplay
import seaborn as sns
import matplotlib.pyplot as plt
import pdpbox
import graphviz
import panel as pn
from ipywidgets import interact
import shap
path = get_ds('fifa-2018-match-statistics'); path.ls()[0]
Path('/Users/rahul1.saraf/rahuketu/programming/AIKING_HOME/data/fifa-2018-match-statistics/FIFA 2018 Statistics.csv')
df = pd.read_csv(path/"FIFA 2018 Statistics.csv"); df
y = (df['Man of the Match'] == "Yes"); y
X = df.select_dtypes(np.int64); X
df_train, df_val, y_train, y_val = train_test_split(X, y, random_state=1)

df_train.shape, df_val.shape, y_train.shape, y_val.shape
((96, 18), (32, 18), (96,), (32,))
model_rf = RandomForestClassifier(random_state=0).fit(df_train, y_train); model_rf
RandomForestClassifier(random_state=0)
row_to_show = 5
data_for_prediction = df_val.iloc[row_to_show]
data_for_prediction_array = data_for_prediction.values.reshape(1,-1); data_for_prediction_array.shape
(1, 18)
model_rf.predict_proba(data_for_prediction_array)
/opt/homebrew/Caskroom/miniforge/base/envs/aiking/lib/python3.9/site-packages/sklearn/base.py:493: UserWarning: X does not have valid feature names, but RandomForestClassifier was fitted with feature names
  warnings.warn(
array([[0.29, 0.71]])
explainer = shap.TreeExplainer(model=model_rf); explainer
<shap.explainers._tree.TreeExplainer at 0x2a42d3f10>
shap_values = explainer.shap_values(data_for_prediction); shap_values
array([[-0.10282092,  0.10282092],
       [ 0.04740467, -0.04740467],
       [-0.02983219,  0.02983219],
       [-0.02277977,  0.02277977],
       [-0.00642731,  0.00642731],
       [-0.01258714,  0.01258714],
       [-0.02910577,  0.02910577],
       [ 0.00766886, -0.00766886],
       [-0.00792221,  0.00792221],
       [-0.01031725,  0.01031725],
       [ 0.00500036, -0.00500036],
       [ 0.00094579, -0.00094579],
       [ 0.02061101, -0.02061101],
       [-0.04846459,  0.04846459],
       [-0.00601652,  0.00601652],
       [-0.00042073,  0.00042073],
       [-0.0008261 ,  0.0008261 ],
       [-0.01286019,  0.01286019]])
explainer.expected_value[1], shap_values[:,1].shape, data_for_prediction.shape
(0.5012500000000001, (18,), (18,))
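
As a quick arithmetic check of the additivity property on this row, the sketch below adds the positive-class SHAP values to the base value and compares the result with the predicted probability. The numbers are copied by hand from the outputs printed above, so they are rounded to the displayed precision.

```python
# Positive-class SHAP values: second column of the array printed above.
shap_values_class1 = [
    0.10282092, -0.04740467, 0.02983219, 0.02277977, 0.00642731, 0.01258714,
    0.02910577, -0.00766886, 0.00792221, 0.01031725, -0.00500036, -0.00094579,
    -0.02061101, 0.04846459, 0.00601652, 0.00042073, 0.0008261, 0.01286019,
]
base_value = 0.5012500000000001  # explainer.expected_value[1], as printed above
predicted = 0.71                 # positive-class probability from predict_proba above

# Base value plus the sum of all feature contributions should give the prediction.
reconstructed = base_value + sum(shap_values_class1)
print(round(reconstructed, 4))

# Small tolerance because the copied values are rounded to 8 decimals.
assert abs(reconstructed - predicted) < 1e-6
```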
shap.initjs()
shap.force_plot(explainer.expected_value[1], shap_values[:,1], data_for_prediction)
[Force plot for the selected row: base value 0.5013, f(x) = 0.71. Labelled features: Blocked = 2, Goals in PSO = 3, On-Target = 7, Corners = 6, Attempts = 13, Fouls Committed = 25, Goal Scored = 2, Ball Possession % = 38, Distance Covered (Kms) = 148]

Other Explainers

  • shap.DeepExplainer works with Deep Learning models.
  • shap.KernelExplainer works with all models, though it is slower than the other explainers and gives an approximation rather than exact SHAP values.
k_explainer = shap.KernelExplainer(model_rf.predict_proba, df_train)
k_shap_values = k_explainer.shap_values(data_for_prediction); k_shap_values.shape
shap.force_plot(k_explainer.expected_value[1], k_shap_values[:,1], data_for_prediction)
[Force plot from KernelExplainer for the same row: f(x) = 0.71. Labelled features: Goals in PSO = 3, Attempts = 13, Corners = 6, On-Target = 7, Fouls Committed = 25, Goal Scored = 2, Ball Possession % = 38, Distance Covered (Kms) = 148]

The example above uses KernelExplainer to get similar results. The results aren’t identical because KernelExplainer gives an approximation, but they tell the same story.
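
A toy example can show why an approximate explainer still tells the same story. KernelExplainer estimates SHAP values with a weighted linear regression over sampled feature coalitions. The sketch below does not implement that method. It swaps in a simpler approximation, averaging marginal contributions over randomly sampled feature orderings, and compares it with the exact average over all orderings. The model, the instance, and the baseline are hypothetical.

```python
import random
from itertools import permutations

def toy_model(x):
    # Hypothetical model with an interaction term, so contributions depend on order.
    return 0.2 * x[0] + 0.01 * x[1] + 0.05 * x[0] * x[2]

instance = [2, 13, 7]   # hypothetical row to explain
baseline = [1, 10, 4]   # hypothetical reference values
n = len(instance)

def contributions(order):
    """Marginal contribution of each feature when features are switched
    from baseline to instance values one at a time, in the given order."""
    x = list(baseline)
    contrib = [0.0] * n
    prev = toy_model(x)
    for i in order:
        x[i] = instance[i]
        cur = toy_model(x)
        contrib[i] = cur - prev
        prev = cur
    return contrib

def average(orders):
    """Average the per-feature contributions over a collection of orderings."""
    orders = list(orders)
    sums = [0.0] * n
    for order in orders:
        for i, c in enumerate(contributions(order)):
            sums[i] += c
    return [s / len(orders) for s in sums]

# Exact Shapley values: average over all n! orderings.
exact = average(permutations(range(n)))

# Approximation: average over 200 randomly sampled orderings (fixed seed).
rng = random.Random(0)
sampled = average(rng.sample(range(n), n) for _ in range(200))

print("exact:  ", [round(v, 3) for v in exact])
print("sampled:", [round(v, 3) for v in sampled])
```

Both estimates still sum to the prediction minus the baseline prediction, because the contributions within any single ordering already add up to that difference. Sampling only changes how the interaction effect is split between the features involved.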