import sklearn
SHAP values break down a prediction to show the impact of each feature.
We can use them to:
- Explain Individual Predictions
- Aggregate Model Level Insights
For example, for a single prediction we compute:
- The impact of feature f taking value v, compared with the impact of feature f taking its baseline value
Where can we use such explanations?
- A bank's model rejects someone’s loan application -> the bank is legally required to explain the basis of each rejection
- A healthcare provider needs to identify which factors are driving each patient’s risk of some disease, so that it can address each of them with targeted interventions
sum(SHAP values for all features) = pred_for_team - pred_for_baseline_values
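The additivity property above can be checked on a small scale. The following is a minimal sketch, assuming a hypothetical `toy_model` and an all-zeros baseline (neither comes from the notebook). It computes exact Shapley values by brute force over every feature subset, which is only feasible for a handful of features and is not how `shap` computes them internally.

```python
from itertools import combinations
from math import factorial

import numpy as np


def toy_model(x):
    # Hypothetical stand-in model with an interaction term (an assumption for illustration).
    return 2.0 * x[0] + x[1] * x[2] + 0.5


def exact_shapley(model, x, baseline):
    """Brute-force Shapley values of model(x) relative to model(baseline)."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                # Features in the subset take their actual values, the rest stay at baseline
                without_i = baseline.copy()
                for j in subset:
                    without_i[j] = x[j]
                with_i = without_i.copy()
                with_i[i] = x[i]
                phi[i] += weight * (model(with_i) - model(without_i))
    return phi


x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
phi = exact_shapley(toy_model, x, baseline)

# Additivity: the values sum to prediction minus baseline prediction
print(phi, phi.sum(), toy_model(x) - toy_model(baseline))
```

In this toy case the first feature gets its full additive effect, and the two interacting features split the interaction term equally.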
from import *
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.inspection import PartialDependenceDisplay
import seaborn as sns
import matplotlib.pyplot as plt
import pdpbox
import graphviz
import panel as pn
from ipywidgets import interact
import shap
path = get_ds('fifa-2018-match-statistics')[0]; path
Path('/Users/rahul1.saraf/rahuketu/programming/AIKING_HOME/data/fifa-2018-match-statistics/FIFA 2018 Statistics.csv')
df = pd.read_csv(path/"FIFA 2018 Statistics.csv"); df
y = (df['Man of the Match'] == "Yes"); y
X = df.select_dtypes(np.int64); X
df_train, df_val, y_train, y_val = train_test_split(X, y, random_state=1)
df_train.shape, df_val.shape, y_train.shape, y_val.shape
((96, 18), (32, 18), (96,), (32,))
model_rf = RandomForestClassifier(random_state=0).fit(df_train, y_train); model_rf
RandomForestClassifier(random_state=0)
row_to_show = 5
data_for_prediction = df_val.iloc[row_to_show]
data_for_prediction_array = data_for_prediction.values.reshape(1,-1); data_for_prediction_array.shape
(1, 18)
model_rf.predict_proba(data_for_prediction_array)
/opt/homebrew/Caskroom/miniforge/base/envs/aiking/lib/python3.9/site-packages/sklearn/ UserWarning: X does not have valid feature names, but RandomForestClassifier was fitted with feature names
array([[0.29, 0.71]])
explainer = shap.TreeExplainer(model=model_rf); explainer
<shap.explainers._tree.TreeExplainer at 0x2a42d3f10>
shap_values = explainer.shap_values(data_for_prediction); shap_values
array([[-0.10282092, 0.10282092],
[ 0.04740467, -0.04740467],
[-0.02983219, 0.02983219],
[-0.02277977, 0.02277977],
[-0.00642731, 0.00642731],
[-0.01258714, 0.01258714],
[-0.02910577, 0.02910577],
[ 0.00766886, -0.00766886],
[-0.00792221, 0.00792221],
[-0.01031725, 0.01031725],
[ 0.00500036, -0.00500036],
[ 0.00094579, -0.00094579],
[ 0.02061101, -0.02061101],
[-0.04846459, 0.04846459],
[-0.00601652, 0.00601652],
[-0.00042073, 0.00042073],
[-0.0008261 , 0.0008261 ],
[-0.01286019, 0.01286019]])
explainer.expected_value[1], shap_values[:,1].shape, data_for_prediction.shape
(0.5012500000000001, (18,), (18,))
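As a sanity check on the outputs above, the SHAP values for the positive class should add up to the gap between the predicted probability and the expected value. The sketch below copies the numbers from the printed outputs rather than recomputing them with `shap`. The variable names are illustrative only.

```python
import numpy as np

# Second column of the shap_values array printed above (positive class)
shap_values_class1 = np.array([
    0.10282092, -0.04740467, 0.02983219, 0.02277977, 0.00642731, 0.01258714,
    0.02910577, -0.00766886, 0.00792221, 0.01031725, -0.00500036, -0.00094579,
    -0.02061101, 0.04846459, 0.00601652, 0.00042073, 0.0008261, 0.01286019,
])
expected_value_class1 = 0.5012500000000001  # expected value for class 1, from the output above
predicted_proba_class1 = 0.71               # second entry of the predict_proba output above

# Baseline plus the sum of per-feature contributions should recover the prediction
reconstructed = expected_value_class1 + shap_values_class1.sum()
print(reconstructed, predicted_proba_class1)
```

The two printed numbers agree up to the rounding of the displayed values.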
shap.initjs()
shap.force_plot(explainer.expected_value[1], shap_values[:,1], data_for_prediction)
Other Explainers
- shap.DeepExplainer works with Deep Learning models.
- shap.KernelExplainer works with all models, though it is slower than other Explainers and it offers an approximation rather than exact SHAP values.
k_explainer = shap.KernelExplainer(model_rf.predict_proba, df_train)
k_shap_values = k_explainer.shap_values(data_for_prediction); k_shap_values.shape
shap.force_plot(k_explainer.expected_value[1], k_shap_values[:,1], data_for_prediction)
The example above uses KernelExplainer to get similar results. The results aren’t identical because KernelExplainer gives an approximate result, but they tell the same story.
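To give some intuition for why an approximate explainer lands close to, but not exactly on, the exact values, here is a minimal sketch. It is not KernelExplainer's algorithm. It uses plain Monte Carlo sampling over random feature orderings, on a hypothetical `toy_model` with an all-zeros baseline (both are assumptions for illustration).

```python
import numpy as np


def toy_model(x):
    # Hypothetical stand-in model (an assumption for illustration, not model_rf).
    return 2.0 * x[0] + x[1] * x[2] + 0.5


def sampled_shapley(model, x, baseline, n_permutations, seed=0):
    """Estimate Shapley values by averaging over random feature orderings."""
    rng = np.random.default_rng(seed)
    n = len(x)
    phi = np.zeros(n)
    for _ in range(n_permutations):
        current = baseline.copy()
        previous_output = model(current)
        for i in rng.permutation(n):
            current[i] = x[i]  # switch feature i from its baseline to its actual value
            output = model(current)
            phi[i] += output - previous_output  # marginal contribution in this ordering
            previous_output = output
    return phi / n_permutations


x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
estimate = sampled_shapley(toy_model, x, baseline, n_permutations=500)
exact = np.array([2.0, 3.0, 3.0])  # exact Shapley values for this toy model, worked out by hand

print(estimate, exact)
```

The estimate for the two interacting features depends on the sampled orderings, so it hovers around the exact value. The contributions still sum to the prediction minus the baseline prediction, because each ordering's contributions telescope.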