import sklearn
Important
SHAP values break down a prediction to show the impact of each feature's value.
We can
- Explain Individual Predictions
- Aggregate Model Level Insights
For example, for each feature f we compare:
- the impact when f takes its actual value v vs. the impact when f takes its baseline value
Where can we use such explanations?
- A bank's model rejects someone's loan application -> the bank is legally required to explain the basis of each rejection
- A healthcare provider needs to identify which factors are driving each patient's risk of a disease, so that they can address each of them with a targeted intervention
sum(SHAP values for all features) = pred_for_team - pred_for_baseline_values
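As a toy illustration of this identity (made-up numbers, not computed from the dataset used below): the baseline prediction plus the per-feature SHAP values recovers the model's actual prediction for that row.

```python
# Toy example with hypothetical numbers: baseline prediction plus per-feature SHAP
# values reconstructs the prediction for one row.
import numpy as np

baseline_pred = 0.50                                    # average prediction over the data
shap_per_feature = np.array([0.12, -0.03, 0.09, 0.03])  # one (made-up) SHAP value per feature
prediction_for_row = baseline_pred + shap_per_feature.sum()
print(prediction_for_row)                               # 0.71
```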
Imports
sklearn.__version__
'1.5.0'
from aiking.data.external import *
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.inspection import PartialDependenceDisplay
import seaborn as sns
import matplotlib.pyplot as plt
import pdpbox
import graphviz
import panel as pn
from ipywidgets import interact
import shap
path = get_ds('fifa-2018-match-statistics'); path.ls()[0]
Path('/Users/rahul1.saraf/rahuketu/programming/AIKING_HOME/data/fifa-2018-match-statistics/FIFA 2018 Statistics.csv')
df = pd.read_csv(path/"FIFA 2018 Statistics.csv"); df
y = (df['Man of the Match'] == "Yes"); y
X = df.select_dtypes(np.int64); X
df_train, df_val, y_train, y_val = train_test_split(X, y, random_state=1)
df_train.shape, df_val.shape, y_train.shape, y_val.shape
((96, 18), (32, 18), (96,), (32,))
model_rf = RandomForestClassifier(random_state=0).fit(df_train, y_train); model_rf
RandomForestClassifier(random_state=0)
row_to_show = 5
data_for_prediction = df_val.iloc[row_to_show]
data_for_prediction_array = data_for_prediction.values.reshape(1,-1); data_for_prediction_array.shape
(1, 18)
model_rf.predict_proba(data_for_prediction_array)
/opt/homebrew/Caskroom/miniforge/base/envs/aiking/lib/python3.9/site-packages/sklearn/base.py:493: UserWarning: X does not have valid feature names, but RandomForestClassifier was fitted with feature names
warnings.warn(
array([[0.29, 0.71]])
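The warning above appears because the model was fitted on a DataFrame with column names but is asked to predict on a bare NumPy array. A small optional tweak, passing a one-row DataFrame instead, gives the same probabilities without the warning:

```python
# Optional: predict on a one-row DataFrame (with feature names) instead of a bare array.
# Same probabilities as above; the UserWarning goes away.
model_rf.predict_proba(df_val.iloc[[row_to_show]])
```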
explainer = shap.TreeExplainer(model=model_rf); explainer
<shap.explainers._tree.TreeExplainer at 0x2a42d3f10>
shap_values = explainer.shap_values(data_for_prediction); shap_values
array([[-0.10282092,  0.10282092],
[ 0.04740467, -0.04740467],
[-0.02983219, 0.02983219],
[-0.02277977, 0.02277977],
[-0.00642731, 0.00642731],
[-0.01258714, 0.01258714],
[-0.02910577, 0.02910577],
[ 0.00766886, -0.00766886],
[-0.00792221, 0.00792221],
[-0.01031725, 0.01031725],
[ 0.00500036, -0.00500036],
[ 0.00094579, -0.00094579],
[ 0.02061101, -0.02061101],
[-0.04846459, 0.04846459],
[-0.00601652, 0.00601652],
[-0.00042073, 0.00042073],
[-0.0008261 , 0.0008261 ],
[-0.01286019, 0.01286019]])
explainer.expected_value[1], shap_values[:,1].shape, data_for_prediction.shape
(0.5012500000000001, (18,), (18,))
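Each row of shap_values has one column per class (the two columns mirror each other for a binary classifier); column 1 corresponds to P(Man of the Match == "Yes"). A quick sanity check of the additivity identity from the introduction, as a sketch using the values computed above:

```python
# Sanity check (sketch): baseline prediction plus the class-1 SHAP values should
# reproduce the predicted probability (~0.71) from the predict_proba cell above.
pred_row = model_rf.predict_proba(df_val.iloc[[row_to_show]])[0, 1]
explainer.expected_value[1] + shap_values[:, 1].sum(), pred_row
```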
shap.initjs()
shap.force_plot(explainer.expected_value[1], shap_values[:,1], data_for_prediction)
Other Explainers
- shap.DeepExplainer works with deep learning models (a rough sketch follows this list).
- shap.KernelExplainer works with all models, though it is slower than the other explainers and offers an approximation rather than exact SHAP values.
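As a rough sketch (not run in this notebook), shap.DeepExplainer could be used the same way for a neural network trained on these features. The tiny Keras model below is purely hypothetical, and whether DeepExplainer runs cleanly depends on the installed TensorFlow and SHAP versions:

```python
# Hedged sketch: DeepExplainer on a hypothetical Keras model trained on the same features.
import tensorflow as tf

keras_model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(df_train.shape[1],)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
keras_model.compile(optimizer="adam", loss="binary_crossentropy")
keras_model.fit(df_train.values.astype("float32"), y_train.values.astype("float32"),
                epochs=20, verbose=0)

background = df_train.values.astype("float32")[:50]   # background sample defines the baseline
deep_explainer = shap.DeepExplainer(keras_model, background)
deep_shap_values = deep_explainer.shap_values(data_for_prediction_array.astype("float32"))
```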
k_explainer = shap.KernelExplainer(model_rf.predict_proba, df_train)
k_shap_values = k_explainer.shap_values(data_for_prediction); k_shap_values.shape
shap.force_plot(k_explainer.expected_value[1], k_shap_values[:,1], data_for_prediction)
Here is an example using KernelExplainer to get similar results. The results aren't identical because KernelExplainer gives an approximate result, but they tell the same story.
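Since KernelExplainer only approximates the SHAP values, a quick way to see how close it gets to the exact tree values is to compare the two arrays directly (a sketch using the variables above):

```python
# Sketch: largest per-feature gap between the kernel approximation and the exact tree values.
np.abs(k_shap_values[:, 1] - shap_values[:, 1]).max()
```

Finally, for the aggregate model-level insights mentioned in the introduction, the same TreeExplainer can be applied to the whole validation set and summarised with shap.summary_plot. This is a sketch: depending on the installed SHAP version, the per-class values come back as a 3-D array (assumed here) or as a list of 2-D arrays.

```python
# Sketch: SHAP values for every validation row, summarised in one plot.
shap_values_val = explainer.shap_values(df_val)
shap.summary_plot(shap_values_val[:, :, 1], df_val)  # class-1 values; adjust indexing per SHAP version
```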