import sklearn
Important
SHAP values break down a prediction to show the impact of each feature's value.
With them we can
- Explain Individual Predictions
- Aggregate Model Level Insights
For example, for a feature f we compare
- Impact given that feature f takes value v vs. impact given that feature f takes its baseline value
Where can we use such explanations?
- A bank's model rejects someone’s loan application: the bank is legally required to explain the basis of each rejection
- A healthcare provider needs to identify which factors are driving each patient’s risk of some disease, so that it can address each of them with a targeted intervention
sum(SHAP values for all features) = pred_for_team - pred_for_baseline_values
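The additivity property above can be checked by hand on a tiny model. The following is a minimal sketch, not the notebook's random forest: `toy_model`, `exact_shap`, the input `x` and the single-row `baseline` are all hypothetical names and values chosen for illustration. It also simplifies the baseline to one reference row, whereas the shap library typically averages over a background dataset.

```python
from itertools import combinations
from math import factorial

def toy_model(x):
    # Hypothetical model with an interaction term (assumption, for illustration only).
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]

def exact_shap(model, x, baseline):
    """Exact Shapley values of model(x) relative to a single baseline row."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their real value; the rest stay at baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

x = [3.0, 1.0, 4.0]          # row to explain (made up)
baseline = [1.0, 0.0, 2.0]   # reference row (made up)

phi = exact_shap(toy_model, x, baseline)
gap = toy_model(x) - toy_model(baseline)

# Additivity: the per-feature values sum to prediction minus baseline prediction.
print(phi, sum(phi), gap)
```

Enumerating every subset is exponential in the number of features, so this only works for toy cases; it is meant to make the identity concrete, not to replace an explainer.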
Imports
sklearn.__version__
'1.5.0'
from aiking.data.external import *
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.inspection import PartialDependenceDisplay
import seaborn as sns
import matplotlib.pyplot as plt
import pdpbox
import graphviz
import panel as pn
from ipywidgets import interact
import shap
path = get_ds('fifa-2018-match-statistics'); path.ls()[0]
Path('/Users/rahul1.saraf/rahuketu/programming/AIKING_HOME/data/fifa-2018-match-statistics/FIFA 2018 Statistics.csv')
df = pd.read_csv(path/"FIFA 2018 Statistics.csv"); df
y = (df['Man of the Match'] == "Yes"); y
X = df.select_dtypes(np.int64); X
df_train, df_val, y_train, y_val = train_test_split(X, y, random_state=1)
df_train.shape, df_val.shape, y_train.shape, y_val.shape
((96, 18), (32, 18), (96,), (32,))
model_rf = RandomForestClassifier(random_state=0).fit(df_train, y_train); model_rf
RandomForestClassifier(random_state=0)
row_to_show = 5
data_for_prediction = df_val.iloc[row_to_show]
data_for_prediction_array = data_for_prediction.values.reshape(1,-1); data_for_prediction_array.shape
(1, 18)
model_rf.predict_proba(data_for_prediction_array)
/opt/homebrew/Caskroom/miniforge/base/envs/aiking/lib/python3.9/site-packages/sklearn/base.py:493: UserWarning: X does not have valid feature names, but RandomForestClassifier was fitted with feature names
warnings.warn(
array([[0.29, 0.71]])
explainer = shap.TreeExplainer(model=model_rf); explainer
<shap.explainers._tree.TreeExplainer at 0x2a42d3f10>
shap_values = explainer.shap_values(data_for_prediction); shap_values
array([[-0.10282092, 0.10282092],
[ 0.04740467, -0.04740467],
[-0.02983219, 0.02983219],
[-0.02277977, 0.02277977],
[-0.00642731, 0.00642731],
[-0.01258714, 0.01258714],
[-0.02910577, 0.02910577],
[ 0.00766886, -0.00766886],
[-0.00792221, 0.00792221],
[-0.01031725, 0.01031725],
[ 0.00500036, -0.00500036],
[ 0.00094579, -0.00094579],
[ 0.02061101, -0.02061101],
[-0.04846459, 0.04846459],
[-0.00601652, 0.00601652],
[-0.00042073, 0.00042073],
[-0.0008261 , 0.0008261 ],
[-0.01286019, 0.01286019]])
explainer.expected_value[1], shap_values[:,1].shape, data_for_prediction.shape
(0.5012500000000001, (18,), (18,))
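As a sanity check, the numbers printed above can be plugged into the additivity identity. This sketch copies the class-1 column of the `shap_values` array, the base value `explainer.expected_value[1]`, and the predicted probability 0.71 from the outputs shown earlier; the variable names `shap_class1`, `base_value`, `predicted` and `reconstructed` are introduced here for illustration.

```python
from math import isclose

# Class-1 SHAP values, copied from the second column of the array printed above.
shap_class1 = [
    0.10282092, -0.04740467, 0.02983219, 0.02277977, 0.00642731, 0.01258714,
    0.02910577, -0.00766886, 0.00792221, 0.01031725, -0.00500036, -0.00094579,
    -0.02061101, 0.04846459, 0.00601652, 0.00042073, 0.0008261, 0.01286019,
]
base_value = 0.5012500000000001  # explainer.expected_value[1], as printed above
predicted = 0.71                 # class-1 probability from predict_proba, as printed above

# Base value plus the sum of SHAP values should reproduce the prediction.
reconstructed = base_value + sum(shap_class1)
print(round(reconstructed, 4))   # 0.71
assert isclose(reconstructed, predicted, abs_tol=1e-6)
```

The 18 values sum to 0.20875, which is exactly the gap between the base value 0.50125 and the predicted probability 0.71.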
shap.initjs()
shap.force_plot(explainer.expected_value[1], shap_values[:,1], data_for_prediction)
Visualization omitted, Javascript library not loaded!
Other Explainers
- shap.DeepExplainer works with Deep Learning models.
- shap.KernelExplainer works with all models, though it is slower than other Explainers and it offers an approximation rather than exact SHAP values.
k_explainer = shap.KernelExplainer(model_rf.predict_proba, df_train)
k_shap_values = k_explainer.shap_values(data_for_prediction); k_shap_values.shape
shap.force_plot(k_explainer.expected_value[1], k_shap_values[:,1], data_for_prediction)
Visualization omitted, Javascript library not loaded!
The example above uses KernelExplainer to get similar results. The results aren’t identical to TreeExplainer's because KernelExplainer gives an approximate result, but they tell the same story.
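To get a feel for how an approximate explainer can differ slightly from an exact one and still agree on the overall picture, here is a small sketch on a toy model. It is not KernelExplainer's algorithm (which, as far as I understand, fits a weighted linear model over sampled coalitions); it swaps in plain random-permutation sampling as a simpler approximation. `toy_model`, `marginals`, `average`, the input `x` and the `baseline` row are hypothetical names and values used only for illustration.

```python
import random
from itertools import permutations

def toy_model(x):
    # Hypothetical stand-in model, not the notebook's random forest.
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]

def marginals(model, x, baseline, order):
    """Contribution of each feature when features are switched on in `order`."""
    current = list(baseline)
    previous = model(current)
    contrib = [0.0] * len(x)
    for i in order:
        current[i] = x[i]          # switch feature i from baseline to its real value
        now = model(current)
        contrib[i] = now - previous
        previous = now
    return contrib

def average(rows):
    # Column-wise mean of a list of equal-length lists.
    return [sum(col) / len(rows) for col in zip(*rows)]

x = [3.0, 1.0, 4.0]          # row to explain (made up)
baseline = [1.0, 0.0, 2.0]   # reference row (made up)
n = len(x)

# Exact Shapley values: average the marginal contributions over all n! orderings.
exact = average([marginals(toy_model, x, baseline, p) for p in permutations(range(n))])

# Approximation: average over a fixed number of random orderings (seeded for repeatability).
rng = random.Random(0)
sampled = [marginals(toy_model, x, baseline, rng.sample(range(n), n)) for _ in range(500)]
approx = average(sampled)

print("exact :", exact)
print("approx:", approx)
```

Each sampled ordering still sums to the full prediction gap, so the approximate values preserve additivity even though individual features are only estimated.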