Deployment2: Pandemic Safety Retraining and Final Model

This notebook demonstrates ML-driven data cleaning and model export, and sets the stage for deployment.

Compatibility Block

Check Platform

Platform & Environment Configuration

Imports

Public Imports

from fastai.vision.all import *
from fastcore.all import *
from functools import partial
import warnings
warnings.filterwarnings("ignore")

Private Imports

from aiking.data.external import *
from aiking.core import aiking_settings
from aiking.integrations.datasette import *
from aiking.dl.widgets import PersistentImageClassifierCleaner

Pandemic Safety

Our dataset consists of images labelled as mask or no_mask. We will follow these steps:

  1. Build a classifier on the raw version of the data
  2. Review the images with the highest confusion. Identify the images we would like to keep, relabel or skip

dsname = 'PandemicSafety'
datasette_base_url = "https://datasette.zealmaker.com"
path = data_frm_datasette(dsname, datasette_base_url); path
Path('/mnt/d/rahuketu/programming/AIKING_HOME/data/PandemicSafety')

DataBlocks and DataLoaders

In this example, I will use image.csv to generate items and labels. The advantages are:

  • I don’t have to delete any files
  • I can manage modifications to the dataset in a new csv file and upload that version to Datasette for future review
  • fastai dataloaders take a path argument, so we also supply path when building items from the dataframe to stay consistent with that convention

We will get the list of fnames and labels from the dataframe. If cleaning is required, we will save the data modifications in a new csv, which we can then upload to Datasette. This way the original data stays immutable.
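The idea of keeping the data immutable and recording cleaning decisions in a new csv can be sketched with the standard library alone. This is only an illustration, not the implementation behind `get_image_df`: the `fname` and `label` columns, the sample rows, and the `apply_review` helper are all hypothetical, while `new_label` and `skipped` mirror the column names used in the cells of this notebook.

```python
import csv
import io

# Hypothetical stand-in for the original csv (column names are assumptions).
raw_csv = """fname,label
No_Mask/a.jpg,No_Mask
No_Mask/b.jpg,No_Mask
Mask/c.jpg,Mask
"""

# Hypothetical review decisions: relabel one image, skip another.
relabel = {"No_Mask/b.jpg": "Mask"}
skip = {"Mask/c.jpg"}


def apply_review(rows, relabel, skip):
    """Return new rows with 'new_label' and 'skipped' added; input rows are not mutated."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy, so the original record stays untouched
        new_row["new_label"] = relabel.get(row["fname"], row["label"])
        new_row["skipped"] = row["fname"] in skip
        cleaned.append(new_row)
    return cleaned


rows = list(csv.DictReader(io.StringIO(raw_csv)))
cleaned = apply_review(rows, relabel, skip)

# Write the cleaned version to a new csv (in memory here, a new file in practice).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["fname", "label", "new_label", "skipped"])
writer.writeheader()
writer.writerows(cleaned)
cleaned_csv = out.getvalue()

# Only rows that were not skipped would be passed on to training.
kept = [r for r in cleaned if not r["skipped"]]
print(cleaned_csv)
```

No image file is deleted or moved here: the original rows keep their `label`, and every decision lives in the new columns, so an earlier version can always be recovered.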

df = get_image_df(path, csvfile='cleaned_v1.csv', skip_col='skipped')
get_images_from_df(path, df=df)
(#295) [Path('/mnt/d/rahuketu/programming/AIKING_HOME/data/PandemicSafety/No_Mask/7a8b0909-4f5c-4d53-b1bc-f35a83a022c9.jpg'),Path('/mnt/d/rahuketu/programming/AIKING_HOME/data/PandemicSafety/No_Mask/1f54d9ac-4a0e-42fe-98cb-09938a3104b0.jpg'),Path('/mnt/d/rahuketu/programming/AIKING_HOME/data/PandemicSafety/No_Mask/d08a498f-27cf-4668-a4f1-af32dfd15416.jpg'),Path('/mnt/d/rahuketu/programming/AIKING_HOME/data/PandemicSafety/No_Mask/52601346-132e-4217-ab57-6c324c1e4eee.jpg'),Path('/mnt/d/rahuketu/programming/AIKING_HOME/data/PandemicSafety/No_Mask/78ac55fb-2db3-48e9-bdcc-4caa2a50e3f6.jpg'),Path('/mnt/d/rahuketu/programming/AIKING_HOME/data/PandemicSafety/No_Mask/5eeec12d-174e-4d1d-a261-8dcd99ea9b8b.jpg'),Path('/mnt/d/rahuketu/programming/AIKING_HOME/data/PandemicSafety/No_Mask/f048f38a-3758-428d-8d9f-8462a4e272d0.jpeg'),Path('/mnt/d/rahuketu/programming/AIKING_HOME/data/PandemicSafety/No_Mask/cfbe3224-4ec8-47f6-8ee8-d637572b2787.jpg'),Path('/mnt/d/rahuketu/programming/AIKING_HOME/data/PandemicSafety/No_Mask/b2b14d80-ddf9-4e4a-9b61-afdbabb1cf9d.jpg'),Path('/mnt/d/rahuketu/programming/AIKING_HOME/data/PandemicSafety/No_Mask/d7a7b4e3-4756-4294-88d6-6c23a7ba6741.jpg')...]
dls = ImageDataLoaders.from_lists(path, 
                            fnames=get_images_from_df(path, df=df), 
                            labels=df['new_label'].values.tolist(), 
                            valid_pct=0.5,
                            item_tfms=[Resize(192, method='squish')])

dls.valid.show_batch(max_n=6)

Model Training

learn = vision_learner(dls, resnet18, metrics=[error_rate, accuracy]); learn
<fastai.learner.Learner at 0x7f2c1528a680>
learn.fine_tune(8)
epoch  train_loss  valid_loss  error_rate  accuracy  time
0      1.328507    0.850975    0.394558    0.605442  00:22

epoch  train_loss  valid_loss  error_rate  accuracy  time
0      0.775933    0.833455    0.346939    0.653061  00:20
1      0.604371    0.843556    0.312925    0.687075  00:20
2      0.469399    0.796075    0.238095    0.761905  00:20
3      0.370113    0.634478    0.190476    0.809524  00:19
4      0.303289    0.486864    0.163265    0.836735  00:22
5      0.252816    0.415158    0.142857    0.857143  00:20
6      0.213483    0.378811    0.136054    0.863946  00:20
7      0.184801    0.343558    0.136054    0.863946  00:20

Classification Interpretation

interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()

interp.plot_top_losses(k=20)

Where do we see high losses?
  • When the model predicts the incorrect class with high confidence.
  • When the model predicts the correct class but with low confidence.
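The two cases above can be made concrete with a small worked example. It assumes the loss is the usual cross-entropy, i.e. the negative log of the probability assigned to the true class (which I believe is what fastai picks by default for a single-label classifier like this one). The probabilities below are made up for illustration and are not taken from the model trained in this notebook.

```python
import math


def cross_entropy(p_true):
    """Cross-entropy loss for one item, given the probability assigned to its true class."""
    return -math.log(p_true)


# Hypothetical probabilities the model assigns to the TRUE class of each image.
cases = {
    "confident and wrong": 0.02,    # 98% of the mass on the wrong class
    "correct but unsure": 0.55,     # right class, but only barely
    "confident and correct": 0.98,  # right class with high confidence
}

losses = {name: cross_entropy(p) for name, p in cases.items()}

# Rank the same way a top-losses view does: highest loss first.
ranked = sorted(losses, key=losses.get, reverse=True)
for name in ranked:
    print(f"{name}: {losses[name]:.3f}")
```

The confident mistake dominates the ranking, followed by the correct but hesitant prediction, which is why both kinds of image surface in the top losses and are worth reviewing for relabelling or skipping.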

Model and Cleaned Data Saving

learn.export(aiking_path('model')/"pandemic_v2.pkl")