Improving Kaggle Private Score with Multi-Target Classification

deep learning
fastai
kaggle competition
paddy doctor
python
In this notebook I apply Jeremy Howard’s approach to multi-target classification in fastai to improve a Kaggle submission score.
Author

Vishal Bakshi

Published

May 15, 2024

Background

In this notebook, I’ll use the code from Jeremy’s Road to the Top, Part 4 notebook to train a model that classifies both the disease and the variety of the rice paddy. In the fastai course Part 1 Lesson 7 video, Jeremy encourages viewers/students to see how this model scores and to explore the inputs and outputs in order to understand how the model behaves. I’ll do just that in this notebook.


import os
iskaggle = os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '')

if iskaggle:
    !pip install -Uqq fastcore fastai timm

import timm

from fastai.vision.widgets import *
from fastai.vision.all import *

# install fastkaggle if not available
try: import fastkaggle
except ModuleNotFoundError:
    !pip install -Uqq fastkaggle

from fastkaggle import *
from fastcore.all import *
from fastdownload import download_url


Multi-output DataLoaders


comp = 'paddy-disease-classification'
path = setup_comp(comp, install='fastai "timm>=0.6.2.dev0"')
from fastai.vision.all import *
set_seed(42)

from fastcore.parallel import *
trn_path = path/'train_images'


df = pd.read_csv(path/'train.csv', index_col='image_id')
df.head()
| image_id | label | variety | age |
|---|---|---|---|
| 100330.jpg | bacterial_leaf_blight | ADT45 | 45 |
| 100365.jpg | bacterial_leaf_blight | ADT45 | 45 |
| 100382.jpg | bacterial_leaf_blight | ADT45 | 45 |
| 100632.jpg | bacterial_leaf_blight | ADT45 | 45 |
| 101918.jpg | bacterial_leaf_blight | ADT45 | 45 |

There are 10 unique labels (including normal) and 10 unique variety values. This means the model will have to predict 10 + 10 = 20 different probabilities.

df['label'].unique().shape, df['variety'].unique().shape
((10,), (10,))

Jeremy creates a get_variety helper function which returns the variety column value for a given image path. Note that when creating df, he passed index_col='image_id' so that the DataFrame’s index is the image filename, which makes the lookup by p.name straightforward.

def get_variety(p): return df.loc[p.name, 'variety']
get_variety(Path('100330.jpg')) == 'ADT45'
True

Jeremy’s DataBlock consists of three blocks: one ImageBlock that processes the inputs and two CategoryBlocks, one per target (label and variety). Because there are three blocks, we have to specify the number of inputs with n_inp=1. Note that we can pass a list of get_y getters, one for each target.

dls = DataBlock(
    blocks=(ImageBlock,CategoryBlock,CategoryBlock),
    n_inp=1,
    get_items=get_image_files,
    get_y = [parent_label,get_variety],
    splitter=RandomSplitter(0.2, seed=42),
    item_tfms=Resize(192, method='squish'),
    batch_tfms=aug_transforms(size=128, min_scale=0.75)
).dataloaders(trn_path)
dls.show_batch(max_n=6)
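
As a quick structural check (a sketch I’m adding, not from Jeremy’s notebook), grabbing one batch should yield three tensors: the image batch plus one target tensor per CategoryBlock.

# sketch: with n_inp=1 and two CategoryBlocks, a batch is (images, disease targets, variety targets)
xb, disease_yb, variety_yb = dls.one_batch()
xb.shape, disease_yb.shape, variety_yb.shape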

As done in his notebook, I’ll first test this approach by training a single-target classifier for the disease label. Since there are three blocks, the loss function and metrics will receive three things: the predictions (inp), the disease labels, and the variety labels.

Single-target Model with Multi-output DataLoaders

error_rate??
Signature: error_rate(inp, targ, axis=-1)
Source:   
def error_rate(inp, targ, axis=-1):
    "1 - `accuracy`"
    return 1 - accuracy(inp, targ, axis=axis)
File:      /opt/conda/lib/python3.7/site-packages/fastai/metrics.py
Type:      function
def disease_err(inp,disease,variety): return error_rate(inp,disease)
def disease_loss(inp,disease,variety): return F.cross_entropy(inp,disease)
learn = vision_learner(dls, resnet34, loss_func=disease_loss, metrics=disease_err, n_out=10).to_fp16()
lr = 0.01
Downloading: "https://download.pytorch.org/models/resnet34-b627a593.pth" to /root/.cache/torch/hub/checkpoints/resnet34-b627a593.pth
learn.fine_tune(12, lr)
| epoch | train_loss | valid_loss | disease_err | time |
|---|---|---|---|---|
| 0 | 1.950858 | 1.315877 | 0.420471 | 01:15 |

| epoch | train_loss | valid_loss | disease_err | time |
|---|---|---|---|---|
| 0 | 0.804355 | 0.466699 | 0.148967 | 01:06 |
| 1 | 0.602656 | 0.520798 | 0.152811 | 01:07 |
| 2 | 0.535991 | 0.533135 | 0.146084 | 01:06 |
| 3 | 0.499837 | 0.413230 | 0.125420 | 01:06 |
| 4 | 0.374249 | 0.522707 | 0.145123 | 01:07 |
| 5 | 0.303674 | 0.249570 | 0.074003 | 01:07 |
| 6 | 0.233556 | 0.222586 | 0.061989 | 01:07 |
| 7 | 0.162041 | 0.166682 | 0.044690 | 01:06 |
| 8 | 0.123177 | 0.137297 | 0.036521 | 01:06 |
| 9 | 0.080774 | 0.139264 | 0.034599 | 01:06 |
| 10 | 0.051097 | 0.124689 | 0.031235 | 01:06 |
| 11 | 0.054974 | 0.123993 | 0.032196 | 01:07 |
learn.recorder.plot_loss()

The outputs of this model consist of 10 predictions for each image:

probs = learn.tta(dl=learn.dls.valid)
probs[0].shape
torch.Size([2081, 10])

Here’s what the model’s activations look like:

probs[0][:5]
tensor([[ 2.8325, -7.3865, -3.3020,  8.6670, -3.5552, -5.3201,  0.5134,  0.8277,
         -1.1257,  1.3131],
        [-0.3330, -4.0044, -4.1958,  0.3577, -2.9964, -3.6226, -0.7052,  3.2369,
          9.9019, -0.7284],
        [-1.3333, -3.0190, -4.2167, -2.3115, -2.1287, -2.2457, -1.8324, -2.1084,
         -1.1698, 13.2637],
        [ 4.0613, 16.8584, -3.1682, -1.8026, -0.4133, -5.2385,  0.7230, -3.3894,
         -7.2209, -2.8204],
        [-1.5410, -0.0458,  1.3879, -0.5194, -2.3740, 13.3403, -1.4106, -4.4908,
         -1.3759, -1.7310]])

Most of the output activations are between -5 and +5:

plt.hist(probs[0].flatten().detach().numpy());

The second object returned by tta is a tuple: the first tensor holds the disease target labels and the second holds the variety targets. Taking the argmax of the predictions gives the predicted disease class for each image:

probs[0].argmax(dim=1)
tensor([3, 8, 9,  ..., 9, 5, 1])
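
As stated above, the targets come back as a tuple; here’s a quick sketch (not in the original notebook) to unpack it:

# probs[1] holds the validation targets: disease labels first, then variety labels
disease_targ, variety_targ = probs[1]
disease_targ.shape, variety_targ.shape  # each should have one entry per validation image (2081)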

The error rate for the TTA predictions on the validation set is 0.025, similar to what Jeremy had.

1 - (probs[0].argmax(dim=1) == probs[1][0]).float().mean()
tensor(0.0250)

I’ll now create the test DataLoaders using the test set (making sure to sort the files so they are in the same order as the sample submission CSV):

tst_files = get_image_files(path/'test_images')
tst_files.sort()
tst_files[:5]
(#5) [Path('../input/paddy-disease-classification/test_images/200001.jpg'),Path('../input/paddy-disease-classification/test_images/200002.jpg'),Path('../input/paddy-disease-classification/test_images/200003.jpg'),Path('../input/paddy-disease-classification/test_images/200004.jpg'),Path('../input/paddy-disease-classification/test_images/200005.jpg')]

Then I’ll calculate the TTA predictions on the test set:

tst_dl = dls.test_dl(tst_files)
probs = learn.tta(dl=tst_dl)
len(tst_files), probs[0].shape
(3469, torch.Size([3469, 10]))

Most of the activations are between -10 and +10.

plt.hist(probs[0].flatten().detach().numpy());

Here is the distribution of classes (index of maximum activation per image):

plt.hist(probs[0].argmax(dim=1).flatten().detach().numpy());

I’ll export this as a CSV so I can submit it to Kaggle for scoring.

# get the index (class) of the maximum prediction for each item
idxs = probs[0].argmax(dim=1)
idxs
tensor([7, 8, 3,  ..., 8, 1, 5])

The vocab contains two sets of labels: one for disease and one for variety. I only want to map the disease labels for now.

dls.vocab[0]
['bacterial_leaf_blight', 'bacterial_leaf_streak', 'bacterial_panicle_blight', 'blast', 'brown_spot', 'dead_heart', 'downy_mildew', 'hispa', 'normal', 'tungro']
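
For reference, the variety names should live in the second vocab; a quick sketch (output not shown here):

# the second CategoryBlock's vocab holds the 10 rice variety names
dls.vocab[1], len(dls.vocab[1])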
# convert indexes to vocab strings
mapping = dict(enumerate(dls.vocab[0]))
mapping
{0: 'bacterial_leaf_blight',
 1: 'bacterial_leaf_streak',
 2: 'bacterial_panicle_blight',
 3: 'blast',
 4: 'brown_spot',
 5: 'dead_heart',
 6: 'downy_mildew',
 7: 'hispa',
 8: 'normal',
 9: 'tungro'}
# add vocab strings to sample submission file and export to CSV
ss = pd.read_csv(path/'sample_submission.csv')
results = pd.Series(idxs.numpy(), name='idxs').map(mapping)
ss.label = results
ss.to_csv('subm1.csv', index=False)
!head subm1.csv
image_id,label
200001.jpg,hispa
200002.jpg,normal
200003.jpg,blast
200004.jpg,blast
200005.jpg,blast
200006.jpg,brown_spot
200007.jpg,dead_heart
200008.jpg,brown_spot
200009.jpg,hispa

This submission resulted in a Private score of 0.97580.

Single-target Model with Single-output DataLoaders

As another comparison/baseline, I’ll train a single-target disease classifier using a single-output DataLoaders object.

dls = DataBlock(
    blocks=(ImageBlock,CategoryBlock),
    get_items=get_image_files,
    get_y = parent_label,
    splitter=RandomSplitter(0.2, seed=42),
    item_tfms=Resize(192, method='squish'),
    batch_tfms=aug_transforms(size=128, min_scale=0.75)
).dataloaders(trn_path)
dls.show_batch(max_n=6)

learn = vision_learner(dls, resnet34, metrics=error_rate).to_fp16()
learn.fine_tune(12, 0.01)
Downloading: "https://download.pytorch.org/models/resnet34-b627a593.pth" to /root/.cache/torch/hub/checkpoints/resnet34-b627a593.pth
| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 1.941936 | 1.325037 | 0.419990 | 01:16 |

| epoch | train_loss | valid_loss | error_rate | time |
|---|---|---|---|---|
| 0 | 0.808069 | 0.461713 | 0.158097 | 01:06 |
| 1 | 0.592128 | 0.562304 | 0.172033 | 01:06 |
| 2 | 0.553166 | 0.526510 | 0.142239 | 01:06 |
| 3 | 0.499870 | 0.468296 | 0.137914 | 01:06 |
| 4 | 0.365323 | 0.344052 | 0.095627 | 00:58 |
| 5 | 0.315392 | 0.372406 | 0.105718 | 00:51 |
| 6 | 0.245668 | 0.210443 | 0.062470 | 00:51 |
| 7 | 0.165351 | 0.178430 | 0.047093 | 00:51 |
| 8 | 0.112827 | 0.153295 | 0.038924 | 00:50 |
| 9 | 0.079190 | 0.144607 | 0.033638 | 00:51 |
| 10 | 0.048934 | 0.128548 | 0.029313 | 00:51 |
| 11 | 0.046523 | 0.129133 | 0.027871 | 00:50 |
probs = learn.tta(dl=learn.dls.valid)
probs[0].shape, probs[1].shape
(torch.Size([2081, 10]), torch.Size([2081]))

This model has a slightly better TTA error rate on the validation set.

1 - (probs[0].argmax(dim=1) == probs[1]).float().mean()
tensor(0.0240)

There is only one vocab since there is only one target:

dls.vocab
['bacterial_leaf_blight', 'bacterial_leaf_streak', 'bacterial_panicle_blight', 'blast', 'brown_spot', 'dead_heart', 'downy_mildew', 'hispa', 'normal', 'tungro']
tst_files = get_image_files(path/'test_images')
tst_files.sort()

tst_dl = dls.test_dl(tst_files)
probs = learn.tta(dl=tst_dl)

# get the index (class) of the maximum prediction for each item
idxs = probs[0].argmax(dim=1)

# convert indexes to vocab strings
mapping = dict(enumerate(dls.vocab))

# add vocab strings to sample submission file and export to CSV
ss = pd.read_csv(path/'sample_submission.csv')
results = pd.Series(idxs.numpy(), name='idxs').map(mapping)
ss.label = results
ss.to_csv('subm2.csv', index=False)
!head subm2.csv
image_id,label
200001.jpg,hispa
200002.jpg,normal
200003.jpg,blast
200004.jpg,blast
200005.jpg,blast
200006.jpg,brown_spot
200007.jpg,dead_heart
200008.jpg,brown_spot
200009.jpg,hispa

This submission got the same Private score as the single-target model trained on the multi-output DataLoaders: 0.97580.

Multi-target Model with Multi-output DataLoaders

dls = DataBlock(
    blocks=(ImageBlock,CategoryBlock,CategoryBlock),
    n_inp=1,
    get_items=get_image_files,
    get_y = [parent_label,get_variety],
    splitter=RandomSplitter(0.2, seed=42),
    item_tfms=Resize(192, method='squish'),
    batch_tfms=aug_transforms(size=128, min_scale=0.75)
).dataloaders(trn_path)
dls.show_batch()

Jeremy picks the first ten activations of the model as the disease classes and the second ten as the variety classes. The disease_loss and variety_loss are defined accordingly:

def disease_loss(inp,disease,variety): return F.cross_entropy(inp[:,:10],disease)
def variety_loss(inp,disease,variety): return F.cross_entropy(inp[:,10:],variety)

The combined loss we are trying to minimize is the sum of the two targets’ losses:

def combine_loss(inp,disease,variety): return disease_loss(inp,disease,variety)+variety_loss(inp,disease,variety)
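
As a sanity check (a sketch I’m adding, not from the original notebook), the combined loss can be exercised with random activations and targets:

# 4 items with 20 activations each; random class indices in [0, 10) for both targets
inp = torch.randn(4, 20)
disease = torch.randint(0, 10, (4,))
variety = torch.randint(0, 10, (4,))
combine_loss(inp, disease, variety)  # equals disease_loss(...) + variety_loss(...)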

The error rates are defined the same way (first 10 predictions for disease, second 10 for variety):

def disease_err(inp,disease,variety): return error_rate(inp[:,:10],disease)
def variety_err(inp,disease,variety): return error_rate(inp[:,10:],variety)

err_metrics = (disease_err,variety_err)

Jeremy also chooses to view the disease loss and variety loss separately.

all_metrics = err_metrics+(disease_loss,variety_loss)

n_out is set to 20 since the model is now predicting two sets of 10 classes.

learn = vision_learner(dls, resnet34, loss_func=combine_loss, metrics=all_metrics, n_out=20).to_fp16()
Downloading: "https://download.pytorch.org/models/resnet34-b627a593.pth" to /root/.cache/torch/hub/checkpoints/resnet34-b627a593.pth

Jeremy mentioned that we might have to train this model for longer before it performs as well as the single-target model, since we are asking it to do more (predict twice as many targets). I’ll train and submit with 12 epochs first as a baseline.

learn.fine_tune(12, 0.01)
| epoch | train_loss | valid_loss | disease_err | variety_err | disease_loss | variety_loss | time |
|---|---|---|---|---|---|---|---|
| 0 | 3.191633 | 1.969826 | 0.419990 | 0.219125 | 1.272959 | 0.696868 | 01:19 |

| epoch | train_loss | valid_loss | disease_err | variety_err | disease_loss | variety_loss | time |
|---|---|---|---|---|---|---|---|
| 0 | 1.266333 | 0.694686 | 0.154733 | 0.059106 | 0.489899 | 0.204788 | 01:06 |
| 1 | 0.887806 | 0.605122 | 0.146564 | 0.053340 | 0.440036 | 0.165086 | 01:06 |
| 2 | 0.834215 | 1.014124 | 0.188371 | 0.090341 | 0.617206 | 0.396918 | 01:06 |
| 3 | 0.709637 | 0.634970 | 0.117732 | 0.058145 | 0.439491 | 0.195479 | 01:06 |
| 4 | 0.587873 | 0.580158 | 0.120135 | 0.045651 | 0.420883 | 0.159275 | 01:06 |
| 5 | 0.453230 | 0.404975 | 0.084575 | 0.031716 | 0.295974 | 0.109001 | 01:06 |
| 6 | 0.332355 | 0.315852 | 0.069678 | 0.017780 | 0.252630 | 0.063222 | 01:07 |
| 7 | 0.240017 | 0.276671 | 0.054781 | 0.025469 | 0.197499 | 0.079172 | 01:07 |
| 8 | 0.166121 | 0.182990 | 0.039885 | 0.012494 | 0.140265 | 0.042726 | 01:06 |
| 9 | 0.112039 | 0.182566 | 0.036040 | 0.011533 | 0.138646 | 0.043920 | 01:06 |
| 10 | 0.081297 | 0.177871 | 0.034599 | 0.008650 | 0.136665 | 0.041206 | 01:06 |
| 11 | 0.074365 | 0.173486 | 0.031716 | 0.008650 | 0.133155 | 0.040331 | 01:06 |
probs = learn.tta(dl=learn.dls.valid)

There are 20 predictions for each image.

probs[0].shape
torch.Size([2081, 20])

The first 10 predictions are for the disease label.

1 - (probs[0][:,:10].argmax(dim=1) == probs[1][0]).float().mean()
tensor(0.0279)

The TTA error rate on the validation set (0.0279) is slightly higher than that of the single-target models (0.0250 and 0.0240).

Just out of curiosity, I’ll also calculate the TTA error rate for variety:

1 - (probs[0][:,10:].argmax(dim=1) == probs[1][1]).float().mean()
tensor(0.0077)
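
Equivalently (a sketch, not in the original notebook), the metric helpers defined earlier can be applied directly to the TTA outputs, since they just slice the activations:

# same error rates via the metric functions: first 10 columns for disease, last 10 for variety
disease_targ, variety_targ = probs[1]
disease_err(probs[0], disease_targ, variety_targ), variety_err(probs[0], disease_targ, variety_targ)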

The model is much more accurate at predicting the variety of rice.

I’ll submit TTA predictions on the test set:

tst_files = get_image_files(path/'test_images')
tst_files.sort()

tst_dl = dls.test_dl(tst_files)
probs = learn.tta(dl=tst_dl)

# get the index (class) of the maximum prediction for each item
idxs = probs[0][:,:10].argmax(dim=1)

# convert indexes to vocab strings
mapping = dict(enumerate(dls.vocab[0]))

# add vocab strings to sample submission file and export to CSV
ss = pd.read_csv(path/'sample_submission.csv')
results = pd.Series(idxs.numpy(), name='idxs').map(mapping)
ss.label = results
ss.to_csv('subm3.csv', index=False)
!head subm3.csv
image_id,label
200001.jpg,hispa
200002.jpg,normal
200003.jpg,blast
200004.jpg,blast
200005.jpg,blast
200006.jpg,brown_spot
200007.jpg,dead_heart
200008.jpg,brown_spot
200009.jpg,hispa

This model gave me the same Private score: 0.97580.

Finally, I’ll train the model for a few more epochs and see if that improves the score.

learn = vision_learner(dls, resnet34, loss_func=combine_loss, metrics=all_metrics, n_out=20).to_fp16()
learn.fine_tune(16, 0.01)
| epoch | train_loss | valid_loss | disease_err | variety_err | disease_loss | variety_loss | time |
|---|---|---|---|---|---|---|---|
| 0 | 3.186875 | 2.025326 | 0.409419 | 0.214320 | 1.350042 | 0.675284 | 01:06 |

| epoch | train_loss | valid_loss | disease_err | variety_err | disease_loss | variety_loss | time |
|---|---|---|---|---|---|---|---|
| 0 | 1.275486 | 0.722841 | 0.168188 | 0.068236 | 0.495478 | 0.227363 | 01:06 |
| 1 | 0.812811 | 0.582493 | 0.128784 | 0.053340 | 0.401935 | 0.180558 | 01:08 |
| 2 | 0.736832 | 0.655212 | 0.145603 | 0.064873 | 0.447887 | 0.207325 | 01:08 |
| 3 | 0.777283 | 1.089296 | 0.193176 | 0.119654 | 0.611161 | 0.478135 | 01:08 |
| 4 | 0.656918 | 0.736651 | 0.132148 | 0.063912 | 0.465955 | 0.270696 | 01:07 |
| 5 | 0.611457 | 0.523899 | 0.104277 | 0.044690 | 0.359421 | 0.164478 | 01:06 |
| 6 | 0.497228 | 0.408523 | 0.076886 | 0.039885 | 0.276221 | 0.132302 | 01:11 |
| 7 | 0.418236 | 0.349095 | 0.065834 | 0.026430 | 0.262863 | 0.086232 | 01:10 |
| 8 | 0.292306 | 0.334778 | 0.070639 | 0.022105 | 0.253223 | 0.081556 | 01:07 |
| 9 | 0.223652 | 0.276235 | 0.051418 | 0.014897 | 0.214205 | 0.062030 | 01:06 |
| 10 | 0.172232 | 0.222825 | 0.046612 | 0.013936 | 0.166589 | 0.056236 | 01:07 |
| 11 | 0.117083 | 0.198665 | 0.038443 | 0.008650 | 0.154245 | 0.044420 | 01:06 |
| 12 | 0.086082 | 0.196476 | 0.040365 | 0.010091 | 0.159040 | 0.037436 | 01:08 |
| 13 | 0.074184 | 0.185352 | 0.039404 | 0.006728 | 0.152728 | 0.032625 | 01:07 |
| 14 | 0.056641 | 0.174692 | 0.033638 | 0.006728 | 0.144675 | 0.030018 | 01:07 |
| 15 | 0.043815 | 0.177776 | 0.034118 | 0.007208 | 0.144719 | 0.033058 | 01:06 |
learn.recorder.plot_loss()

probs = learn.tta(dl=learn.dls.valid)
1 - (probs[0][:,:10].argmax(dim=1) == probs[1][0]).float().mean()
tensor(0.0245)

That’s a slightly better TTA validation error rate.

tst_files = get_image_files(path/'test_images')
tst_files.sort()

tst_dl = dls.test_dl(tst_files)
probs = learn.tta(dl=tst_dl)

# get the index (class) of the maximum prediction for each item
idxs = probs[0][:,:10].argmax(dim=1)

# convert indexes to vocab strings
mapping = dict(enumerate(dls.vocab[0]))

# add vocab strings to sample submission file and export to CSV
ss = pd.read_csv(path/'sample_submission.csv')
results = pd.Series(idxs.numpy(), name='idxs').map(mapping)
ss.label = results
ss.to_csv('subm4.csv', index=False)
!head subm4.csv
image_id,label
200001.jpg,hispa
200002.jpg,normal
200003.jpg,blast
200004.jpg,blast
200005.jpg,blast
200006.jpg,brown_spot
200007.jpg,dead_heart
200008.jpg,brown_spot
200009.jpg,hispa

The Private score improved to 0.97811! That’s not an insignificant gain: training on both targets improved the score, at least for a resnet34 with the Resize(192, method='squish') item transform and aug_transforms(size=128, min_scale=0.75) batch transforms, when trained for 16 epochs. Here is a summary of the four submissions from this notebook:

| Submission | Model | DataLoaders | Epochs | Private Score |
|---|---|---|---|---|
| subm1.csv | Single-target (disease) | Multi-output | 12 | 0.97580 |
| subm2.csv | Single-target (disease) | Single-output | 12 | 0.97580 |
| subm3.csv | Multi-target (disease + variety) | Multi-output | 12 | 0.97580 |
| subm4.csv | Multi-target (disease + variety) | Multi-output | 16 | 0.97811 |

Final Thoughts

I am so glad that I ran this experiment since I am currently involved in a Kaggle competition where I was considering multi-target classification. There’s no certainty that it’ll improve my Private score in that situation, but it’s promising to see it improve the Paddy Disease Classification Private score here. Many thanks to Jeremy for introducing us to these engaging concepts.

I hope you enjoyed this blog post!