Paddy Doctor Kaggle Competition - Part 8

deep learning
fastai
kaggle competition
paddy doctor
python
In this notebook I apply Jeremy Howard’s approach from the “Scaling Up: Road to the Top, Part 3” notebook to my large ensemble.
Author

Vishal Bakshi

Published

February 5, 2024

Background

In the fastai course Part 1 Lesson 6 video Jeremy Howard walked through the notebooks First Steps: Road to the Top, Part 1 and Small models: Road to the Top, Part 2 where he builds increasingly accurate solutions to the Paddy Doctor: Paddy Disease Classification Kaggle Competition. In the video, Jeremy referenced a series of walkthrough videos that he made while working through the four-notebook series for this competition. I’m excited to watch these walkthroughs to better understand how to approach a Kaggle competition from the perspective of a former #1 Kaggle grandmaster.

In this blog post series, I’ll walk through the code Jeremy shared in each of the 6 Live Coding videos focused on this competition, submitting predictions to Kaggle along the way. My last two blog posts in this series reference Jeremy’s Scaling Up: Road to the Top, Part 3 notebook to improve my large model ensemble predictions. Here are the links to each of the blog posts in this series:

Comparing Jeremy’s Approach to Mine

Obviously, “my” last approach is largely taken from Jeremy’s own live coding videos, but there are a few differences between our ensemble training setups:

| Item | Jeremy | Vishal |
|------|--------|--------|
| Learning Rate | 0.01 for all architectures | 0.005 or 0.015 depending on the architecture |
| # Epochs | 12 | 24 |
| Architectures | convnext_large_in22k, vit_large_patch16_224, swinv2_large_window12_192_22k, swin_large_patch4_window7_224 | convnext_large_in22k, vit_large_patch16_224, swinv2_large_window12_192_22k |
| Gradient Accumulation | Yes | Yes |
| Batch Size | 32 | 16 |
| Private score | 0.98732 | 0.98617 |
| Public score | 0.98846 | 0.98654 |

Jeremy’s approach resulted in a Private score with a roughly 10% smaller error rate than mine. In terms of rankings, Jeremy’s Private score places #8 on the leaderboard, while mine ties for places #27 through #58.
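As a rough sanity check of that error-rate comparison (the exact percentage depends on which score you treat as the baseline):

```python
# Convert the two Private scores (accuracies) into error rates and compare.
jeremy_err = 1 - 0.98732   # 0.01268
vishal_err = 1 - 0.98617   # 0.01383
print(f"{vishal_err / jeremy_err - 1:.1%}")  # my error rate is ~9.1% larger
```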

Replicating Jeremy’s Approach

I’ll first replicate Jeremy’s approach, including his architectures, learning rates, number of epochs, set_seed(42), and train function, to see if I get the same score, which I would expect. Once that’s confirmed, I’ll apply his approach to the three architectures I chose and re-submit the predictions to see how that scores. If there’s still a difference, I can attribute it to Jeremy including the swin_large_patch4_window7_224 architecture (and multiple transforms for some of the models) in his ensemble.

!pip install -qq timm==0.6.13
!pip install kaggle -qq
import timm
timm.__version__
'0.6.13'
from pathlib import Path

creds = ''  # paste the contents of your kaggle.json here

cred_path = Path("~/.kaggle/kaggle.json").expanduser()
if not cred_path.exists():
  cred_path.parent.mkdir(exist_ok=True)
  cred_path.write_text(creds)
  cred_path.chmod(0o600)

import zipfile,kaggle

path = Path('paddy-disease-classification')
if not path.exists():
  kaggle.api.competition_download_cli(str(path))
  zipfile.ZipFile(f'{path}.zip').extractall(path)

from fastai.vision.all import *
set_seed(42)
import gc
tst_files = get_image_files(path/'test_images').sorted()
trn_path = path/'train_images'
res = 640,480
models = {
    'convnext_large_in22k': {
        (Resize(res), (320,224)),
    }, 'vit_large_patch16_224': {
        (Resize(480, method='squish'), 224),
        (Resize(res), 224),
    }, 'swinv2_large_window12_192_22k': {
        (Resize(480, method='squish'), 192),
        (Resize(res), 192),
    }, 'swin_large_patch4_window7_224': {
        (Resize(res), 224),
    }
}
def train(arch, size, item=Resize(480, method='squish'), accum=1, finetune=True, epochs=12):
    # smaller per-batch size; GradientAccumulation below keeps the effective batch at 64
    dls = ImageDataLoaders.from_folder(trn_path, valid_pct=0.2, item_tfms=item,
        batch_tfms=aug_transforms(size=size, min_scale=0.75), bs=64//accum)
    cbs = GradientAccumulation(64) if accum else []
    learn = vision_learner(dls, arch, metrics=error_rate, cbs=cbs).to_fp16()
    if finetune:
        learn.fine_tune(epochs, 0.01)
        # return TTA predictions on the test set
        return learn.tta(dl=dls.test_dl(tst_files))
    else:
        learn.unfreeze()
        learn.fit_one_cycle(epochs, 0.01)
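For readers unfamiliar with the callback: with accum=2, each DataLoader batch holds 64//2 = 32 images, and GradientAccumulation(64) delays the optimizer step until 64 samples have been processed, so the effective batch size stays at 64. Here’s a minimal plain-PyTorch sketch of the idea (illustrative only, not fastai’s actual implementation):

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
n_acc, seen = 64, 0   # step the optimizer once 64 samples have been seen

for _ in range(8):                       # 8 micro-batches of 16 samples each
    xb, yb = torch.randn(16, 10), torch.randint(0, 2, (16,))
    loss = F.cross_entropy(model(xb), yb)
    loss.backward()                      # gradients accumulate across calls
    seen += len(xb)
    if seen >= n_acc:                    # effective batch size: 64
        opt.step()
        opt.zero_grad()
        seen = 0
```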
tta_res = []

for arch,details in models.items():
    for item,size in details:
        print('---',arch)
        print(size)
        print(item.name)
        tta_res.append(train(arch, size, item=item, accum=2)) #, epochs=1))
        gc.collect()
        torch.cuda.empty_cache()
--- convnext_large_in22k
(320, 224)
Resize -- {'size': (480, 640), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}
Downloading: "https://dl.fbaipublicfiles.com/convnext/convnext_large_22k_224.pth" to /root/.cache/torch/hub/checkpoints/convnext_large_22k_224.pth
epoch train_loss valid_loss error_rate time
0 0.887948 0.558075 0.171072 01:50
epoch train_loss valid_loss error_rate time
0 0.355805 0.198265 0.062951 02:19
1 0.294646 0.232236 0.064392 02:19
2 0.279926 0.246197 0.068236 02:18
3 0.255699 0.214052 0.054781 02:18
4 0.185353 0.169206 0.050937 02:18
5 0.186070 0.143183 0.035560 02:18
6 0.085736 0.121303 0.030754 02:18
7 0.057243 0.090094 0.023546 02:18
8 0.055047 0.102438 0.024027 02:17
9 0.037102 0.081514 0.017780 02:18
10 0.031336 0.076584 0.019222 02:18
11 0.026582 0.077788 0.020183 02:17
--- vit_large_patch16_224
224
Resize -- {'size': (480, 640), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}
epoch train_loss valid_loss error_rate time
0 1.025479 0.599246 0.190293 01:56
epoch train_loss valid_loss error_rate time
0 0.372387 0.254584 0.078808 02:26
1 0.363945 0.267997 0.084575 02:26
2 0.337558 0.416980 0.118693 02:26
3 0.305778 0.237352 0.068717 02:26
4 0.205868 0.220364 0.052859 02:26
5 0.155062 0.132949 0.037001 02:26
6 0.131659 0.115785 0.029793 02:26
7 0.084275 0.113429 0.028352 02:26
8 0.054473 0.126284 0.028352 02:26
9 0.040826 0.095426 0.023066 02:26
10 0.030117 0.101836 0.022105 02:26
11 0.032172 0.097483 0.021624 02:26
--- vit_large_patch16_224
224
Resize -- {'size': (480, 480), 'method': 'squish', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}
epoch train_loss valid_loss error_rate time
0 0.977947 0.495749 0.167227 01:53
epoch train_loss valid_loss error_rate time
0 0.445487 0.227239 0.076886 02:24
1 0.310319 0.217010 0.065353 02:24
2 0.346164 0.222110 0.071120 02:24
3 0.316973 0.220043 0.066314 02:24
4 0.201981 0.209637 0.057184 02:24
5 0.130500 0.139665 0.036521 02:24
6 0.142111 0.127187 0.030754 02:24
7 0.071605 0.089311 0.022105 02:24
8 0.048045 0.083216 0.020183 02:23
9 0.046874 0.084979 0.018260 02:23
10 0.024261 0.086005 0.019702 02:24
11 0.021223 0.083154 0.016819 02:23
--- swinv2_large_window12_192_22k
192
Resize -- {'size': (480, 480), 'method': 'squish', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}
/usr/local/lib/python3.9/dist-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2894.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Downloading: "https://github.com/SwinTransformer/storage/releases/download/v2.0.0/swinv2_large_patch4_window12_192_22k.pth" to /root/.cache/torch/hub/checkpoints/swinv2_large_patch4_window12_192_22k.pth
epoch train_loss valid_loss error_rate time
0 0.929301 0.633476 0.173955 02:02
epoch train_loss valid_loss error_rate time
0 0.397221 0.207330 0.060548 02:26
1 0.349008 0.227384 0.068236 02:27
2 0.321012 0.355698 0.104277 02:27
3 0.280713 0.199645 0.058145 02:27
4 0.228984 0.219441 0.061028 02:26
5 0.154743 0.159890 0.039885 02:28
6 0.133811 0.156821 0.040365 02:27
7 0.082750 0.137658 0.032196 02:27
8 0.069910 0.132426 0.029793 02:27
9 0.052760 0.111611 0.022585 02:26
10 0.039415 0.115199 0.022585 02:27
11 0.027980 0.115816 0.023066 02:26
--- swinv2_large_window12_192_22k
192
Resize -- {'size': (480, 640), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}
epoch train_loss valid_loss error_rate time
0 0.937894 0.522241 0.169630 02:04
epoch train_loss valid_loss error_rate time
0 0.460170 0.213483 0.073042 02:29
1 0.352201 0.178675 0.054301 02:29
2 0.386350 0.334553 0.097549 02:28
3 0.299634 0.164248 0.046612 02:29
4 0.228867 0.126158 0.035079 02:29
5 0.178476 0.138584 0.042768 02:29
6 0.154566 0.148524 0.039885 02:28
7 0.094369 0.078149 0.020663 02:29
8 0.066650 0.069993 0.019222 02:29
9 0.049904 0.061477 0.017299 02:29
10 0.037704 0.060764 0.016338 02:29
11 0.033226 0.061885 0.015858 02:29
--- swin_large_patch4_window7_224
224
Resize -- {'size': (480, 640), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}
Downloading: "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window7_224_22kto1k.pth" to /root/.cache/torch/hub/checkpoints/swin_large_patch4_window7_224_22kto1k.pth
epoch train_loss valid_loss error_rate time
0 0.930659 0.492810 0.154253 01:41
epoch train_loss valid_loss error_rate time
0 0.422924 0.213063 0.072561 02:02
1 0.369444 0.222754 0.066795 02:02
2 0.338899 0.212781 0.057665 02:03
3 0.308362 0.159222 0.046612 02:03
4 0.214941 0.142208 0.037001 02:02
5 0.155058 0.139699 0.032196 02:02
6 0.161482 0.116061 0.030754 02:02
7 0.098805 0.080427 0.022105 02:03
8 0.071636 0.073006 0.020183 02:02
9 0.056668 0.073751 0.018260 02:02
10 0.044765 0.064573 0.015858 02:02
11 0.040009 0.063520 0.015858 02:02
#save_pickle('tta_res.pkl', tta_res)
tta_res = load_pickle("tta_res.pkl")
for i in range(len(tta_res)):
    print(len(tta_res[i][0]))
3469
3469
3469
3469
3469
3469
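Each element of tta_res is the (probabilities, targets) tuple returned by learn.tta. Since the test set is unlabeled, the targets come back empty (None, in my understanding of fastai’s behavior), which is why first(zip(*tta_res)) below pulls out just the probability tensors:

```python
# each entry holds TTA-averaged probabilities for the 3469 test images
probs, targs = tta_res[0]
print(probs.shape)   # torch.Size([3469, 10]) -> one row per test image
print(targs)         # None for the unlabeled test set (my assumption)
```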
tta_prs = first(zip(*tta_res))
# double weight the vit predictions
tta_prs += tta_prs[1:3]
avg_pr = torch.stack(tta_prs).mean(0)
avg_pr.shape
torch.Size([3469, 10])
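Duplicating entries before stacking is just a compact way of taking a weighted average. A tiny illustration with made-up tensors:

```python
import torch

a = torch.tensor([0.2, 0.8])   # toy "model A" probabilities
b = torch.tensor([0.6, 0.4])   # toy "model B" probabilities

doubled  = torch.stack([a, b, b]).mean(0)   # b appears twice in the stack
weighted = (a + 2 * b) / 3                  # explicit weighted mean
assert torch.allclose(doubled, weighted)
```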
# recreate the DataLoaders just to recover the class vocab for decoding predictions
dls = ImageDataLoaders.from_folder(trn_path, valid_pct=0.2, item_tfms=Resize(480, method='squish'),
    batch_tfms=aug_transforms(size=224, min_scale=0.75))
idxs = avg_pr.argmax(dim=1)
vocab = np.array(dls.vocab)
ss = pd.read_csv(path/'sample_submission.csv')
ss['label'] = vocab[idxs]
ss.to_csv('subm.csv', index=False)
!head subm.csv
image_id,label
200001.jpg,hispa
200002.jpg,normal
200003.jpg,blast
200004.jpg,blast
200005.jpg,blast
200006.jpg,brown_spot
200007.jpg,dead_heart
200008.jpg,brown_spot
200009.jpg,hispa

This submission resulted in the following Kaggle score:

  • Private score: 0.98617
  • Public score: 0.98923 (new best)

My Public score error rate decreased by 20%, but my Private score did not budge.

Using Jeremy’s Approach for My Ensemble

Now that I have successfully recreated Jeremy’s submission (in the sense that the models trained without error and the submission earned a reasonable Kaggle score), I’ll apply the same hyperparameters and train function he used to the three architectures and transforms I chose for my large ensemble. The goal is to see whether his code produces a better score than mine did.

models = {
    'convnext_large_in22k': {
        (Resize(res), (288,224)),
    }, 'vit_large_patch16_224': {
        (Resize(480), 224),
    }, 'swinv2_large_window12_192_22k': {
        (Resize(480, method='squish'), 192)
    }
}
tta_res = []

for arch,details in models.items():
    for item,size in details:
        print('---',arch)
        print(size)
        print(item.name)
        tta_res.append(train(arch, size, item=item, accum=2)) #, epochs=1))
        gc.collect()
        torch.cuda.empty_cache()
--- convnext_large_in22k
(288, 224)
Resize -- {'size': (480, 640), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}
epoch train_loss valid_loss error_rate time
0 0.856573 0.475021 0.147525 01:39
epoch train_loss valid_loss error_rate time
0 0.383883 0.193702 0.055262 02:07
1 0.291577 0.189317 0.055262 02:07
2 0.265584 0.190596 0.051898 02:07
3 0.260673 0.216098 0.059106 02:07
4 0.188353 0.159554 0.047093 02:06
5 0.159173 0.157409 0.039404 02:07
6 0.100692 0.130478 0.029793 02:06
7 0.060365 0.107081 0.025469 02:07
8 0.050812 0.080841 0.023066 02:07
9 0.035694 0.084650 0.022105 02:07
10 0.032912 0.075940 0.016819 02:06
11 0.024196 0.081224 0.018741 02:07
--- vit_large_patch16_224
224
Resize -- {'size': (480, 480), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}
epoch train_loss valid_loss error_rate time
0 1.040115 0.719582 0.226814 01:53
epoch train_loss valid_loss error_rate time
0 0.390209 0.215410 0.070159 02:24
1 0.400067 0.283184 0.092744 02:24
2 0.341151 0.359277 0.098030 02:25
3 0.357469 0.291627 0.096588 02:24
4 0.237050 0.233321 0.064873 02:24
5 0.162601 0.153232 0.039885 02:24
6 0.116374 0.129873 0.034599 02:24
7 0.097705 0.106423 0.024507 02:24
8 0.062052 0.120935 0.026430 02:24
9 0.044538 0.098947 0.023066 02:24
10 0.029300 0.100037 0.020663 02:23
11 0.026877 0.097046 0.020663 02:24
--- swinv2_large_window12_192_22k
192
Resize -- {'size': (480, 480), 'method': 'squish', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}
/usr/local/lib/python3.9/dist-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2894.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Downloading: "https://github.com/SwinTransformer/storage/releases/download/v2.0.0/swinv2_large_patch4_window12_192_22k.pth" to /root/.cache/torch/hub/checkpoints/swinv2_large_patch4_window12_192_22k.pth
epoch train_loss valid_loss error_rate time
0 0.900081 0.538801 0.184527 02:01
epoch train_loss valid_loss error_rate time
0 0.462490 0.211737 0.063912 02:26
1 0.331918 0.281848 0.090822 02:26
2 0.376580 0.291321 0.093705 02:26
3 0.255427 0.163525 0.045651 02:26
4 0.237116 0.193330 0.056223 02:26
5 0.153437 0.123250 0.040365 02:26
6 0.115951 0.133760 0.034118 02:25
7 0.080223 0.078580 0.023066 02:25
8 0.060698 0.083489 0.020663 02:26
9 0.056002 0.078566 0.018260 02:26
10 0.035586 0.075723 0.017299 02:26
11 0.033601 0.074444 0.016819 02:26
len(tta_res), len(tta_res[0][0]), len(tta_res[1][0]), len(tta_res[2][0])
(3, 3469, 3469, 3469)
# save_pickle('tta_res2.pkl', tta_res)
tta_res = load_pickle('tta_res2.pkl')

I’ll do three more Kaggle submissions:

  • All three model predictions weighted equally.
  • The convnext predictions weighted more (because that model had the lowest validation error rate in its final epoch).
  • The vit predictions weighted more (because the smaller vit previously had the best TTA error rate, and it’s also the model Jeremy weighted more).
tta_prs = first(zip(*tta_res))
avg_pr = torch.stack(tta_prs).mean(0)
avg_pr.shape
torch.Size([3469, 10])
dls = ImageDataLoaders.from_folder(trn_path, valid_pct=0.2, item_tfms=Resize(480, method='squish'),
    batch_tfms=aug_transforms(size=224, min_scale=0.75))

idxs = avg_pr.argmax(dim=1)
vocab = np.array(dls.vocab)
ss = pd.read_csv(path/'sample_submission.csv')
ss['label'] = vocab[idxs]
ss.to_csv('subm.csv', index=False)
!head subm.csv
image_id,label
200001.jpg,hispa
200002.jpg,normal
200003.jpg,blast
200004.jpg,blast
200005.jpg,blast
200006.jpg,brown_spot
200007.jpg,dead_heart
200008.jpg,brown_spot
200009.jpg,hispa
# append two more copies of the convnext preds so they get 3x weight
tta_res += 2 * [tta_res[0]]
for i in range(len(tta_res)):
    print(len(tta_res[i][0]))
3469
3469
3469
3469
3469
tta_prs = first(zip(*tta_res))
avg_pr = torch.stack(tta_prs).mean(0)

dls = ImageDataLoaders.from_folder(trn_path, valid_pct=0.2, item_tfms=Resize(480, method='squish'),
    batch_tfms=aug_transforms(size=224, min_scale=0.75))

idxs = avg_pr.argmax(dim=1)
vocab = np.array(dls.vocab)
ss = pd.read_csv(path/'sample_submission.csv')
ss['label'] = vocab[idxs]
ss.to_csv('subm.csv', index=False)
!head subm.csv
image_id,label
200001.jpg,hispa
200002.jpg,normal
200003.jpg,blast
200004.jpg,blast
200005.jpg,blast
200006.jpg,brown_spot
200007.jpg,dead_heart
200008.jpg,brown_spot
200009.jpg,hispa
# append two more copies of the vit preds so they get 3x weight
tta_res = load_pickle('tta_res2.pkl')
tta_res += 2 * [tta_res[1]]

for i in range(len(tta_res)):
    print(len(tta_res[i][0]))
3469
3469
3469
3469
3469
tta_prs = first(zip(*tta_res))
avg_pr = torch.stack(tta_prs).mean(0)

dls = ImageDataLoaders.from_folder(trn_path, valid_pct=0.2, item_tfms=Resize(480, method='squish'),
    batch_tfms=aug_transforms(size=224, min_scale=0.75))

idxs = avg_pr.argmax(dim=1)
vocab = np.array(dls.vocab)
ss = pd.read_csv(path/'sample_submission.csv')
ss['label'] = vocab[idxs]
ss.to_csv('subm.csv', index=False)
!head subm.csv
image_id,label
200001.jpg,hispa
200002.jpg,normal
200003.jpg,blast
200004.jpg,blast
200005.jpg,blast
200006.jpg,brown_spot
200007.jpg,dead_heart
200008.jpg,brown_spot
200009.jpg,hispa

Here are the Kaggle scores for those three submissions:

| Description | Private score | Public score |
|-------------|---------------|--------------|
| All three model predictions weighted equally | 0.98617 | 0.98769 |
| convnext weighted more | 0.98617 | 0.98539 |
| vit weighted more | 0.98502 | 0.98654 |

The best Private score amongst these three submissions was tied with the previous best of 0.98617.

The best Public score still belongs to the submission replicating Jeremy’s approach directly (0.98923).

Here are the comprehensive Kaggle scoring results for this competition:

| Submission | Description | Private Score | Public Score |
|------------|-------------|---------------|--------------|
| 1 | initial submission file after creating a quick small model following Jeremy Howard’s walkthrough video | 0.13709 | 0.12418 |
| 2 | initial submission using convnext small 2 epochs fine-tuned sorted file list | 0.94124 | 0.92541 |
| 3 | squish convnext small 12 epoch ft tta | 0.98156 | 0.98308 |
| 4 | ensemble small 12 epoch ft tta | 0.98617* | 0.98423 |
| 5 | swinv2 convnext vit large ensemble 12 epoch ft tta | 0.97811 | 0.98039 |
| 6 | swinv2 convnext vit large ensemble 24 epoch ft tta | 0.98502 | 0.98539 |
| 7 | swinv2 (3x convnext) vit large ensemble 24 epoch ft tta | 0.98387 | 0.98423 |
| 8 | (3x swinv2) convnext vit large ensemble 24 epoch ft tta | 0.98156 | 0.98500 |
| 9 | swinv2 convnext (3x vit) large ensemble 24 epoch ft tta | 0.98617* | 0.98462 |
| 10 | swinv2 large 24 epoch ft tta | 0.98271 | 0.98269 |
| 11 | convnext large 24 epoch ft tta | 0.98502 | 0.98269 |
| 12 | vit large 24 epoch ft tta | 0.97811 | 0.98231 |
| 13 | swinv2 convnext vit large ensemble 24 epoch ft tta lr_find | 0.98387 | 0.98577 |
| 14 | swinv2 convnext (3x vit) large ensemble 24 epoch ft tta lr_find | 0.98617* | 0.98654 |
| 15 | Following Jeremy Howard’s “Scaling Up: Road to the Top, Part 3” Notebook | 0.98617* | 0.98923** |
| 16 | convnext swinv2 vit large ft 12 epoch tta road to the top | 0.98617* | 0.98769 |
| 17 | (3x convnext) swinv2 vit large ft 12 epoch tta road to the top | 0.98617* | 0.98539 |

* largest private score (0.98617)

** largest public score (0.98923)

Final Thoughts

I really, really enjoyed working through the 6-part live coding series, which resulted in this 8-part blog post mini-series. I learned so much across a wide variety of topics. It also required a lot of patience and tenacity: I ran into endless errors and issues using Kaggle and Google Colab for the trainings in the first 7 blog posts. For some unknown reason, when I was using Kaggle (whether in Chrome or Firefox, in an Incognito/Private window or not), the tab kept crashing with an “Aw, snap” error (Chrome) or “Gah” error (Firefox). Each time, I lost progress and had to re-run the model training, sometimes losing 4-5 hours of work. In Google Colab, it was initially smooth sailing until I ran out of compute units (which always show 0 in the free tier anyway). I debated whether to purchase 100 Google Colab compute units for 10 dollars, but decided instead to upgrade my Paperspace subscription to Pro for 8 dollars/month and thus got access to faster GPUs for “free”. That didn’t come without a catch, though: the GPUs included with the subscription can only run for 6 hours before Paperspace automatically shuts them down. Fortunately, my model training runs in this notebook only took about 4 hours, so I escaped unscathed.

A few takeaways:

  • I now understand what Jeremy meant when he said that you don’t really need to use lr_find because common problems in vision all call for a similar learning rate. It didn’t matter whether I was using large or small versions of the convnext, swinv2, or vit architectures, for 12 or 24 epochs: a learning rate of 0.01 performed the best in every scenario (a sketch of the lr_find workflow follows this list).
  • All three of the architectures I used are pretty stable. There is variance in the final epoch validation error rate, but even after 15 different submissions, with different combinations of architectures, epochs and learning rates, the Kaggle maximum score didn’t break 0.98617.
  • Kaggle competitions are thrilling even when I submit scores after the competition is closed. I enjoyed trying to beat my previous score (and attempting to beat Jeremy’s score—with his own code and approach). Each time I submitted a CSV, I was excited to see the results. I can imagine the thrill when the competition is live. It must be so stressful as well! I am looking forward to competing in a live competition in 2024.
  • It’s important to both pace myself and be consistent. There were days when I couldn’t get anything accomplished on this project. There were also days when I watched and took notes on an entire live coding video from start to finish, and there were days in between. That’s fine. It happens! What’s important is to not give up just because one particular week (or month) isn’t producing much output. I also found that my persistence was bolstered by simply logging into Kaggle every day and keeping my streak going, even if logging in was all I did. I heard someone say on a podcast or an Instagram/TikTok video that before they got in shape, all they did for six weeks was go to the gym every day, stay for 5 minutes, and come back home. Just that practice solidified their consistency. I’m proud to say that as part of this project, I am on a 70-day Kaggle login streak! Here’s to continuing that streak throughout 2024.
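For reference, here’s a sketch of the lr_find workflow the first bullet refers to (assuming the same trn_path and transforms used earlier in this notebook):

```python
from fastai.vision.all import *

dls = ImageDataLoaders.from_folder(trn_path, valid_pct=0.2,
    item_tfms=Resize(480, method='squish'),
    batch_tfms=aug_transforms(size=224, min_scale=0.75))
learn = vision_learner(dls, 'convnext_large_in22k', metrics=error_rate)
suggested = learn.lr_find()   # plots loss vs. learning rate
print(suggested.valley)       # compare the suggestion against the 0.01 default
```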

A screenshot showing my 70-day login streak in Kaggle

As always, I hope you enjoyed reading this blog post series!