```python
!pip install -qq timm==0.6.13
!pip install kaggle -qq
import timm
timm.__version__
```
'0.6.13'
Vishal Bakshi
February 5, 2024
In the fastai course Part 1 Lesson 6 video Jeremy Howard walked through the notebooks First Steps: Road to the Top, Part 1 and Small models: Road to the Top, Part 2 where he builds increasingly accurate solutions to the Paddy Doctor: Paddy Disease Classification Kaggle Competition. In the video, Jeremy referenced a series of walkthrough videos that he made while working through the four-notebook series for this competition. I’m excited to watch these walkthroughs to better understand how to approach a Kaggle competition from the perspective of a former #1 Kaggle grandmaster.
In this blog post series, I'll walk through the code Jeremy shared in each of the 6 Live Coding videos focused on this competition, submitting predictions to Kaggle along the way. My last two blog posts in this series reference Jeremy's Scaling Up: Road to the Top, Part 3 notebook to improve my large model ensemble predictions.
Obviously, "my" last approach is largely taken from Jeremy's own live coding videos, but there are a few differences between our ensemble training setups:
Item | Jeremy | Vishal |
---|---|---|
Learning Rate | 0.01 for all architectures | 0.005 or 0.015 depending on the architecture |
# Epochs | 12 | 24 |
Architectures | convnext_large_in22k, vit_large_patch16_224, swinv2_large_window12_192_22k, swin_large_patch4_window7_224 | convnext_large_in22k, vit_large_patch16_224, swinv2_large_window12_192_22k |
Gradient Accumulation | Yes | Yes |
Batch Size | 32 | 16 |
Private score | 0.98732 | 0.98617 |
Public score | 0.98846 | 0.98654 |
Jeremy's approach resulted in a Private score with a roughly 8% smaller error rate (0.01268 vs. 0.01383). In terms of rankings, Jeremy's Private score places #8 on the leaderboard, while mine ties for places #27 through #58.
I'll first replicate Jeremy's approach, including his architectures, learning rates, number of epochs, `set_seed(42)` call, and `train` function, to see if I get the same score (I would expect to). Once that's confirmed, I will use his approach for the three architectures I chose and re-submit the predictions to see how that scores. If there's still a difference, I can attribute it to Jeremy including the swin_large_patch4_window7_224 architecture (and multiple transforms for some of the models) in his ensemble.
```python
from pathlib import Path

# `creds` holds the contents of a kaggle.json API token, defined in a cell not shown here
cred_path = Path("~/.kaggle/kaggle.json").expanduser()
if not cred_path.exists():
    cred_path.parent.mkdir(exist_ok=True)
    cred_path.write_text(creds)
    cred_path.chmod(0o600)
```
```python
import zipfile, kaggle

# download and extract the competition data if it isn't already present
path = Path('paddy-disease-classification')
if not path.exists():
    kaggle.api.competition_download_cli(str(path))
    zipfile.ZipFile(f'{path}.zip').extractall(path)
```
```python
from fastai.vision.all import *
set_seed(42)
```
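A few names used below (`res`, `trn_path`, and `tst_files`) are defined in cells that aren't captured in this post; based on Jeremy's "Scaling Up: Road to the Top, Part 3" notebook (and the Resize reprs printed during training), they are presumably:

```python
res = 640,480                                              # full-resolution item size used by Resize(res)
trn_path = path/'train_images'                             # one folder of training images per disease class
tst_files = get_image_files(path/'test_images').sorted()   # test images for inference
```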
```python
models = {
    'convnext_large_in22k': {
        (Resize(res), (320,224)),
    }, 'vit_large_patch16_224': {
        (Resize(480, method='squish'), 224),
        (Resize(res), 224),
    }, 'swinv2_large_window12_192_22k': {
        (Resize(480, method='squish'), 192),
        (Resize(res), 192),
    }, 'swin_large_patch4_window7_224': {
        (Resize(res), 224),
    },
}
```
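Each architecture maps to a set of (item transform, augmentation size) pairs, and each pair becomes one training run. A quick sanity check (my addition, not in the original notebook) confirms the dict defines the six runs that appear below:

```python
# one training run per (item_tfms, size) pair: 1 + 2 + 2 + 1 = 6
n_runs = sum(len(tfms) for tfms in models.values())
print(n_runs)  # 6
```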
```python
def train(arch, size, item=Resize(480, method='squish'), accum=1, finetune=True, epochs=12):
    dls = ImageDataLoaders.from_folder(trn_path, valid_pct=0.2, item_tfms=item,
        batch_tfms=aug_transforms(size=size, min_scale=0.75), bs=64//accum)
    cbs = GradientAccumulation(64) if accum else []
    learn = vision_learner(dls, arch, metrics=error_rate, cbs=cbs).to_fp16()
    if finetune:
        learn.fine_tune(epochs, 0.01)
        return learn.tta(dl=dls.test_dl(tst_files))   # TTA predictions on the test set
    else:
        learn.unfreeze()
        learn.fit_one_cycle(epochs, 0.01)
```
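The `accum` argument trades GPU memory for wall-clock time without changing the effective batch size: the DataLoader batch shrinks to `64//accum`, while `GradientAccumulation(64)` delays the optimizer step until gradients from 64 images have accumulated. A minimal sketch of the arithmetic (my illustration, not from the original notebook):

```python
# with accum=2: mini-batches of 32 images, an optimizer step every 2 batches,
# so each step still uses gradients computed from 64 images
for accum in (1, 2, 4):
    bs = 64 // accum
    print(f"accum={accum}: bs={bs}, batches per optimizer step={64 // bs}")
```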
```python
import gc

tta_res = []
for arch,details in models.items():
    for item,size in details:
        print('---',arch)
        print(size)
        print(item.name)
        tta_res.append(train(arch, size, item=item, accum=2)) #, epochs=1))
        gc.collect()
        torch.cuda.empty_cache()
```
--- convnext_large_in22k
(320, 224)
Resize -- {'size': (480, 640), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}
Downloading: "https://dl.fbaipublicfiles.com/convnext/convnext_large_22k_224.pth" to /root/.cache/torch/hub/checkpoints/convnext_large_22k_224.pth
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.887948 | 0.558075 | 0.171072 | 01:50 |
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.355805 | 0.198265 | 0.062951 | 02:19 |
1 | 0.294646 | 0.232236 | 0.064392 | 02:19 |
2 | 0.279926 | 0.246197 | 0.068236 | 02:18 |
3 | 0.255699 | 0.214052 | 0.054781 | 02:18 |
4 | 0.185353 | 0.169206 | 0.050937 | 02:18 |
5 | 0.186070 | 0.143183 | 0.035560 | 02:18 |
6 | 0.085736 | 0.121303 | 0.030754 | 02:18 |
7 | 0.057243 | 0.090094 | 0.023546 | 02:18 |
8 | 0.055047 | 0.102438 | 0.024027 | 02:17 |
9 | 0.037102 | 0.081514 | 0.017780 | 02:18 |
10 | 0.031336 | 0.076584 | 0.019222 | 02:18 |
11 | 0.026582 | 0.077788 | 0.020183 | 02:17 |
--- vit_large_patch16_224
224
Resize -- {'size': (480, 640), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 1.025479 | 0.599246 | 0.190293 | 01:56 |
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.372387 | 0.254584 | 0.078808 | 02:26 |
1 | 0.363945 | 0.267997 | 0.084575 | 02:26 |
2 | 0.337558 | 0.416980 | 0.118693 | 02:26 |
3 | 0.305778 | 0.237352 | 0.068717 | 02:26 |
4 | 0.205868 | 0.220364 | 0.052859 | 02:26 |
5 | 0.155062 | 0.132949 | 0.037001 | 02:26 |
6 | 0.131659 | 0.115785 | 0.029793 | 02:26 |
7 | 0.084275 | 0.113429 | 0.028352 | 02:26 |
8 | 0.054473 | 0.126284 | 0.028352 | 02:26 |
9 | 0.040826 | 0.095426 | 0.023066 | 02:26 |
10 | 0.030117 | 0.101836 | 0.022105 | 02:26 |
11 | 0.032172 | 0.097483 | 0.021624 | 02:26 |
--- vit_large_patch16_224
224
Resize -- {'size': (480, 480), 'method': 'squish', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.977947 | 0.495749 | 0.167227 | 01:53 |
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.445487 | 0.227239 | 0.076886 | 02:24 |
1 | 0.310319 | 0.217010 | 0.065353 | 02:24 |
2 | 0.346164 | 0.222110 | 0.071120 | 02:24 |
3 | 0.316973 | 0.220043 | 0.066314 | 02:24 |
4 | 0.201981 | 0.209637 | 0.057184 | 02:24 |
5 | 0.130500 | 0.139665 | 0.036521 | 02:24 |
6 | 0.142111 | 0.127187 | 0.030754 | 02:24 |
7 | 0.071605 | 0.089311 | 0.022105 | 02:24 |
8 | 0.048045 | 0.083216 | 0.020183 | 02:23 |
9 | 0.046874 | 0.084979 | 0.018260 | 02:23 |
10 | 0.024261 | 0.086005 | 0.019702 | 02:24 |
11 | 0.021223 | 0.083154 | 0.016819 | 02:23 |
--- swinv2_large_window12_192_22k
192
Resize -- {'size': (480, 480), 'method': 'squish', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}
/usr/local/lib/python3.9/dist-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2894.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Downloading: "https://github.com/SwinTransformer/storage/releases/download/v2.0.0/swinv2_large_patch4_window12_192_22k.pth" to /root/.cache/torch/hub/checkpoints/swinv2_large_patch4_window12_192_22k.pth
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.929301 | 0.633476 | 0.173955 | 02:02 |
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.397221 | 0.207330 | 0.060548 | 02:26 |
1 | 0.349008 | 0.227384 | 0.068236 | 02:27 |
2 | 0.321012 | 0.355698 | 0.104277 | 02:27 |
3 | 0.280713 | 0.199645 | 0.058145 | 02:27 |
4 | 0.228984 | 0.219441 | 0.061028 | 02:26 |
5 | 0.154743 | 0.159890 | 0.039885 | 02:28 |
6 | 0.133811 | 0.156821 | 0.040365 | 02:27 |
7 | 0.082750 | 0.137658 | 0.032196 | 02:27 |
8 | 0.069910 | 0.132426 | 0.029793 | 02:27 |
9 | 0.052760 | 0.111611 | 0.022585 | 02:26 |
10 | 0.039415 | 0.115199 | 0.022585 | 02:27 |
11 | 0.027980 | 0.115816 | 0.023066 | 02:26 |
--- swinv2_large_window12_192_22k
192
Resize -- {'size': (480, 640), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.937894 | 0.522241 | 0.169630 | 02:04 |
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.460170 | 0.213483 | 0.073042 | 02:29 |
1 | 0.352201 | 0.178675 | 0.054301 | 02:29 |
2 | 0.386350 | 0.334553 | 0.097549 | 02:28 |
3 | 0.299634 | 0.164248 | 0.046612 | 02:29 |
4 | 0.228867 | 0.126158 | 0.035079 | 02:29 |
5 | 0.178476 | 0.138584 | 0.042768 | 02:29 |
6 | 0.154566 | 0.148524 | 0.039885 | 02:28 |
7 | 0.094369 | 0.078149 | 0.020663 | 02:29 |
8 | 0.066650 | 0.069993 | 0.019222 | 02:29 |
9 | 0.049904 | 0.061477 | 0.017299 | 02:29 |
10 | 0.037704 | 0.060764 | 0.016338 | 02:29 |
11 | 0.033226 | 0.061885 | 0.015858 | 02:29 |
--- swin_large_patch4_window7_224
224
Resize -- {'size': (480, 640), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}
Downloading: "https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_large_patch4_window7_224_22kto1k.pth" to /root/.cache/torch/hub/checkpoints/swin_large_patch4_window7_224_22kto1k.pth
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.930659 | 0.492810 | 0.154253 | 01:41 |
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.422924 | 0.213063 | 0.072561 | 02:02 |
1 | 0.369444 | 0.222754 | 0.066795 | 02:02 |
2 | 0.338899 | 0.212781 | 0.057665 | 02:03 |
3 | 0.308362 | 0.159222 | 0.046612 | 02:03 |
4 | 0.214941 | 0.142208 | 0.037001 | 02:02 |
5 | 0.155058 | 0.139699 | 0.032196 | 02:02 |
6 | 0.161482 | 0.116061 | 0.030754 | 02:02 |
7 | 0.098805 | 0.080427 | 0.022105 | 02:03 |
8 | 0.071636 | 0.073006 | 0.020183 | 02:02 |
9 | 0.056668 | 0.073751 | 0.018260 | 02:02 |
10 | 0.044765 | 0.064573 | 0.015858 | 02:02 |
11 | 0.040009 | 0.063520 | 0.015858 | 02:02 |
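The cells that produced the submission file for this run aren't captured above; presumably they matched the submission cells shown later in this post, along these lines:

```python
tta_prs = first(zip(*tta_res))          # keep the predictions from each (preds, targs) pair
avg_pr = torch.stack(tta_prs).mean(0)   # average the six runs' probabilities
dls = ImageDataLoaders.from_folder(trn_path, valid_pct=0.2, item_tfms=Resize(480, method='squish'),
    batch_tfms=aug_transforms(size=224, min_scale=0.75))
idxs = avg_pr.argmax(dim=1)             # most probable class per test image
vocab = np.array(dls.vocab)
ss = pd.read_csv(path/'sample_submission.csv')
ss['label'] = vocab[idxs]
ss.to_csv('subm.csv', index=False)
!head subm.csv
```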
image_id,label
200001.jpg,hispa
200002.jpg,normal
200003.jpg,blast
200004.jpg,blast
200005.jpg,blast
200006.jpg,brown_spot
200007.jpg,dead_heart
200008.jpg,brown_spot
200009.jpg,hispa
This submission scored 0.98617 (Private) and 0.98923 (Public) on Kaggle. My Public score error rate decreased by 20% relative to my previous best, but my Private score did not budge.
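To make the "20%" concrete (my arithmetic, using the scores from the tables in this post):

```python
old_err = 1 - 0.98654   # error rate of my previous best Public score
new_err = 1 - 0.98923   # error rate of this submission's Public score
print(f"{(old_err - new_err) / old_err:.0%}")  # 20%
```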
Now that I have successfully recreated Jeremy's submission (in the sense that the models ran without error and the submission earned a reasonable Kaggle score), I'll apply the same hyperparameters and functions he used to the three architectures and transforms I chose for my large ensemble. The goal is to see whether his code yields a better score than mine did.
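The redefined `models` dict for this run isn't shown above; reconstructing it from the sizes and Resize reprs printed below, it was presumably:

```python
models = {
    'convnext_large_in22k': {
        (Resize(res), (288,224)),            # crop at full resolution, then augment to 288x224
    }, 'vit_large_patch16_224': {
        (Resize(480), 224),                  # Resize's default method is 'crop'
    }, 'swinv2_large_window12_192_22k': {
        (Resize(480, method='squish'), 192),
    },
}
```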
```python
tta_res = []
for arch,details in models.items():
    for item,size in details:
        print('---',arch)
        print(size)
        print(item.name)
        tta_res.append(train(arch, size, item=item, accum=2)) #, epochs=1))
        gc.collect()
        torch.cuda.empty_cache()
```
--- convnext_large_in22k
(288, 224)
Resize -- {'size': (480, 640), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.856573 | 0.475021 | 0.147525 | 01:39 |
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.383883 | 0.193702 | 0.055262 | 02:07 |
1 | 0.291577 | 0.189317 | 0.055262 | 02:07 |
2 | 0.265584 | 0.190596 | 0.051898 | 02:07 |
3 | 0.260673 | 0.216098 | 0.059106 | 02:07 |
4 | 0.188353 | 0.159554 | 0.047093 | 02:06 |
5 | 0.159173 | 0.157409 | 0.039404 | 02:07 |
6 | 0.100692 | 0.130478 | 0.029793 | 02:06 |
7 | 0.060365 | 0.107081 | 0.025469 | 02:07 |
8 | 0.050812 | 0.080841 | 0.023066 | 02:07 |
9 | 0.035694 | 0.084650 | 0.022105 | 02:07 |
10 | 0.032912 | 0.075940 | 0.016819 | 02:06 |
11 | 0.024196 | 0.081224 | 0.018741 | 02:07 |
--- vit_large_patch16_224
224
Resize -- {'size': (480, 480), 'method': 'crop', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 1.040115 | 0.719582 | 0.226814 | 01:53 |
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.390209 | 0.215410 | 0.070159 | 02:24 |
1 | 0.400067 | 0.283184 | 0.092744 | 02:24 |
2 | 0.341151 | 0.359277 | 0.098030 | 02:25 |
3 | 0.357469 | 0.291627 | 0.096588 | 02:24 |
4 | 0.237050 | 0.233321 | 0.064873 | 02:24 |
5 | 0.162601 | 0.153232 | 0.039885 | 02:24 |
6 | 0.116374 | 0.129873 | 0.034599 | 02:24 |
7 | 0.097705 | 0.106423 | 0.024507 | 02:24 |
8 | 0.062052 | 0.120935 | 0.026430 | 02:24 |
9 | 0.044538 | 0.098947 | 0.023066 | 02:24 |
10 | 0.029300 | 0.100037 | 0.020663 | 02:23 |
11 | 0.026877 | 0.097046 | 0.020663 | 02:24 |
--- swinv2_large_window12_192_22k
192
Resize -- {'size': (480, 480), 'method': 'squish', 'pad_mode': 'reflection', 'resamples': (<Resampling.BILINEAR: 2>, <Resampling.NEAREST: 0>), 'p': 1.0}
/usr/local/lib/python3.9/dist-packages/torch/functional.py:478: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:2894.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
Downloading: "https://github.com/SwinTransformer/storage/releases/download/v2.0.0/swinv2_large_patch4_window12_192_22k.pth" to /root/.cache/torch/hub/checkpoints/swinv2_large_patch4_window12_192_22k.pth
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.900081 | 0.538801 | 0.184527 | 02:01 |
epoch | train_loss | valid_loss | error_rate | time |
---|---|---|---|---|
0 | 0.462490 | 0.211737 | 0.063912 | 02:26 |
1 | 0.331918 | 0.281848 | 0.090822 | 02:26 |
2 | 0.376580 | 0.291321 | 0.093705 | 02:26 |
3 | 0.255427 | 0.163525 | 0.045651 | 02:26 |
4 | 0.237116 | 0.193330 | 0.056223 | 02:26 |
5 | 0.153437 | 0.123250 | 0.040365 | 02:26 |
6 | 0.115951 | 0.133760 | 0.034118 | 02:25 |
7 | 0.080223 | 0.078580 | 0.023066 | 02:25 |
8 | 0.060698 | 0.083489 | 0.020663 | 02:26 |
9 | 0.056002 | 0.078566 | 0.018260 | 02:26 |
10 | 0.035586 | 0.075723 | 0.017299 | 02:26 |
11 | 0.033601 | 0.074444 | 0.016819 | 02:26 |
I'll do three more Kaggle submissions:

- all three model predictions weighted equally
- convnext predictions weighted more heavily
- vit predictions weighted more heavily
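The cell producing the shape output below isn't shown; it presumably stacked and averaged the equal-weighted TTA predictions, mirroring the submission cells that appear later:

```python
tta_prs = first(zip(*tta_res))          # predictions from each of the three runs
avg_pr = torch.stack(tta_prs).mean(0)   # equal-weight average
avg_pr.shape                            # (test images, disease classes)
```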
torch.Size([3469, 10])
```python
# rebuild dls just to recover the class vocab, then write the submission file
dls = ImageDataLoaders.from_folder(trn_path, valid_pct=0.2, item_tfms=Resize(480, method='squish'),
    batch_tfms=aug_transforms(size=224, min_scale=0.75))
idxs = avg_pr.argmax(dim=1)             # most probable class per test image
vocab = np.array(dls.vocab)
ss = pd.read_csv(path/'sample_submission.csv')
ss['label'] = vocab[idxs]
ss.to_csv('subm.csv', index=False)
!head subm.csv
```
image_id,label
200001.jpg,hispa
200002.jpg,normal
200003.jpg,blast
200004.jpg,blast
200005.jpg,blast
200006.jpg,brown_spot
200007.jpg,dead_heart
200008.jpg,brown_spot
200009.jpg,hispa
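Between the equal-weight submission above and the code below, the cell weighting the convnext predictions more heavily isn't captured; it presumably mirrored the vit-weighting cell shown further down, along these lines:

```python
# hypothetical reconstruction: weigh the convnext preds more
tta_res = load_pickle('tta_res2.pkl')   # reload the three runs' saved TTA predictions
tta_res += 2 * [tta_res[0]]             # convnext is index 0: append its preds twice
```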
```python
tta_prs = first(zip(*tta_res))          # keep the predictions from each (preds, targs) pair
avg_pr = torch.stack(tta_prs).mean(0)   # average probabilities across the (weighted) runs
dls = ImageDataLoaders.from_folder(trn_path, valid_pct=0.2, item_tfms=Resize(480, method='squish'),
    batch_tfms=aug_transforms(size=224, min_scale=0.75))
idxs = avg_pr.argmax(dim=1)
vocab = np.array(dls.vocab)
ss = pd.read_csv(path/'sample_submission.csv')
ss['label'] = vocab[idxs]
ss.to_csv('subm.csv', index=False)
!head subm.csv
```
image_id,label
200001.jpg,hispa
200002.jpg,normal
200003.jpg,blast
200004.jpg,blast
200005.jpg,blast
200006.jpg,brown_spot
200007.jpg,dead_heart
200008.jpg,brown_spot
200009.jpg,hispa
```python
# weigh the vit preds more
tta_res = load_pickle('tta_res2.pkl')   # reload the saved TTA predictions for the three runs
tta_res += 2 * [tta_res[1]]             # vit is index 1: append its predictions twice
for i in range(len(tta_res)):
    print(len(tta_res[i][0]))           # each preds tensor covers all 3469 test images
```
3469
3469
3469
3469
3469
```python
tta_prs = first(zip(*tta_res))
avg_pr = torch.stack(tta_prs).mean(0)
dls = ImageDataLoaders.from_folder(trn_path, valid_pct=0.2, item_tfms=Resize(480, method='squish'),
    batch_tfms=aug_transforms(size=224, min_scale=0.75))
idxs = avg_pr.argmax(dim=1)
vocab = np.array(dls.vocab)
ss = pd.read_csv(path/'sample_submission.csv')
ss['label'] = vocab[idxs]
ss.to_csv('subm.csv', index=False)
!head subm.csv
```
image_id,label
200001.jpg,hispa
200002.jpg,normal
200003.jpg,blast
200004.jpg,blast
200005.jpg,blast
200006.jpg,brown_spot
200007.jpg,dead_heart
200008.jpg,brown_spot
200009.jpg,hispa
Here are the Kaggle scores for those three submissions:
Description | Private score | Public score |
---|---|---|
All three model predictions weighted equally | 0.98617 | 0.98769 |
convnext weighted more | 0.98617 | 0.98539 |
vit weighted more | 0.98502 | 0.98654 |
The best Private score amongst these three submissions was tied with the previous best of 0.98617.
The best Public score still belongs to the submission replicating Jeremy’s approach directly (0.98923).
Here are the comprehensive Kaggle scoring results for this competition:
Submission | Description | Private Score | Public Score |
---|---|---|---|
1 | initial submission file after creating a quick small model following Jeremy Howard’s walkthrough video. | 0.13709 | 0.12418 |
2 | initial submission using convnext small 2 epochs fine-tuned sorted file list | 0.94124 | 0.92541 |
3 | squish convnext small 12 epoch ft tta | 0.98156 | 0.98308 |
4 | ensemble small 12 epoch ft tta | 0.98617* | 0.98423 |
5 | swinv2 convnext vit large ensemble 12 epoch ft tta | 0.97811 | 0.98039 |
6 | swinv2 convnext vit large ensemble 24 epoch ft tta | 0.98502 | 0.98539 |
7 | swinv2 (3x convnext) vit large ensemble 24 epoch ft tta | 0.98387 | 0.98423 |
8 | (3x swinv2) convnext vit large ensemble 24 epoch ft tta | 0.98156 | 0.98500 |
9 | swinv2 convnext (3x vit) large ensemble 24 epoch ft tta | 0.98617* | 0.98462 |
10 | swinv2 large 24 epoch ft tta | 0.98271 | 0.98269 |
11 | convnext large 24 epoch ft tta | 0.98502 | 0.98269 |
12 | vit large 24 epoch ft tta | 0.97811 | 0.98231 |
13 | swinv2 convnext vit large ensemble 24 epoch ft tta lr_find | 0.98387 | 0.98577 |
14 | swinv2 convnext (3x vit) large ensemble 24 epoch ft tta lr_find | 0.98617* | 0.98654 |
15 | Following Jeremy Howard’s “Scaling Up: Road to the Top, Part 3” Notebook | 0.98617* | 0.98923** |
16 | convnext swinv2 vit large ft 12 epoch tta road to the top | 0.98617* | 0.98769 |
17 | (3 x convnext) swinv2 vit large ft 12 epoch tta road to the top | 0.98617* | 0.98539 |
* largest private score (0.98617)
** largest public score (0.98923)
I really, really enjoyed working through the 6-part live coding series, which resulted in this 8-part blog post mini-series. I learned so much across a wide variety of topics. It also required a lot of patience and tenacity: I ran into endless errors and issues using Kaggle and Google Colab while running the trainings for the first 7 blog posts. For some unknown reason, when I was using Kaggle (whether in Chrome or Firefox, in an Incognito/Private window or otherwise), the tab kept crashing with an "Aw, snap" error (Chrome) or "Gah" error (Firefox). Each time, I lost progress and had to re-run the model training, sometimes losing 4-5 hours of work. In Google Colab it was initially smooth sailing, until I ran out of compute units (which always show 0 anyway in the free tier). I debated whether to purchase 100 Google Colab compute units for 10 dollars, but decided instead to upgrade my Paperspace subscription to Pro for 8 dollars/month, which got me access to faster GPUs for "free". That didn't come without a catch, though: Paperspace automatically shuts down free-tier GPU machines after 6 hours. Fortunately, my model training runs in this notebook only took about 4 hours, so I escaped unscathed.
A few takeaways:

- I didn't need to use `lr_find`, because common problems in vision all require a similar learning rate. It didn't matter whether I was using large or small versions of the convnext, swinv2, or vit architectures, or training for 12 or 24 epochs: a learning rate of 0.01 performed the best in all scenarios.

As always, I hope you enjoyed reading this blog post series!