```python
import torch, numpy as np, pandas as pd, torch.nn.functional as F
from fastai.data.transforms import RandomSplitter

torch.set_printoptions(linewidth=140, sci_mode=False, edgeitems=7)
np.set_printoptions(linewidth=140)
pd.set_option('display.width', 140)
```
Plotting Losses and Accuracy for 100 Deep Neural Net Training Runs
Background
In this blog post I’ll modify the neural net training loop example in Jeremy Howard’s Lesson 5 notebook Linear model and neural net from scratch to plot training loss, validation loss, and accuracy across a number of training runs. I’ll run 100 trainings for the neural net, record the losses and accuracy, and then plot them to see how they vary by epoch and by training loop.
I am also inspired by (and learned from) this forum post by a fastai community member (sign-in required), where they plotted losses, gradients, parameters, and accuracy for training runs that included or excluded `params.grad.zero_()` and L2 regularization. They found that for a simple linear model, zeroing the gradients leads to more stable training, smaller coefficients, and higher accuracy than letting gradients accumulate each epoch.
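To make the zeroing point concrete, here is a minimal sketch of my own (not code from the forum post) showing how PyTorch accumulates gradients when they are never zeroed:

```python
import torch

# backward() *adds* to .grad, so without zeroing, each step's update
# compounds all previous gradients
w = torch.ones(1, requires_grad=True)
for step in range(3):
    loss = (2 * w).sum()
    loss.backward()
    print(step, w.grad.item())  # prints 2.0, then 4.0, then 6.0
# calling w.grad.zero_() after each update would keep the gradient at 2.0
```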
Plan of Attack
I want to record values at the end of each epoch, separated by training run. I'll create a recorder DataFrame to store this data. Here's pseudocode for how the recording will take place, referencing functions defined in Jeremy's notebook and logic from the fastai forum post for collecting losses and accuracy:
```python
# code to clean data
...

# code to create training and validation xs and ys
...

# new function to run multiple trainings
def training_run(runs=100):
    # initialize recorder object
    recorder = pd.DataFrame(columns=["run", "epoch", "trn_loss", "val_loss", "acc"])
    for run in range(runs):
        # get lists of losses and accuracy
        tl, vl, a = train_model(...)
        # create lists of run and epoch values
        r = [run] * len(tl)
        e = [i for i in range(len(tl))]
        # append new data to recorder DataFrame
        row = pd.DataFrame(data={"run": r, "epoch": e, "trn_loss": tl, "val_loss": vl, "acc": a})
        recorder = pd.concat([recorder, row])
    return recorder

# modify existing function
def train_model(...):
    tl, vl, a = [], [], []
    for i in range(epochs):
        trn_loss, val_loss, acc = one_epoch(...)
        tl.append(trn_loss)
        vl.append(val_loss)
        a.append(acc)
    return tl, vl, a

# modify existing function
def one_epoch(...):
    trn_loss = calc_loss(...)
    val_loss = calc_loss(...)
    trn_loss.backward()
    with torch.no_grad(): update_coeffs(...)
    acc = calc_acc(...)
    return trn_loss, val_loss, acc

# use existing function to calculate predictions
def calc_preds(...): ...

# use existing function to calculate loss
def calc_loss(...): ...

# use existing function to step the weights
def update_coeffs(...): ...

# use existing function to calculate accuracy
def calc_acc(...): ...

# use existing function to initialize weights
def init_coeffs(...): ...
```
With the pseudocode sketched out, I’ll start building out each function next.
Building the Functions
Initialize Coefficients
```python
def init_coeffs(n_coeff):
    hiddens = [10, 10]  # <-- set this to the size of each hidden layer you want
    sizes = [n_coeff] + hiddens + [1]
    n = len(sizes)
    layers = [(torch.rand(sizes[i], sizes[i+1])-0.3)/sizes[i+1]*4 for i in range(n-1)]
    consts = [(torch.rand(1)[0]-0.5)*0.1 for i in range(n-1)]
    for l in layers+consts: l.requires_grad_()
    return layers,consts
```
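As a quick sanity check of my own (assuming the 12 Titanic features used later), the layer shapes should chain from the input size through each hidden layer to a single output:

```python
layers, consts = init_coeffs(12)
print([l.shape for l in layers])
# [torch.Size([12, 10]), torch.Size([10, 10]), torch.Size([10, 1])]
```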
Calculate Predictions
```python
def calc_preds(coeffs, indeps):
    layers,consts = coeffs
    n = len(layers)
    res = indeps
    for i,l in enumerate(layers):
        res = res@l + consts[i]
        if i!=n-1: res = F.relu(res)
    return torch.sigmoid(res)
```
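A small smoke test of my own on random data: each row of inputs should produce one sigmoid probability in (0, 1):

```python
dummy = torch.rand(5, 12)  # 5 hypothetical rows of 12 features
preds = calc_preds(init_coeffs(12), dummy)
print(preds.shape)  # torch.Size([5, 1])
```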
Calculate Loss
```python
def calc_loss(coeffs, indeps, deps): return torch.abs(calc_preds(coeffs, indeps)-deps).mean()
```
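This is simply the mean absolute error between predictions and targets. For reference, an equivalent formulation of my own using PyTorch's built-in L1 loss would be:

```python
def calc_loss_alt(coeffs, indeps, deps):
    # F.l1_loss computes the mean absolute error, matching calc_loss above
    return F.l1_loss(calc_preds(coeffs, indeps), deps)
```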
Update the Coefficients
```python
def update_coeffs(coeffs, lr):
    layers,consts = coeffs
    for layer in layers+consts:
        layer.sub_(layer.grad * lr)
        layer.grad.zero_()
```
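This is a manual SGD step: subtract the gradient scaled by the learning rate, then zero the gradient so it doesn't accumulate into the next epoch. A hypothetical equivalent (not what this post uses) would let torch.optim do both:

```python
layers, consts = init_coeffs(12)
opt = torch.optim.SGD(layers + consts, lr=4)
# after a loss.backward(), these two calls perform the subtract-and-zero above:
# opt.step(); opt.zero_grad()
```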
Calculate Accuracy
```python
def calc_acc(coeffs): return (val_dep.bool()==(calc_preds(coeffs, val_indep)>0.5)).float().mean()
```
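Predictions above 0.5 count as "survived"; a tiny worked example of my own, with hypothetical predictions and targets, to show the comparison:

```python
p = torch.tensor([[0.9], [0.2], [0.6]])  # hypothetical predictions
t = torch.tensor([[1.], [1.], [0.]])     # hypothetical targets
print((t.bool() == (p > 0.5)).float().mean())  # tensor(0.3333): 1 of 3 correct
```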
Train One Epoch
```python
def one_epoch(coeffs, lr):
    trn_loss = calc_loss(coeffs, trn_indep, trn_dep)
    trn_loss.backward()
    with torch.no_grad():
        val_loss = calc_loss(coeffs, val_indep, val_dep)
        update_coeffs(coeffs, lr)
        acc = calc_acc(coeffs)
    return trn_loss, val_loss, acc
```
Train a Model
```python
def train_model(epochs, lr, n_coeff, is_seed=True):
    if is_seed: torch.manual_seed(442)
    tl, vl, a = [], [], []
    coeffs = init_coeffs(n_coeff)
    for i in range(epochs):
        trn_loss, val_loss, acc = one_epoch(coeffs, lr)
        tl.append(trn_loss.item())
        vl.append(val_loss.item())
        a.append(acc.item())
    return tl, vl, a
```
Train Multiple Models
```python
def train_multiple_models(runs=100, epochs=30, lr=4, n_coeff=12, is_seed=False):
    # initialize recorder object
    recorder = pd.DataFrame(columns=["run", "epoch", "trn_loss", "val_loss", "acc"])
    for run in range(runs):
        # get lists of losses and accuracy
        tl, vl, a = train_model(epochs, lr, n_coeff, is_seed)
        # create lists of run and epoch values
        r = [run] * epochs
        e = [i for i in range(epochs)]
        # append new data to recorder DataFrame
        row = pd.DataFrame(data={"run": r, "epoch": e, "trn_loss": tl, "val_loss": vl, "acc": a})
        recorder = pd.concat([recorder, row])
    return recorder
```
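One design note: concatenating inside the loop re-copies the growing DataFrame each iteration. An alternative pattern of my own (a sketch, not what I ran for this post) collects the per-run frames in a list and concatenates once at the end:

```python
def train_multiple_models_fast(runs=100, epochs=30, lr=4, n_coeff=12, is_seed=False):
    rows = []
    for run in range(runs):
        tl, vl, a = train_model(epochs, lr, n_coeff, is_seed)
        rows.append(pd.DataFrame({"run": [run]*epochs, "epoch": range(epochs),
                                  "trn_loss": tl, "val_loss": vl, "acc": a}))
    # a single concat avoids quadratic copying of the growing DataFrame
    return pd.concat(rows, ignore_index=True)
```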
Plotting Training Results
In this section, I’ll import the data, clean it, create training/validation splits, test out my above functions for a single model training loop, run my experiment for 100 training runs, and plot the results.
Load the Data
```python
from pathlib import Path

# 'creds' should hold your Kaggle API credentials JSON string
cred_path = Path('~/.kaggle/kaggle.json').expanduser()
if not cred_path.exists():
    cred_path.parent.mkdir(exist_ok=True)
    cred_path.write_text(creds)
    cred_path.chmod(0o600)

import os
iskaggle = os.environ.get('KAGGLE_KERNEL_RUN_TYPE', '')
if iskaggle: path = Path("../input/titanic")
else:
    path = Path('titanic')
    if not path.exists():
        import zipfile, kaggle
        kaggle.api.competition_download_cli(str(path))
        zipfile.ZipFile(f'{path}.zip').extractall(path)
```
Downloading titanic.zip to /content
100%|██████████| 34.1k/34.1k [00:00<00:00, 1.97MB/s]
Clean the Data
```python
df = pd.read_csv(path/'train.csv')
df
```
|   | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 886 | 887 | 0 | 2 | Montvila, Rev. Juozas | male | 27.0 | 0 | 0 | 211536 | 13.0000 | NaN | S |
| 887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S |
| 888 | 889 | 0 | 3 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | 1 | 2 | W./C. 6607 | 23.4500 | NaN | S |
| 889 | 890 | 1 | 1 | Behr, Mr. Karl Howell | male | 26.0 | 0 | 0 | 111369 | 30.0000 | C148 | C |
| 890 | 891 | 0 | 3 | Dooley, Mr. Patrick | male | 32.0 | 0 | 0 | 370376 | 7.7500 | NaN | Q |
891 rows × 12 columns
```python
# replace NAs with the mode of the column
modes = df.mode().iloc[0]
df.fillna(modes, inplace=True)
df.isna().sum()
```
PassengerId 0
Survived 0
Pclass 0
Name 0
Sex 0
Age 0
SibSp 0
Parch 0
Ticket 0
Fare 0
Cabin 0
Embarked 0
dtype: int64
```python
# take log(Fare + 1) to make the distribution more reasonable
df['LogFare'] = np.log(df['Fare']+1)
```
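To make the effect concrete, a quick check of my own: fares spanning three orders of magnitude get compressed into a small, comparable range:

```python
print(np.log(np.array([0.0, 7.25, 512.33]) + 1))  # [0.     2.1102 6.2409]
```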
```python
# convert categoricals to dummy variables
df = pd.get_dummies(df, columns=["Sex","Pclass","Embarked"])
df.columns
```
Index(['PassengerId', 'Survived', 'Name', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'LogFare', 'Sex_female', 'Sex_male',
'Pclass_1', 'Pclass_2', 'Pclass_3', 'Embarked_C', 'Embarked_Q', 'Embarked_S'],
dtype='object')
```python
# list out the new dummy variables
added_cols = ['Sex_male', 'Sex_female', 'Pclass_1', 'Pclass_2', 'Pclass_3', 'Embarked_C', 'Embarked_Q', 'Embarked_S']
```
```python
from torch import tensor

# create tensor of dependent variable data
t_dep = tensor(df.Survived)

indep_cols = ['Age', 'SibSp', 'Parch', 'LogFare'] + added_cols

# create tensor of independent variable data
t_indep = tensor(df[indep_cols].values, dtype=torch.float)
t_indep[:2]
```
tensor([[22.0000, 1.0000, 0.0000, 2.1102, 1.0000, 0.0000, 0.0000, 0.0000, 1.0000, 0.0000, 0.0000, 1.0000],
[38.0000, 1.0000, 0.0000, 4.2806, 0.0000, 1.0000, 1.0000, 0.0000, 0.0000, 1.0000, 0.0000, 0.0000]])
```python
# normalize the independent variables by each column's max
vals,indices = t_indep.max(dim=0)
t_indep = t_indep / vals
t_indep[:2]
```
tensor([[0.2750, 0.1250, 0.0000, 0.3381, 1.0000, 0.0000, 0.0000, 0.0000, 1.0000, 0.0000, 0.0000, 1.0000],
[0.4750, 0.1250, 0.0000, 0.6859, 0.0000, 1.0000, 1.0000, 0.0000, 0.0000, 1.0000, 0.0000, 0.0000]])
```python
# create indexes for training/validation splits
trn_split,val_split = RandomSplitter(seed=42)(df)

# split data into training and validation sets
trn_indep,val_indep = t_indep[trn_split],t_indep[val_split]
trn_dep,val_dep = t_dep[trn_split],t_dep[val_split]
len(trn_indep),len(val_indep)
```
(713, 178)
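RandomSplitter shuffles the row indices and holds out 20% for validation by default; a rough equivalent without fastai (a sketch of my own) would be:

```python
rng = np.random.default_rng(42)    # note: not the same RNG stream fastai uses
idxs = rng.permutation(len(df))
cut = int(len(df) * 0.2)           # RandomSplitter's default valid_pct is 0.2
trn_idx, val_idx = idxs[cut:], idxs[:cut]
len(trn_idx), len(val_idx)         # (713, 178)
```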
```python
# turn dependent variables into column vectors
trn_dep = trn_dep[:,None]
val_dep = val_dep[:,None]
```
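The extra trailing axis matters: without it, subtracting a shape-(n,) target from shape-(n, 1) predictions in calc_loss would broadcast into an (n, n) matrix. A tiny check of my own:

```python
v = torch.tensor([1., 0., 1.])
print(v.shape, v[:, None].shape)              # torch.Size([3]) torch.Size([3, 1])
print((torch.rand(3, 1) - v).shape)           # torch.Size([3, 3]) -- silent bug
print((torch.rand(3, 1) - v[:, None]).shape)  # torch.Size([3, 1]) -- what we want
```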
Train a Single Model
First, I’ll train a single model to make sure that I’m getting a similar accuracy as Jeremy’s notebook example:
```python
res = train_model(epochs=30, lr=4, n_coeff=12)
```
```python
# accuracy is the third list in our results
# the final accuracy should be close to 0.8258
res[2][-1]
```
0.8258426785469055
Great! My model’s accuracy matches that of the example notebook. Next, I’ll plot the training loss, validation loss and accuracy of the model across 30 epochs:
```python
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

xs = [i for i in range(30)]

plt.plot(xs, res[0], c='green');
plt.plot(xs, res[1], c='red');
plt.plot(xs, res[2], c='blue');

plt.xlabel("Epochs");
plt.ylabel("Loss\nAccuracy");

green_patch = mpatches.Patch(color='green', label='Training Loss')
red_patch = mpatches.Patch(color='red', label='Validation Loss')
blue_patch = mpatches.Patch(color='blue', label='Accuracy')

plt.legend(handles=[green_patch, red_patch, blue_patch]);
```
Excellent! With that confirmed, I can run my trial of 100 trainings, and then plot the results:
Training Multiple Models
```python
recorder = train_multiple_models()
recorder.head()
```
|   | run | epoch | trn_loss | val_loss | acc |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0.552340 | 0.540915 | 0.595506 |
| 1 | 0 | 1 | 0.488773 | 0.491162 | 0.595506 |
| 2 | 0 | 2 | 0.474533 | 0.479952 | 0.595506 |
| 3 | 0 | 3 | 0.461460 | 0.469660 | 0.595506 |
| 4 | 0 | 4 | 0.450005 | 0.460642 | 0.595506 |
```python
recorder.tail()
```
|   | run | epoch | trn_loss | val_loss | acc |
|---|---|---|---|---|---|
| 25 | 99 | 25 | 0.390775 | 0.414015 | 0.595506 |
| 26 | 99 | 26 | 0.390258 | 0.413608 | 0.595506 |
| 27 | 99 | 27 | 0.389781 | 0.413232 | 0.595506 |
| 28 | 99 | 28 | 0.389341 | 0.412886 | 0.595506 |
| 29 | 99 | 29 | 0.388933 | 0.412565 | 0.595506 |
```python
recorder.max()
```
run 99
epoch 29
trn_loss 0.623253
val_loss 0.604715
acc 0.831461
dtype: object
Plot: Training Loss
```python
(recorder
 .pivot_table(values='trn_loss', index='epoch', columns='run')
 .plot(color='green', alpha=0.3, legend=False, title='Training Loss'));
```
Plot: Validation Loss
```python
(recorder
 .pivot_table(values='val_loss', index='epoch', columns='run')
 .plot(color='red', alpha=0.3, legend=False, title='Validation Loss'));
```
Plot: Accuracy
```python
(recorder
 .pivot_table(values='acc', index='epoch', columns='run')
 .plot(color='blue', alpha=0.3, legend=False, title='Accuracy'));
```
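Each pivot reshapes the long recorder into a wide epoch-by-run matrix, so plot() draws one semi-transparent line per run. A quick check of my own:

```python
wide = recorder.pivot_table(values='acc', index='epoch', columns='run')
print(wide.shape)  # (30, 100): 30 epochs down the rows, 100 runs across the columns
```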
Final Thoughts
This exercise was fascinating, both in terms of building the code to record losses and accuracy for each epoch, as well as observing the final results of 100 training runs.
The main observation that stands out: for all three values (training loss, validation loss, and accuracy) there were training runs where the values did not improve at all between the first and last epoch. There were numerous runs where the training and validation loss was stuck at around 0.4, and many where the accuracy was stuck at around 0.6. Only for a handful of training runs did the accuracy cross 0.8.
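That "handful" could be quantified directly from the recorder; a follow-up query of my own (not something I ran for this post):

```python
final = recorder[recorder.epoch == 29]  # final epoch of every run
print((final.acc > 0.8).sum(), "of", len(final), "runs ended above 0.8 accuracy")
```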
In a significant number of runs (as seen by the darkness of the line color on the plot) the training and validation loss gradually decreased during training.
After running this experiment I am pretty surprised. I knew that training neural networks involved some variability, but it's almost shocking to see how wildly different the results can be when training the same model. Just by happenstance, I can get a model that seemingly does not work (accuracy stuck throughout) or the same model achieving better accuracy than the baseline in Jeremy's notebook. All in all, I'm grateful that I did this exercise because it gave me some perspective on how volatile neural net training can be.