Fine-Tuning a Language Model as a Text Classifier

from fastai.text.all import *
path = untar_data(URLs.IMDB)
In this notebook, I’ll fine-tune a language model on the IMDb reviews dataset, grab its encoder, create a new classification model with it, and then fine-tune that model to classify IMDb reviews as positive or negative. The code (and prose) below is taken from Chapter 10 of the fastai textbook.
The data is stored in three folders: train (25k labeled reviews), test (25k labeled reviews), and unsup (50k unlabeled reviews). The language model is trained on all 100k reviews, and the classification model is trained on the train dataset, with accuracy calculated on the test set (which serves as the validation set).
path.ls()
(#7) [Path('/root/.fastai/data/imdb/tmp_clas'),Path('/root/.fastai/data/imdb/imdb.vocab'),Path('/root/.fastai/data/imdb/unsup'),Path('/root/.fastai/data/imdb/tmp_lm'),Path('/root/.fastai/data/imdb/README'),Path('/root/.fastai/data/imdb/train'),Path('/root/.fastai/data/imdb/test')]
Fine-Tuning the Pretrained Language Model
First, we fine-tune the pretrained language model (which was trained on all of Wikipedia) using 100k movie reviews. This fine-tuned model will learn to predict the next word of an IMDb movie review.
Note that fastai’s TextBlock sets up its numericalizer’s vocab automatically.
get_imdb = partial(get_text_files, folders=['train', 'test', 'unsup'])

dls_lm = DataBlock(
    blocks=TextBlock.from_folder(path, is_lm=True),
    get_items=get_imdb,
    splitter=RandomSplitter(0.1)
).dataloaders(path, path=path, bs=128, seq_len=80)
The dependent variable is the independent variable shifted over by one token:
dls_lm.show_batch(max_n=2)
| | text | text_ |
|---|---|---|
0 | xxbos xxmaj this movie is my favorite of all time . xxmaj the dialogue is spectacular , and is delivered with such rapid - fire speed that one viewing is not enough . xxmaj the film comedy was elevated to new heights with xxmaj howard xxmaj hawks outstanding direction . xxmaj based on the classic play " the xxmaj front xxmaj page " , xxmaj hawks gives it a delightful twist by | xxmaj this movie is my favorite of all time . xxmaj the dialogue is spectacular , and is delivered with such rapid - fire speed that one viewing is not enough . xxmaj the film comedy was elevated to new heights with xxmaj howard xxmaj hawks outstanding direction . xxmaj based on the classic play " the xxmaj front xxmaj page " , xxmaj hawks gives it a delightful twist by presenting |
1 | xxmaj woody xxmaj woodpecker , " duck xxmaj amuck " and especially " one xxmaj froggy xxmaj evening " show up how weak this movie is in comparison . xxmaj plus the movie fits in shambolic slapstick alongside strained sentiment ( the underlying theme of the story is family ; our hero is n't ready to have a son , and his nemesis - xxmaj alan xxmaj cumming as the xxmaj norse | woody xxmaj woodpecker , " duck xxmaj amuck " and especially " one xxmaj froggy xxmaj evening " show up how weak this movie is in comparison . xxmaj plus the movie fits in shambolic slapstick alongside strained sentiment ( the underlying theme of the story is family ; our hero is n't ready to have a son , and his nemesis - xxmaj alan xxmaj cumming as the xxmaj norse god |
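To make that shift concrete, here is a toy sketch (not fastai’s internal implementation) of how an input/target pair lines up:

# The target sequence is the input sequence shifted one token to the left,
# so at every position the model learns to predict the next token.
tokens = ['xxbos', 'xxmaj', 'this', 'movie', 'is', 'my', 'favorite']
x, y = tokens[:-1], tokens[1:]
for inp, target in zip(x, y):
    print(f'{inp} -> {target}')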
learn = language_model_learner(
    dls_lm,
    AWD_LSTM,
    drop_mult=0.3,
    metrics=[accuracy, Perplexity()]
).to_fp16()
I fine-tuned the model for one epoch and saved it to load and use later. language_model_learner automatically freezes the pretrained model, so only the randomly initialized embeddings representing the IMDb vocab are trained.
learn.fit_one_cycle(1, 2e-2)
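As a quick sanity check that the pretrained body really is frozen, here is a minimal sketch using standard PyTorch attributes:

# Count parameter tensors that are frozen vs. trainable; with the
# pretrained body frozen, only the new embeddings and head should train.
n_frozen = sum(1 for p in learn.model.parameters() if not p.requires_grad)
n_trainable = sum(1 for p in learn.model.parameters() if p.requires_grad)
print(f'frozen: {n_frozen}, trainable: {n_trainable}')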
Paperspace’s file browser is located at /notebooks, so I change learn.path to that location:
learn.path = Path('/notebooks')
I then save the learner so that it saves the trained embeddings.
learn.save('1epoch')
Path('/notebooks/models/1epoch.pth')
Later on, I load the saved model, unfreeze the layers of the pretrained language model and fine-tune it for 10 epochs on the IMDb reviews dataset at a smaller learning rate (as shown in the fastai text):
learn = learn.load('1epoch')
learn.unfreeze()
learn.fit_one_cycle(10, 2e-3)
epoch | train_loss | valid_loss | accuracy | perplexity | time |
---|---|---|---|---|---|
0 | 4.214371 | 4.114542 | 0.300169 | 61.224136 | 41:36 |
1 | 3.917021 | 3.850335 | 0.316820 | 47.008827 | 42:00 |
2 | 3.752428 | 3.724050 | 0.326502 | 41.431866 | 42:13 |
3 | 3.660530 | 3.660284 | 0.331666 | 38.872364 | 42:32 |
4 | 3.560096 | 3.620281 | 0.335297 | 37.348042 | 42:36 |
5 | 3.507077 | 3.592660 | 0.338347 | 36.330578 | 42:44 |
6 | 3.430038 | 3.575986 | 0.340261 | 35.729839 | 42:39 |
7 | 3.360812 | 3.566898 | 0.341806 | 35.406578 | 42:53 |
8 | 3.310551 | 3.567138 | 0.342046 | 35.415089 | 43:28 |
9 | 3.297931 | 3.570799 | 0.341944 | 35.544979 | 44:01 |
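As a sanity check, the perplexity column is just the exponential of the validation loss; for example, for the final epoch:

import math
math.exp(3.570799)  # ≈ 35.544979, matching the perplexity column above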
We save all of our model except the final layer that converts activations to probabilities of picking each token in our vocabulary. The model not including the final layer is called the encoder.
learn.save_encoder('imdb_finetuned')
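To see what save_encoder keeps, here is a quick sketch inspecting the model’s two top-level pieces (the indexing assumes fastai’s standard SequentialRNN layout):

# Index 0 holds the AWD-LSTM encoder (what gets saved); index 1 holds the
# final decoder layer that maps activations to per-token probabilities.
print(type(learn.model[0]).__name__)
print(type(learn.model[1]).__name__)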
Before we fine-tune the model to be a classifier, the textbook has us generate random reviews:
TEXT = 'I liked this movie because'
N_WORDS = 40
N_SENTENCES = 2
preds = [learn.predict(TEXT, N_WORDS, temperature=0.75) for _ in range(N_SENTENCES)]
print("\n".join(preds))
i liked this movie because it showed a lot of normal people in America about who we belong and what they say and do .
The acting was great , the story was fun and enjoyable and the movie was very well
i liked this movie because my family and i are great Canadians and also Canadians , especially the Canadians . This is not a Canadian and American movie , but instead of being a " mockumentary " about the
The reviews are certainly not polished, but it’s still fascinating to see how the model predicts the next word to build a somewhat coherent review.
Fine-tune the Text Classifier
For the final piece of this lesson, we move from language model to classifier, starting with creating the classifier DataLoaders.
We pass it the vocab of the language model to make sure we use the same correspondence of token to index, so that the embeddings learned in the fine-tuned language model can be applied to the classifier. The dependent variable in this classifier is the label of the parent folder: pos for positive and neg for negative. Finally, we don’t pass is_lm=True to the TextBlock since it’s False by default (which we want in this case because we have labeled data and don’t want to use the next token as the label).
(path/'train').ls()
(#4) [Path('/root/.fastai/data/imdb/train/pos'),Path('/root/.fastai/data/imdb/train/unsupBow.feat'),Path('/root/.fastai/data/imdb/train/neg'),Path('/root/.fastai/data/imdb/train/labeledBow.feat')]
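Before building the classifier DataBlock, here is a quick peek at the vocab we’ll be passing along (a minimal sketch; the first entries are fastai’s special tokens, followed by the most frequent corpus words):

# Inspect the size and first few entries of the language model's vocab,
# which the classifier will reuse so token indices line up.
print(len(dls_lm.vocab), dls_lm.vocab[:10])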
dls_clas = DataBlock(
    blocks=(TextBlock.from_folder(path, vocab=dls_lm.vocab), CategoryBlock),
    get_y=parent_label,
    get_items=partial(get_text_files, folders=['train', 'test']),
    splitter=GrandparentSplitter(valid_name='test')
).dataloaders(path, path=path, bs=128, seq_len=72)
The independent variable is the movie review and the dependent variable is the sentiment (positive, pos, or negative, neg):
dls_clas.show_batch(max_n=3)
| | text | category |
|---|---|---|
0 | xxbos xxmaj match 1 : xxmaj tag xxmaj team xxmaj table xxmaj match xxmaj bubba xxmaj ray and xxmaj spike xxmaj dudley vs xxmaj eddie xxmaj guerrero and xxmaj chris xxmaj benoit xxmaj bubba xxmaj ray and xxmaj spike xxmaj dudley started things off with a xxmaj tag xxmaj team xxmaj table xxmaj match against xxmaj eddie xxmaj guerrero and xxmaj chris xxmaj benoit . xxmaj according to the rules of the match , both opponents have to go through tables in order to get the win . xxmaj benoit and xxmaj guerrero heated up early on by taking turns hammering first xxmaj spike and then xxmaj bubba xxmaj ray . a xxmaj german xxunk by xxmaj benoit to xxmaj bubba took the wind out of the xxmaj dudley brother . xxmaj spike tried to help his brother , but the referee restrained him while xxmaj benoit and xxmaj guerrero | pos |
1 | xxbos xxmaj by now you 've probably heard a bit about the new xxmaj disney dub of xxmaj miyazaki 's classic film , xxmaj laputa : xxmaj castle xxmaj in xxmaj the xxmaj sky . xxmaj during late summer of 1998 , xxmaj disney released " kiki 's xxmaj delivery xxmaj service " on video which included a preview of the xxmaj laputa dub saying it was due out in " 1 xxrep 3 9 " . xxmaj it 's obviously way past that year now , but the dub has been finally completed . xxmaj and it 's not " laputa : xxmaj castle xxmaj in xxmaj the xxmaj sky " , just " castle xxmaj in xxmaj the xxmaj sky " for the dub , since xxmaj laputa is not such a nice word in xxmaj spanish ( even though they use the word xxmaj laputa many times | pos |
2 | xxbos xxmaj titanic directed by xxmaj james xxmaj cameron presents a fictional love story on the historical setting of the xxmaj titanic . xxmaj the plot is simple , xxunk , or not for those who love plots that twist and turn and keep you in suspense . xxmaj the end of the movie can be figured out within minutes of the start of the film , but the love story is an interesting one , however . xxmaj kate xxmaj winslett is wonderful as xxmaj rose , an aristocratic young lady betrothed by xxmaj cal ( billy xxmaj zane ) . xxmaj early on the voyage xxmaj rose meets xxmaj jack ( leonardo dicaprio ) , a lower class artist on his way to xxmaj america after winning his ticket aboard xxmaj titanic in a poker game . xxmaj if he wants something , he goes and gets it | pos |
Each batch has to contain tensors of the same size, so when using a TextBlock with is_lm=False, fastai does the following:

- Batches together texts that are roughly the same length (by sorting the documents by length prior to each epoch).
- Expands the shorter texts to the size of the largest document in the batch by padding them with a special padding token that the model ignores (a toy sketch of this step follows the list).
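Here is that toy sketch of the padding step (fastai handles this internally; the padding index of 1 is an assumption for illustration):

import torch

# Pad shorter sequences in a batch up to the longest one with a special
# padding index so the batch can be stacked into a single tensor.
pad_idx = 1
seqs = [torch.tensor([5, 8, 2]), torch.tensor([7, 3, 9, 4, 6])]
max_len = max(len(s) for s in seqs)
batch = torch.stack([
    torch.cat([s, torch.full((max_len - len(s),), pad_idx)]) for s in seqs
])
print(batch)  # tensor([[5, 8, 2, 1, 1], [7, 3, 9, 4, 6]])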
Let’s create the model to classify texts:
learn = text_classifier_learner(
    dls_clas,
    AWD_LSTM,
    drop_mult=0.5,
    metrics=accuracy
).to_fp16()
Load the encoder from our fine-tuned language model:
learn.path = Path('/notebooks')
learn = learn.load_encoder('imdb_finetuned')
The last step is to train with discriminative learning rates and gradual unfreezing. For NLP classifiers, the textbook recommends unfreezing a few layers at a time to achieve the best performance:
learn.fit_one_cycle(1, 2e-2)
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 0.245777 | 0.174727 | 0.934000 | 01:48 |
We get an accuracy similar to the textbook’s value (0.929320).
Next, train the model with all layers except the last two parameter groups frozen:
learn.freeze_to(-2)
learn.fit_one_cycle(1, slice(1e-2/(2.6**4), 1e-2))
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 0.226701 | 0.161235 | 0.938800 | 01:59 |
The accuracy improved a bit!
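The slice(1e-2/(2.6**4), 1e-2) argument is what applies the discriminative learning rates: fastai spreads the rates multiplicatively across the parameter groups, from the lower bound for the earliest layers to the upper bound for the final ones. A sketch of that spread (the group count of 5 is an assumption for illustration):

# Multiplicative spread of learning rates across parameter groups,
# mirroring what slice(lr / 2.6**4, lr) expresses.
lr = 1e-2
n_groups = 5
lrs = [lr / 2.6 ** (n_groups - 1 - i) for i in range(n_groups)]
print([f'{x:.2e}' for x in lrs])  # lowest rate for the earliest group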
Unfreeze the third parameter group and keep training:
learn.freeze_to(-3)
learn.fit_one_cycle(1, slice(5e-3/(2.6**4), 5e-3))
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 0.188972 | 0.147045 | 0.946440 | 02:43 |
The accuracy continues to improve.
Finally, train the whole model:
learn.unfreeze()
learn.fit_one_cycle(2, slice(1e-3/(2.6**4), 1e-3))
epoch | train_loss | valid_loss | accuracy | time |
---|---|---|---|---|
0 | 0.163849 | 0.143639 | 0.947600 | 03:18 |
1 | 0.149648 | 0.144494 | 0.947840 | 03:19 |
We’ll test the model with a few low-hanging-fruit inputs:
"I really like this movie!") learn.predict(
('pos', tensor(1), tensor([0.0034, 0.9966]))
"I really did not like this movie!") learn.predict(
('neg', tensor(0), tensor([0.9985, 0.0015]))
"I'm not sure if I loved or hated this movie") learn.predict(
('neg', tensor(0), tensor([0.6997, 0.3003]))
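For reference, learn.predict returns a tuple of (decoded label, label index, probabilities), so the individual pieces can be unpacked directly (a minimal sketch):

# Unpack a prediction: the class name, its index, and the per-class
# probabilities (index 0 is neg, index 1 is pos, per the outputs above).
label, idx, probs = learn.predict("I'm not sure if I loved or hated this movie")
print(f'{label}: p(neg)={probs[0].item():.3f}, p(pos)={probs[1].item():.3f}')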
To recap, here are the three steps that were involved in creating the IMDb movie review classifier:
- A language model was pretrained on all of Wikipedia.
- We then fine-tuned that model on 100k IMDb movie reviews (documents).
- Using the encoder from the fine-tuned language model, we created a classification model and fine-tuned it for a few epochs, gradually unfreezing layers for consecutive epochs. This model accurately classifies movie reviews as positive or negative.
That’s a wrap for this exercise. I hope you enjoyed this blog post!