Using TinyInstruct-33M for financial_phrasebank Sentiment Classification

python
LLM
TinySentiment
In this blog post I find that TinyInstruct-33M does not follow instructions that deviate from its training data.
Author: Vishal Bakshi

Published: August 5, 2024

Setup

Show pip installs
!pip install transformers~=4.37.2 -qq
!pip install huggingface_hub~=0.20.3 -qq
!pip install datasets~=2.16.1 -qq
!pip install accelerate -qq
Show imports
import torch
from datasets import load_dataset
from transformers import pipeline, logging

logging.set_verbosity_error()

torch.cuda.set_device(0)

model_name = "roneneldan/TinyStories-Instruct-33M"

# create pipeline
pipe = pipeline(
    "text-generation",
    model=model_name,
    device_map="auto",
    trust_remote_code=True
)

# load dataset
dataset = load_dataset(
    "financial_phrasebank", "sentences_allagree", 
    split="train"  # note that the dataset does not have a default test split
)

Background

In this notebook I’ll see how accurately the TinyInstruct-33M model can classify sentiment in the financial_phrasebank dataset out-of-the-box, without any fine-tuning. I expect that the model will not perform well for two reasons:

  1. it’s trained specifically to generate stories given a set of prompts.
  2. its vocabulary is at a much lower grade level than the financial_phrasebank dataset (see the quick dataset peek below).
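
To make that second point concrete, here is a quick peek at the loaded split. This is a minimal sketch of my own; the label names are read from the dataset's own ClassLabel feature rather than hard-coded.

# quick look at the data the model will face; the "sentence" and "label"
# columns come from the financial_phrasebank dataset itself
print(dataset)                          # number of rows and column names
print(dataset.features["label"].names)  # class names, in label-id order
print(dataset[0]["sentence"])           # an example sentence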

Initial Warmup Prompts

I’ll start by prompting TinyInstruct-33M with a prompt format it’s trained on:

prompt = """
Summary: Lily and Timmy build a sandcastle together and learn to compromise, but it gets knocked over by a gust of wind.
They find beauty in the broken sandcastle and play happily with a butterfly.
Features: Dialogue, Foreshadowing, Twist
Sentence: One day, she went to the park and saw a beautiful butterfly.
Words: disagree, network, beautiful
Story: """
Show pipeline
output = pipe(
    prompt,
    max_new_tokens=200,
    do_sample=True, 
    temperature=0.3,
    return_full_text=False)

print(output[0]["generated_text"])


Once upon a time, there were two best friends, Lily and Timmy. Every day they would play together and have lots of fun. 

One day, they decided to build a sandcastle. Lily wanted to make it really tall, but Timmy wanted to make it straight. They disagreed and argued for a long time. 

Finally, Lily said, "Let's make it really tall, Timmy!" 

Timmy agreed and they started to build. They worked together and soon they had built the most beautiful sandcastle ever. 

But then something unexpected happened. A big gust of wind blew through the park and knocked over their sandcastle! 

Lily and Timmy were sad, but they decided to find something else to do. They found a beautiful butterfly and decided to make it even better. 

They were so happy and they spent the rest of the day playing with the butterfly. 

The end

Great! The model responds appropriately to the type of prompt it’s trained on. Next, I’ll try a simple prompt that contains language the model should be able to understand (given that it understands words like 'beautiful', 'Foreshadowing', and 'network').

prompt = "What color is an apple?"
Show pipeline
output = pipe(
    prompt,
    max_new_tokens=200,
    do_sample=True, 
    temperature=0.3,
    return_full_text=False)

print(output[0]["generated_text"])
”

The little girl smiled and said, “It’s a red apple.”

The man smiled and said, “That’s right! Apples are healthy and delicious.”

The little girl smiled and said, “I like apples!”

The man and the little girl both laughed and enjoyed the apple together. They had a wonderful time in the park.

I suppose it indirectly answers the question by including "red apple" in the context of the story. Does it respond to an instruction?

prompt = "What color is a banana? Respond with one word"
Show pipeline
output = pipe(
    prompt,
    max_new_tokens=2,
    do_sample=True, 
    temperature=0.3,
    return_full_text=False)

print(output[0]["generated_text"])
, like
prompt = "What color is an orange? Respond with one word"
Show pipeline
output = pipe(
    prompt,
    max_new_tokens=2,
    do_sample=True, 
    temperature=0.9,
    return_full_text=False)

print(output[0]["generated_text"])
?"
prompt = "What color is a crow? Respond with one word"
Show pipeline
output = pipe(
    prompt,
    max_new_tokens=2,
    do_sample=True, 
    temperature=0.6,
    return_full_text=False)

print(output[0]["generated_text"])
: C

Nope! Trying different simple prompts (with different temperature levels) yields unsatisfactory results. The model is not following the given instruction.
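
For convenience, the repeated pipeline calls above could be wrapped in a small helper that sweeps one prompt across several temperatures. This is a sketch of my own (the try_prompt name and its defaults are arbitrary), using the same pipeline arguments as the cells above.

# sweep one prompt across several temperatures and print whatever the model emits
def try_prompt(prompt, temperatures=(0.3, 0.6, 0.9), max_new_tokens=2):
    for temp in temperatures:
        output = pipe(
            prompt,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=temp,
            return_full_text=False)
        print(f"temperature={temp}: {output[0]['generated_text']!r}")

try_prompt("What color is a crow? Respond with one word")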

Prompting TinyInstruct-33M with financial_phrasebank Data

Given that TinyInstruct-33M can’t follow simple instructions that differ from its training data, I expect it won’t perform sentiment classification on the financial_phrasebank dataset either.

I’ll start by giving it my best-performing phi-2 prompt:

prompt = """Your task is to analyze the sentiment (from an investor's perspective) of the text below.

Respond with only one of these words: negative, positive, or neutral. If the amount of money is not explicitly increasing or decreasing, respond with neutral.

Examples:

Instruct: According to Gran , the company has no plans to move all production to Russia , although that is where the company is growing .
Respond with only one of these words: negative, positive, or neutral. If the amount of money is not explicitly increasing or decreasing, respond with neutral.
Output: neutral

Instruct: For the last quarter of 2010 , Componenta 's net sales doubled to EUR131m from EUR76m for the same period a year earlier , while it moved to a zero pre-tax profit from a pre-tax loss of EUR7m .
Respond with only one of these words: negative, positive, or neutral. If the amount of money is not explicitly increasing or decreasing, respond with neutral.
Output: positive

Instruct: Jan. 6 -- Ford is struggling in the face of slowing truck and SUV sales and a surfeit of up-to-date , gotta-have cars .
Respond with only one of these words: negative, positive, or neutral. If the amount of money is not explicitly increasing or decreasing, respond with neutral.
Output: negative

Instruct: At the request of Finnish media company Alma Media 's newspapers , research manager Jari Kaivo-oja at the Finland Futures Research Centre at the Turku School of Economics has drawn up a future scenario for Finland 's national economy by using a model developed by the University of Denver .
Respond with only one of these words: negative, positive, or neutral. If the amount of money is not explicitly increasing or decreasing, respond with neutral.
Output: neutral

Instruct: STOCK EXCHANGE ANNOUNCEMENT 20 July 2006 1 ( 1 ) BASWARE SHARE SUBSCRIPTIONS WITH WARRANTS AND INCREASE IN SHARE CAPITAL A total of 119 850 shares have been subscribed with BasWare Warrant Program .
Respond with only one of these words: negative, positive, or neutral. If the amount of money is not explicitly increasing or decreasing, respond with neutral.
Output: neutral

Instruct: A maximum of 666,104 new shares can further be subscribed for by exercising B options under the 2004 stock option plan .
Respond with only one of these words: negative, positive, or neutral. If the amount of money is not explicitly increasing or decreasing, respond with neutral.
Output: neutral

Instruct: In the third quarter of 2010 , net sales increased by 5.2 % to EUR 205.5 mn , and operating profit by 34.9 % to EUR 23.5 mn .
Respond with only one of these words: negative, positive, or neutral. If the amount of money is not explicitly increasing or decreasing, respond with neutral.
Output:"""
Show pipeline
output = pipe(
    prompt,
    max_new_tokens=5,
    do_sample=True, 
    temperature=0.3,
    return_full_text=False)

print(output[0]["generated_text"])
. It is black and

Nope! That doesn’t seem to work. I’ll give it a simpler prompt:

prompt = """Your task is to analyze the sentiment (from an investor's perspective) of the text below.

Respond with only one of these words: negative, positive, or neutral. If the amount of money is not explicitly increasing or decreasing, respond with neutral.

According to Gran , the company has no plans to move all production to Russia , although that is where the company is growing .
Respond with only one of these words: negative, positive, or neutral. If the amount of money is not explicitly increasing or decreasing, respond with neutral.
Output:"""
Show pipeline
output = pipe(
    prompt,
    max_new_tokens=5,
    do_sample=True, 
    temperature=0.3,
    return_full_text=False)

print(output[0]["generated_text"])

Summary: Two countries

TinyInstruct-33M does not seem aligned to this type of instruction following. As a final party trick I’ll see if setting up the financial_phrasebank data in the model’s training format nudges it in the right direction.
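
Rather than typing the template out each time, prompts in the model's training format could also be generated from the dataset with a small helper. This is a hedged sketch of my own (the to_tinystories_prompt name and the choice of what to put in each field are mine, not something the model card prescribes); below I just write the prompts out by hand.

# wrap a financial_phrasebank sentence in the Summary/Features/Sentence/Words/Story
# fields the model was trained on; the field contents are my own experiment
def to_tinystories_prompt(sentence, hint="respond with one word (negative, positive, neutral)"):
    return (
        f"Summary: {sentence}\n"
        f"Features: {hint}\n"
        f"Sentence: {hint}\n"
        f"Words: {hint}\n"
        "Story: ")

print(to_tinystories_prompt(dataset[1]["sentence"]))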

prompt = """Summary: For the last quarter of 2010 , Componenta 's net sales doubled to EUR131m from EUR76m for the same period a year earlier,
while it moved to a zero pre-tax profit from a pre-tax loss of EUR7m .
Features: positive
Sentence: positive
Words: positive
Story: """
Show pipeline
output = pipe(
    prompt,
    max_new_tokens=100,
    do_sample=True, 
    temperature=0.2,
    return_full_text=False)

print(output[0]["generated_text"])


Once upon a time there was a little girl called Athena. She was three years old and loved to play with her friends. One day, Athena's friends asked her to come to the park with them. When she arrived, Athena noticed that everyone was wearing the same type of clothing as her. She was confused and asked her friends what they were doing.

Athenaces were announced that they were called distinguishedIN Neck�a, and she was the official mascot of the local Park
prompt = """Summary: For the last quarter of 2010 , Componenta 's net sales doubled to EUR131m from EUR76m for the same period a year earlier,
while it moved to a zero pre-tax profit from a pre-tax loss of EUR7m .
Features: respond with one word (negative, positive, neutral)
Sentence: respond with one word (negative, positive, neutral)
Words: respond with one word (negative, positive, neutral)
Story: """
Show pipeline
output = pipe(
    prompt,
    max_new_tokens=100,
    do_sample=True, 
    temperature=0.1,
    return_full_text=False)

print(output[0]["generated_text"])


Once upon a time there was a quarter yearrissa. She was a quarterstruck shopper and loved to move around. Every year, she would go to a different town and meet new people.

One year, she was asked to come to a special place. It was called "Adopt A Receive". She was excited to go and meet new people.

When she arrived at the place, she saw a big sign that said "Adwin Opener".

Interesting that at a low temperature (0.1) TinyInstruct has snuck in the word “quarter” from the financial_phrasebank sentence. However, it still does not classify the sentiment of the sentence.
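
Had the model ever emitted a label, scoring it over the dataset would look roughly like this. This is a sketch I did not run (the generations above never contain a usable label), and extract_label is a hypothetical helper of mine.

# hypothetical scoring loop: look for one of the three label words in the
# generation and compare it against the dataset's gold label
label_names = dataset.features["label"].names  # ClassLabel names, in label-id order

def extract_label(text):
    text = text.lower()
    for name in label_names:
        if name in text:
            return name
    return None

prompt_template = (
    "Your task is to analyze the sentiment (from an investor's perspective) of the text below.\n\n"
    "Respond with only one of these words: negative, positive, or neutral.\n\n"
    "{sentence}\nOutput:")

n, correct = 100, 0  # small sample to keep generation cheap
for row in dataset.select(range(n)):
    out = pipe(prompt_template.format(sentence=row["sentence"]),
               max_new_tokens=5, do_sample=False, return_full_text=False)
    pred = extract_label(out[0]["generated_text"])
    correct += int(pred == label_names[row["label"]])

print(f"accuracy: {correct / n:.2f}")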

Final Thoughts

This was probably the least exciting LLM exercise I’ve ever done, but I felt it was necessary to at least give TinyInstruct-33M a fair shot at classifying financial_phrasebank sentiment without fine-tuning it.

In a separate notebook, I’ll fine-tune TinyInstruct-33M on a portion of the financial_phrasebank dataset and see how it performs on a held-out test set.
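
Since the dataset ships with only a train split (as noted in the setup cell), that follow-up will need a manual split. A minimal sketch, assuming the usual train_test_split method on the 🤗 Dataset; the 0.2 fraction and seed are arbitrary choices of mine.

# carve a held-out test set out of the single "train" split
splits = dataset.train_test_split(test_size=0.2, seed=42)
train_ds, test_ds = splits["train"], splits["test"]
print(len(train_ds), len(test_ds))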

I hope you enjoyed this (short) blog post! Follow me on Twitter @vishal_learner.