vishal bakshi

welcome to my blog.

Standout Ideas from Lesson 2 of the AI Evals Course

AI Evals

In this blog post, I’m going highlight ideas that stood out to me from the second lesson (Error Analysis) of the AI evals course by Hamel Husain and Shreya Shankar.

Monday, July 28, 2025

Vishal Bakshi

Analyzing PyTorch 2.0 Release Notes for ColBERT Dependency Impact

ColBERT

In this blog post, I walk through the PyTorch 2.0 Release Notes items where I’m estimating there will be some kind of impact to ColBERT, which is currently dependent on torch==1.13.1.

Sunday, July 27, 2025

Vishal Bakshi

TIL: Custom Composer Callback to Push Checkpoints to HuggingFace Hub During Training.

LLM

Custom Composer Callback

In this short TIL blog post, I’m going to share the code I wrote with Claude’s help for a custom Composer callback which pushes the model to Hugging Face Hub every specified number of steps. The purpose of doing so is so that you can run evaluation after training so it doesn’t slow down training.

Saturday, July 26, 2025

Vishal Bakshi

Takeaways from Lesson 1 of the AI Evals Course

AI Evals

In this blog post, I’m going highlight ideas that stood out to me from the first lesson and first three chapters of the course reader from the AI evals course by Hamel Husain and Shreya Shankar.

Tuesday, July 22, 2025

Vishal Bakshi

Revisiting ColBERTv1 : A Return to First Principles

ColBERT

information retrieval

A comprehensive technical deep dive into the ColBERTv1 paper, exploring the late interaction architecture that enables BERT-level retrieval effectiveness with 100x faster query processing through independent encoding, offline indexing, and the MaxSim operation. Includes detailed code walkthroughs, query augmentation analysis, and architectural comparisons that explain how ColBERT bridges the gap between retrieval quality and computational efficiency.

Wednesday, July 16, 2025

Vishal Bakshi

Debugging Flash Attention in LLM-Foundry (and a 20% Slow Down!)

python

deep learning

LLM-Foundry

flash_attn_varlen_func in LLM Foundry resulted in a surprising 20% training slowdown. This post details the debugging process that uncovered the cause : significant overhead from the HuggingFace implementation repeatedly un-padding and re-padding the batch at every layer.

Monday, June 30, 2025

Vishal Bakshi

Introducing portfolio-llm: A Professional Portfolio You Can Chat With

Career

Introducing portfolio-llm, a professional portfolio you can have a conversation with. This post details my journey of building an interactive, LLM-queried portfolio using the llms.txt standard, including the rigorous evaluation framework I created to ensure the system’s reliability.

Thursday, June 26, 2025

Vishal Bakshi

Takeaways from Gemini Deep Research Report on Small Batch Training Challenges

python

fastai

imagenette

TinyScaleLab

deep learning

Motivated by a twitter interaction, I had Gemini generate a report on the challenges and solutions to small batch training. In this blog post I highlight key takeaways from that report, supplemented by my own deep dives on CrossEntropyLoss and Group Normalization, to arrive at next steps for my Imagenette experiments.

Wednesday, June 18, 2025

Vishal Bakshi

An Analysis of Batch Size vs. Learning Rate on Imagenette

python

fastai

deep learning

TinyScaleLab

imagenette

I continue my Imagenette experiments to gain some intuition on how batch size, learning rate and accuracy interact. I find that for all three models too low or too high of a batch size plummets performance, while batch sizes in the 8-32 range maximize accuracy for a range of learning rates.

Wednesday, June 18, 2025

Vishal Bakshi

Cross Entropy Loss Explained

python

deep learning

fastai

A deep dive into Cross-Entropy Loss, revisiting Chapter 5 of the fast.ai textbook to build intuition from first principles. This post explores the mathematical and conceptual link between the sigmoid and softmax functions, explains the meaning of logits, and provides a hands-on PyTorch demonstration of how the loss function’s gradient simplifies to the elegant prediction - target form.

Wednesday, June 11, 2025

Vishal Bakshi

Proof, Pricing, and Passion: Finding My Path in Machine Learning

Career

What’s the minimum experience for AI consulting? How do you price your value? I explore my takeaways from a fireside chat with industry experts and reflect on my own journey from hobbyist to aspiring pro.

Monday, June 9, 2025

Vishal Bakshi

HuggingFace’s Default KV Cache and the `flash_attn_varlen_func` Docstring

python

deep learning

Flash Attention

A deep dive into understanding flash_attn_varlen_func’s docstring’s causal masks (for seqlen_q != seqlen_k) by exploring Hugging Face’s KV Cache (DynamicCache) in model.generate() with hands-on Q/K shape inspection. Unravels “bottom-right alignment” and why flash_attn_func gets called.

Tuesday, June 3, 2025

Vishal Bakshi

Initial Experiments with Imagenette

python

fastai

deep learning

TinyScaleLab

imagenette

This post walks through initial 5 epoch experiments on the small Imagenette dataset as I try to get onto the leaderboard.

Sunday, June 1, 2025

Vishal Bakshi

Understanding the Mean Shift Clustering Algorithm (and PyTorch Broadcasting)

python

machine learning

This post walks through a Python implementation of the Mean Shift Clustering algorithm, detailing the journey from a basic concept to a GPU-optimized version, with a key focus on understanding and visualizing PyTorch broadcasting for batch processing. We explore how manipulating tensor shapes is crucial for performance and uncover insights into how 3D tensors can be intuitively understood as grouped 2D tensors.

Saturday, May 31, 2025

Vishal Bakshi

Exploring Precision in ColBERT Indexing and Retrieval

python

deep learning

ColBERT

information retrieval

In this blog post I show how using full precision (fp32) versus mixed precision (amp) results in different ColBERT index artifacts (centroids.pt and ivf.pid.pt), Recall@10 and retrieved passages.

Sunday, May 25, 2025

Vishal Bakshi

The Evolution of Matrix Multiplication (fastai course Part 2 Lessons 11 and 12)

python

deep learning

fastai

In this blog post I walk through 10 different implementations of matrix multiplication in python, Numba and PyTorch, comparing execution times on matrix multiplications between a 5-digit subset of MNIST and a single weight matrix, as well as matrix multiplications between the full 50k-image MNIST dataset and the weight matrix. My two main takeaways: 1) when in doubt, use PyTorch’s .cuda with the @ operator, and 2) different matrix multiplication algorithms scale differently!

Wednesday, May 21, 2025

Vishal Bakshi

DataInspector with `BinPackCollator`: Inspecting Packed Dataloader Items

LLM

deep learning

LLM-Foundry

Custom Composer Callback

I learn about LLM-Foundry’s BinPackCollator and write a custom Composer callback to inspect the data and confirm that multiple sequences are packed in each batch item, leading to using 95% loss-generating tokens instead of 5%!

Tuesday, May 13, 2025

Vishal Bakshi

Comparing RAGatouille and ColBERT Indexes and Search Results

python

deep learning

information retrieval

RAGatouille

ColBERT

In this blog post I answer two questions: 1. For a given document collection and indexing configuration, do RAGatouille and ColBERT produce the same index? and 2. For a given index and search configuration, do RAGatouille and ColBERT retrieve the same passages/Recall@10?

Saturday, May 10, 2025

Vishal Bakshi

TIL: Resolving RAGatouille OOM Error and `faiss-gpu` Warning

information retrieval

deep learning

RAGatouille

A couple of fixes as I work on indexing large document collections (6M+) using RAGatouille.

Thursday, May 8, 2025

Vishal Bakshi

DataInspector: Inspecting `input_ids` Token Statistics in LLM-Foundry with `packing_ratio=5.0`

LLM

deep learning

Specifying a train_loader.dataset.packing_ratio value greater than 1.0 - lightbox

Thursday, May 8, 2025

Vishal Bakshi

Understanding Sequence Packing - Initial Musings

python

deep learning

LLM

A hands-on investigation into how sequence packing interacts with Flash Attention in HuggingFace Transformers. Through print statements and code exploration, I discovered that position_ids are crucial for sequence packing to work correctly—without them, the wrong Flash Attention function gets called, leading to incorrect outputs and loss values. This post walks through the debugging process, comparing packed sequences with padded batches, and reveals the critical requirement for properly constructed position_ids in sequence-packed training.

Sunday, May 4, 2025

Vishal Bakshi

Initial Manual Scoring Results for TinyStories Models

LLM

deep learning

TinyScaleLab

A detailed breakdown of my manual evaluation of three TinyStories language models (1M, 8M, and 28M parameters) across six capability categories. I share scoring methodology, surprising findings about emergent reasoning in small models, and comparisons to the original TinyStories paper. This analysis establishes baselines both for my own model training project and my eventual LLM Judge, and reveals how different capabilities scale with model size.

Thursday, May 1, 2025

Vishal Bakshi

Curating Evaluation Prompts, Defining Scoring Criteria, and Designing an LLM Judge Prompt Template

LLM

deep learning

TinyScaleLab

Designing simple, robust, effective evals is one of the most enjoyable experiences in ML. It’s truly empowering to take something squishy like language and build structure around it to consistently track performance. In this post, I’ll walk through my process for the TinyScaleLab project, covering how I approached curating evaluation prompts, defining scoring criteria, and designing a template for an LLM Judge.

Monday, April 28, 2025

Vishal Bakshi

TinyScaleLab Update: Training Cost Analysis and Evaluation Infrastructure Plans

LLM

deep learning

TinyScaleLab

I share my progress on the TinyScaleLab project where I’m studying small language models trained on the Tiny Stories dataset. I discuss the architecture of my four model sizes (5M, 25M, 60M, and 125M parameters), compare training costs between L4 and A100 GPUs, and outline my plans for developing an evaluation framework using LLM judges. This project aims to investigate both the training dynamics of tiny models and their language capabilities across grammar, context tracking, factual knowledge, reasoning, creativity, and storytelling.

Sunday, April 27, 2025

Vishal Bakshi

TinyScale Lab Update: Setting Eval Targets and Generating Completions for LLM Judge Development

LLM

deep learning

TinyScaleLab

In this TinyScale Lab update, I tackle how to evaluate language models before building them. Using the TinyStories paper, I set target scores for my upcoming 5M-125M parameter models and generate reference completions using existing TinyStories models (1M, 8M, 28M).

Sunday, April 27, 2025

Vishal Bakshi

TinyScaleLab: Bridging Training Dynamics and Model Capabilities

LLM

deep learning

TinyScaleLab

In TinyScale Lab, I’m exploring the connection between training dynamics and model capabilities using tiny language models (3M-120M parameters) as research proxies. This project bridges insights from the TinyStories and Small-scale proxies papers to understand how training stability affects emergent capabilities like grammar, consistency, and reasoning. By demonstrating that meaningful ML research is possible with modest computational resources, I hope to make AI research more accessible and democratized for resource-constrained researchers worldwide.

Saturday, April 26, 2025

Vishal Bakshi

LossInspector: A Deep Dive Into LLM-Foundry’s Next-Token Prediction with a Custom Composer Callback

python

deep learning

LLM

I’m working on a research project where we’re fine-tuning small models with various techniques and datasets using LLM-Foundry. As part of our infrastructure setup, I wanted to thoroughly understand how a batch of data is prepared, and how the outputs of a model, along with the labels, are passed to the loss function. Enter the custom Composer callback LossInspector!

Tuesday, April 22, 2025

Vishal Bakshi

Optimizing Matrix Multiplication Using Numba and Broadcasting

python

deep learning

LLM

Following the fastai course part 2 Lesson 11 video, I optimize the naive Python nested for-loop matrix multiplication using PyTorch, NumPy and Numba to achieve a 12000x speedup!

Monday, April 21, 2025

Vishal Bakshi

Logging Data Types for Activations, Gradients, Weights, Optimizer States and Loss during Training with LLM-Foundry

python

deep learning

LLM

I write a custom Composer callback (with lots of Claude’s help!) to log data types of different entities during mixed precision LoRA fine-tuning. When the model is in fp32, all entities except activations are in fp32 (activations are in bf16).

Wednesday, April 2, 2025

Vishal Bakshi

Understanding Python Descriptors

python

deep learning

LLM

While vibe coding with Claude, it introduced me to unfamiliar behavior: using the __get__(obj) method to convert a function to a bound method of the given object. This was necessary to monkey patch the self attention module’s forward pass to log input data types as register_forward_hook only works on positional arguments (which LLaMA’s self attention module doesn’t have, it only has keyword arguments). This led me to do a deep dive into understanding descriptors with the helpful Descriptor Guide in the Python docs, which I walkthrough in this blog post.

Tuesday, April 1, 2025

Vishal Bakshi

TIL: Creating a Custom Composer Callback

LLM

deep learning

A walkthrough of my first custom Composer callback where I log weight, activation, gradient and loss data types during the training loop.

Sunday, March 30, 2025

Vishal Bakshi

RAGatouille/ColBERT Indexing Deep Dive

python

information retrieval

machine learning

deep learning

RAGatouille

ColBERT

In this technical deep dive, I recreate the entire RAG indexing pipeline by directly using the internal methods of RAGatouille and ColBERT libraries. I methodically build all indexing artifacts from scratch - from processing document collections and sampling embeddings to calculating centroids with k-means clustering and compressing document vectors with residual encoding. If you’re curious about what actually happens when you call RAG.index(), this video breaks down the full process step by step.

Wednesday, March 12, 2025

Vishal Bakshi

TIL: PeftModel Base Model Behavior

python

deep learning

machine learning

LLM

TIL that the base model gets altered (merged) after you call PeftModel.pretrained to load LoRA adapter weights. I compare weight matrices and analyze memory usage.

Monday, March 10, 2025

Vishal Bakshi

Memory Profiling raw ColBERT and RAGatouille

python

information retrieval

deep learning

RAGatouille

ColBERT

I use the memory-profiler library to log memory using for different indexing functions for raw ColBERT and RAGatouille indexing operations for 100k, 250k, 500k, 1M and 2M collection sizes. In general, RAGatouille uses more memory than raw ColBERT.

Monday, February 17, 2025

Vishal Bakshi

Estimating Storage and CPU RAM Requirements for Indexing 12.6M Documents

python

information retrieval

deep learning

RAGatouille

ColBERT

I index 100k, 250k, 500k, 1M and 2M documents using T4 and RTX6000Ada instances and estimate the storage and CPU RAM requirements for a 12.6M document collection.

Wednesday, February 12, 2025

Vishal Bakshi

Evaluating the DAPR ConditionalQA Dataset with RAGatouille

python

information retrieval

deep learning

RAGatouille

ColBERT

I calculate the Recall@10 metric for answerai-colbert-small-v1 retrieval (via RAGatouille) on the ConditionalQA dataset (via UKPLab/DAPR dataset) using the pytrec and ranx libraries.

Saturday, February 8, 2025

Vishal Bakshi

DoRA’s Magnitude Vector

python

deep learning

machine learning

LLM

In this blog post I highlight a key difference I saw between Raschka’s and peft’s implementation of DoRA.

Saturday, February 1, 2025

Vishal Bakshi

Recreating the PLAID ColBERTv2 Scoring Pipeline: From Research Code to RAGatouille

python

information retrieval

machine learning

deep learning

RAGatouille

ColBERT

In this blog post, I walk through the colbert research codebase (via AnswerAI’s RAGatouille) and work my way line-by-line through the 4-stage PLAID scoring pipeline to recreate RAGatouille results for a toy example of 1 query and 3 documents.

Tuesday, December 24, 2024

Vishal Bakshi

Scoring Full Text and Semantic Search on Chunk Sizes from 100 to 2000 Tokens

python

RAG

information retrieval

fastbookRAG

In this blog post, I run retrieval on three differently preprocessed datasets using four retrieval methods from chunk sizes 100 to 2000 tokens, using my fastbook-benchmark dataset to auto-score the results. Surprisingly, full text search yields the best MRR@10 (0.67) and Recall@10 (0.95) for a chunk size of 2000 tokens.

Friday, November 29, 2024

Vishal Bakshi

Implementing Image-to-Image Generation for Stable Diffusion

python

stable diffusion

fastai

deep learning

machine learning

generative AI

In this blog post I successfully implement image-to-image generation in the diffusion loop provided in Lesson 10 of the fastai course.

Wednesday, November 27, 2024

Vishal Bakshi

Evaluating 4 Retrieval Methods with 6 Chunking Strategies on my fastbook-benchmark Dataset

python

fastbookRAG

information retrieval

In this blog post, I perform retrieval on the fastbook chapter documents using 24 different retrieval method-chunking strategy combinations, auto-scoring using my fastbook-benchmark dataset.

Tuesday, November 26, 2024

Vishal Bakshi

Implementing Negative Prompting for Stable Diffusion

python

stable diffusion

fastai

deep learning

machine learning

generative AI

In this blog post I successfully implement negative prompting in the diffusion loop provided in Lesson 10 of the fastai course. I also explore some other relatively unsuccessful implementations that were interesting and informative nontheless.

Wednesday, November 20, 2024

Vishal Bakshi

Exploring Cosine Similarity in Stable Diffusion’s Latent Space

stable diffusion

generative AI

python

In this blog post I explore cosine similarity between conditioned and unconditioned UNet predictions and the sensitivity of UNet predictions in latent space!

Tuesday, November 19, 2024

Vishal Bakshi

Sentiment Classification with Qwen2-0.5B-Instruct

python

LLM

TinySentiment

In this blog post I use Qwen2-0.5B-Instruct to classify sentiment in the financial_phrasebank dataset with 79.5% accuracy.

Monday, November 18, 2024

Vishal Bakshi

WebGPU Puzzles: Walk through of Official Solutions

AnswerAI

WebGPU

This blog post contains my walkthrough of the official AnswerAI WebGPU Puzzle solutions that I found challenging to understand and/or critical in helping me understand core concepts of GPU programming.

Sunday, November 17, 2024

Vishal Bakshi

Training Textual Inversion Embeddings on Some Samurai Jack Drawings

python

stable diffusion

deep learning

machine learning

In this blog post, I recap my experience (and results) with textual inversion embeddings trained on 6 sketches I created of Samurai Jack.

Wednesday, November 13, 2024

Vishal Bakshi

Comparing Cosine Similarity Between Embeddings of Semantically Similar and Dissimilar Texts with Varying Punctuation

python

RAG

information retrieval

In this blog post, I calculate the cosine similarity between different embeddings for texts that have varying types of punctuation and semantic similarity

Friday, November 8, 2024

Vishal Bakshi

Establishing a Semantic Search (Embedding Cosine Similarity) Baseline for My fastbookRAG Project

python

RAG

information retrieval

fastbookRAG

In this blog post, I experiment with 6 chunking/retrieval strategies to retrieve context from an array of text embeddings sufficient to answer 80.31% of the 193 fastbook end-of-chapter questions from Part 1 of the course.

Tuesday, October 22, 2024

Vishal Bakshi

Conducting a Question-by-Question Error Analysis on Semantic Search Results

python

RAG

information retrieval

fastbookRAG

In this blog post, I conduct a detailed error analysis of 29 questions (from a set of 193), where none of the 6 semantic search methods retrieved sufficient context to answer them. I examine each question, categorize the errors, and discuss potential improvements and implications for future work.

Tuesday, October 22, 2024

Vishal Bakshi

Experimenting with `os.fork`

python

fastai

In this blog post I work through four examples provided by Claude to understand some key concepts related to os.fork, and observe different behaviors when using os.fork in a notebook environment or the shell.

Monday, September 30, 2024

Vishal Bakshi

Generating a GIF Animation Using Stable Diffusion

python

stable diffusion

fastai

deep learning

machine learning

generative AI

In this blog post I repurpose the code provided in Lesson 9/10 of the fastai Part 2 course to generate an animation GIF transitioning from a picture of a skunk to a picture of a puppy.

Thursday, September 26, 2024

Vishal Bakshi

Sentiment Classification with Qwen2-1.5B-Instruct

python

LLM

TinySentiment

In this blog post I use Qwen2-1.5B-Instruct to classify sentiment in the financial_phrasebank dataset with 86.1% accuracy.

Monday, September 23, 2024

Vishal Bakshi

Sentiment Classification with phi-3.5

python

LLM

TinySentiment

In this blog post I use phi-3.5 to classify sentiment in the financial_phrasebank dataset with 93.94% accuracy.

Thursday, September 12, 2024

Vishal Bakshi

Sentiment Classification with phi-3

python

LLM

TinySentiment

In this blog post I use phi-3 to classify sentiment in the financial_phrasebank dataset with 92.79% accuracy.

Thursday, September 12, 2024

Vishal Bakshi

Calculating the Ratio of Gradients in an Image

python

computer vision

TypefaceClassifier

In this blog post I use the OpenCV library to calculate the ratio of the sum of non-zero x- and y-gradients to the sum of non-zero original pixels of an image. The serif font has consistently larger ratios than the sans serif font.

Monday, September 9, 2024

Vishal Bakshi

Calculating the Ratio of Corners in an Image

python

computer vision

TypefaceClassifier

In this blog post I use the OpenCV library to calculate the ratio of the sum of non-zero corner pixels to the sum of non-zero original pixels of an image. The serif font has consistently larger ratios than the sans serif font.

Monday, September 9, 2024

Vishal Bakshi

Calculating the Ratio of 2D FFT Magnitude and Phase of a Text Image

python

computer vision

TypefaceClassifier

In this blog post I use NumPy to calculate the ratio of the mean of 2D FFT magnitude to the count of non-zero binarized pixels (“FFT Magnitude Ratio”) and the ratio of the sum of the absolute value of 2D FFT phase to the sum of binarized pixels (“FFT Phase Ratio”). Both ratios are consistently larger for images with serif text.

Monday, September 9, 2024

Vishal Bakshi

Calculating the Ratio of Letter Perimeter to Area

python

computer vision

TypefaceClassifier

In this blog post I use the OpenCV library to calculate the ratio of letter (contour) perimeter to area. The serif font has consistently larger ratios than the sans serif font.

Friday, September 6, 2024

Vishal Bakshi

Conducting a Question-by-Question Error Analysis on Full Text Search Results

python

RAG

information retrieval

fastbookRAG

In this blog post, I conduct a detailed error analysis of 39 questions from a set of 202, where none of the 6 full text search methods retrieved sufficient context to answer them. I examine each question, categorize the errors, and discuss potential improvements and implications for future work.

Thursday, September 5, 2024

Vishal Bakshi

Establishing a Full Text Search (BM25) Baseline for My fastbookRAG Project

python

RAG

information retrieval

fastbookRAG

In this blog post, I experiment with 6 chunking/retrieval strategies to retrieve context from a sqlite database sufficient to answer 76.7% of the 202 fastbook end-of-chapter questions from Part 1 of the course.

Tuesday, September 3, 2024

Vishal Bakshi

Comparing ~100k Random Numbers Generated with Different Methods

python

machine learning

deep learning

In this blog post, I generate close to 100k random numbers using 5 different methods: ANU Quantum numbers, Python’s random module, NumPy, PyTorch and a custom implementation from Lesson 10 of the fastai course (Part 2). I am surprised by the results!

Tuesday, September 3, 2024

Vishal Bakshi

Implementing a `Matrix` Class in Python

python

deep learning

machine learning

In this blog post I implement a Matrix class following Lesson 10 of the fastai course (Part 2) and add some additional functionality.

Monday, September 2, 2024

Vishal Bakshi

Sentiment Classification with phi-2

python

LLM

TinySentiment

In this blog post I use phi-2 to classify sentiment in the financial_phrasebank dataset with 92% accuracy.

Saturday, August 31, 2024

Vishal Bakshi

Calculating the Squareness of Letters in an Image

python

computer vision

TypefaceClassifier

In this blog post I use the OpenCV library to calculate the “squareness” of letters (minimum dimension to maximum dimension of bounded rectangle around letter) in a text image.

Saturday, August 31, 2024

Vishal Bakshi

Sentiment Classification with Claude Using `claudette`

python

LLM

TinySentiment

In this blog post I use Sonnet (94% accuracy), Opus (94%) and Haiku (92%) to classify sentiment in the financial_phrasebank dataset.

Thursday, August 29, 2024

Vishal Bakshi

Calculating the Ratio of Horizontal to Vertical Details in a Text Image

python

computer vision

TypefaceClassifier

In this blog post I use the PyWavelets library to calculate the ratio of horizontal to vertical details in text image.

Wednesday, August 28, 2024

Vishal Bakshi

Iterating on Full Text Search Keywords using `claudette`

python

RAG

information retrieval

LLM

fastbookRAG

In this blog post, I use Answer.AI’s claudette library to iteratively improve keywords generated for sqlite’s full text search.

Tuesday, August 27, 2024

Vishal Bakshi

Is it a Digit?

python

fastai

deep learning

computer vision

In this notebook, I train a model on noisy MNIST images and deploy it in a HuggingFace Space. The model is then used to predict the probability that a user-drawn figure on a canvas is a digit.

Monday, August 26, 2024

Vishal Bakshi

Generating Full Text Search Keywords using `claudette`

python

RAG

information retrieval

LLM

fastbookRAG

In this blog post, I use Answer.AI’s claudette library to interface with the Claude-3.5 Sonnet API.

Sunday, August 25, 2024

Vishal Bakshi

Fine-tuning TinyStories-3M on the `financial_phrasebank` Dataset

python

LLM

TinySentiment

In this blog post I fine-tune the TinyStories-3M model on the financial_phrasebank dataset and achieve 74%+ accuracy on the validation and test set.

Thursday, August 22, 2024

Vishal Bakshi

Fine-tuning TinyStories-1M on the `financial_phrasebank` Dataset

python

LLM

TinySentiment

In this blog post I fine-tune the TinyStories-1M model on the financial_phrasebank dataset and achieve 68%+ accuracy on the validation and test set.

Thursday, August 22, 2024

Vishal Bakshi

Calculating the Letter Area Ratio in a Text Image

python

computer vision

TypefaceClassifier

In this blog post, as I develop a non-ML baseline for image typeface classification, I use the OpenCV library to calculate the average letter area ratio in a text image.

Wednesday, August 21, 2024

Vishal Bakshi

Fine-tuning TinyStories-8M on the `financial_phrasebank` Dataset

python

LLM

TinySentiment

In this blog post I fine-tune the smaller TinyStories-8M model on the financial_phrasebank dataset and achieve 86% accuracy on the test set and 85% accuracy on the validation set.

Monday, August 19, 2024

Vishal Bakshi

Calculating the Average Stroke Width of Letters in a Text Image

python

computer vision

TypefaceClassifier

In this blog post, as I develop a non-ML baseline for image typeface classification, I use the OpenCV library to calculate the average stroke width of the letters in a text image.

Monday, August 19, 2024

Vishal Bakshi

Calculating the Aspect Ratio of Letters in a Text Image

python

computer vision

TypefaceClassifier

In this blog post, as I develop a non-ML baseline for image typeface classification, I use the OpenCV library to calculate the aspect ratio (width/height) of each letter in a text image.

Wednesday, August 14, 2024

Vishal Bakshi

Fine-tuning TinyStories-33M on the `financial_phrasebank` Dataset

python

LLM

TinySentiment

In this blog post I fine-tune the TinyStories-33M model on the financial_phrasebank dataset and achieve 79%+ accuracy on the validation and test set.

Tuesday, August 13, 2024

Vishal Bakshi

Determining the Ratio of Lowercase to Uppercase Letter Heights Using Signal Analysis

python

computer vision

TypefaceClassifier

In this blog post I apply concepts from signal analysis to determine the ratio of lowercase-to-uppercase letter heights based on an image of text.

Monday, August 12, 2024

Vishal Bakshi

Using Hybrid Search to Answer fastai the Chapter 1 Questionnaire

python

RAG

information retrieval

fastbookRAG

In this blog post I use different approaches to combine FTS5 (keyword search) and Cosine Similarity (semantic search) to retrieve context necessary to answer questions about Chapter 1 of the fastai textbook.

Sunday, August 11, 2024

Vishal Bakshi

How Does Stable Diffusion Work?

deep learning

machine learning

fastai

stable diffusion

generative AI

In this blog post I review the material taught in Lesson 9 of the fastai course (Part 2: Deep Learning Foundations to Stable Diffusion).

Thursday, August 8, 2024

Vishal Bakshi

Using Cosine Similarity to Retrieve Context to Answer fastbook Questionnaire

python

RAG

information retrieval

fastbookRAG

In this blog post I am able to answer 76% of the fastbook Chapter 1 Questionnaire using cosine similarity between the question and the chunked chapter text.

Tuesday, August 6, 2024

Vishal Bakshi

Using TinyInstruct-33M for `financial_phrasebank` Sentiment Classification

python

LLM

TinySentiment

In this blog post I find that TinyInstruct-33M does not follow instructions that deviate from its training data.

Monday, August 5, 2024

Vishal Bakshi

Using Full Text Search to Answer the fastbook Chapter 1 Questionnaire

python

RAG

information retrieval

fastbookRAG

In this blog post I’ll walk through my experiments of using sqlite full text search to retrieve context relevant to answering chapter review questions. This is part of a larger fastbookRAG proejct I’m work on.

Sunday, August 4, 2024

Vishal Bakshi

Calculating the Flesch Kincaid Reading Grade Level for the `financial_phrasebank` Dataset

python

machine learning

deep learning

In this blog post I calculate the Flesch Kincaid reading grade level for the financial_phrasebank dataset and find that it’s much higher than the average TinyStories reading level.

Saturday, August 3, 2024

Vishal Bakshi

Calculating the Flesch Kincaid Reading Grade Level for the TinyStories Dataset

python

machine learning

deep learning

TinySentiment

In this blog post I explore the Flesch Kincaid reading grade level formula and implementation in the textstat library to calculate it for the TinyStories dataset.

Friday, August 2, 2024

Vishal Bakshi

BM25 and Cosine Similarity Demo

python

RAG

information retrieval

fastbookRAG

In this blog post I walk through a couple of toy examples using full text search and cosine similarity to answer queries using a toy dataset of documents.

Wednesday, July 31, 2024

Vishal Bakshi

Paper Math: rsLoRA

deep learning

machine learning

paper math

LLM

In this blog post I think out loud as I attempt to understand pieces of the math presented in the rsLoRA paper.

Thursday, July 25, 2024

Vishal Bakshi

Practical Deep Learnings For Coders - Part 1 Notes and Examples

deep learning

fastai

machine learning

python

This notebook contains all of my notes on the videos, notebooks and book chapters covered in Part 1 of the Practical Deep Learning for Coders fastai course.

Sunday, July 14, 2024

Vishal Bakshi

Training Models on the MovieLens 25M Dataset

deep learning

machine learning

fastai

python

In this notebook I train models using 5 different architectures on the 25 million rating MovieLens dataset and compare performance and results.

Saturday, July 13, 2024

Vishal Bakshi

Comparing CNN Performance by Varying Activation Normalization Layers

deep learning

fastai

python

In this notebook I train CNNs using different activation normalization layers and compare performance and results.

Sunday, July 7, 2024

Vishal Bakshi

Comparing CNN Performance by Varying Batch Normalization Placement

deep learning

fastai

python

In this notebook I place the activation function before and after the batch normalization layer in a CNN and compare the model performance and results.

Tuesday, July 2, 2024

Vishal Bakshi

Training a Collaborative Filtering Model Using Cross Entropy Loss

machine learning

fastai

python

In this notebook I create a collaborative filtering (classifier) architecture suited to use with cross-entropy loss.

Monday, July 1, 2024

Vishal Bakshi

Exploring Collaborative Filtering Applications

machine learning

fastai

python

In this notebook I explore 3-4 areas where collaborative filtering is used, citing examples from research and commercial publications.

Wednesday, June 26, 2024

Vishal Bakshi

Comparing PyTorch `Embeddings` with Custom Embeddings

machine learning

fastai

python

In this notebook I compare the code required to build a collaborative filtering model using PyTorch Embeddings and custom embeddings.

Tuesday, June 25, 2024

Vishal Bakshi

Exploring Different Feature Detection Algorithms in Computer Vision

python

computer vision

In this notebook I run code examples for different non-ML Computer Vision feature detection algorithms.

Monday, June 24, 2024

Vishal Bakshi

Training Collaborative Filtering Models on MovieLens 100k with Different Weight Decay Values

machine learning

fastai

python

In this notebook I explore the question—how does the wd (weight decay) parameter affect model performance and weight distributions? I use the MovieLens 100k subset as the dataset.

Monday, June 3, 2024

Vishal Bakshi

Training Collaborative Filtering Models on MovieLens 100k with Different `y_range` Values

machine learning

fastai

python

In this notebook I explore the question—how does the y_range parameter affect model performance and prediction distributions? I use the MovieLens 100k subset as the dataset.

Thursday, May 23, 2024

Vishal Bakshi

Understanding the Code in fastai’s `LabelSmoothingCrossEntropy`

deep learning

fastai

python

Inspired by Aman Arora’s blog post, I walk through code of the fastai function LabelSmoothingCrossEntropy.

Tuesday, May 21, 2024

Vishal Bakshi

Initial Reaction: Eagles 2024 NFL Schedule

nfl

eagles

A knee-jerk reaction to the 2024 NFL schedule release.

Wednesday, May 15, 2024

Vishal Bakshi

Improving Kaggle Private Score with Multi-Target Classification

deep learning

fastai

kaggle competition

paddy doctor

python

In this notebook I apply Jeremy Howard’s approach to multi-target classification in fastai to improve a Kaggle submission score.

Wednesday, May 15, 2024

Vishal Bakshi

Paper Summary: RewardBench

paper summary

deep learning

LLM

A summary of research benchmarking reward models.

Friday, April 26, 2024

Vishal Bakshi

Paper Math and Summary: ORPO (Odds Ratio Preference Optimization)

paper math

paper summary

deep learning

LLM

Summarizing the research from the Odds Ratio Preference Optimization paper and exploring its math to better understand it.

Friday, April 19, 2024

Vishal Bakshi

Recap: HMS HBAC Kaggle Competition

fastai

kaggle competition

deep learning

A recap of what and how I did on the Harvard Medical Harmful Brain Activity Classification Kaggle Competition.

Tuesday, April 16, 2024

Vishal Bakshi

Paper Math: KTO (Kahneman Tversky Optimization)

paper math

deep learning

LLM

Exploring the math from the Kahneman Tversky Optimization paper to better understand it.

Friday, April 12, 2024

Vishal Bakshi

Paper Math: DPO (Direct Preference Optimization)

paper math

deep learning

LLM

Exploring the math from the Direct Preference Optimization paper to better understand it.

Friday, April 5, 2024

Vishal Bakshi

Paper Summary: Attention is All You Need

paper summary

deep learning

LLM

A summary of research introducing the Transformer architecture and a code walkthrough for the Encoder and Decoder.

Saturday, March 30, 2024

Vishal Bakshi

Paper Summary: Constitutional AI

paper summary

deep learning

LLM

A summary of research on Constitutional AI by Anthropic, in which they train a non-evasive harmless AI assistant using human-generated helpfulness preference data and AI-generated harmlessness preference data.

Friday, March 22, 2024

Vishal Bakshi

Recap: My First Live Kaggle Competition

fastai

kaggle competition

machine learning

A recap of what and how I did on the Multi-Class Prediction of Obesity Risk Kaggle Competition.

Thursday, February 29, 2024

Vishal Bakshi

Paper Summary: Training Data for the Price of a Sandwich

Trustworthy AI

LLM

In this blog post I summarize the discussion in the paper ‘Training Data for the Price of a Sandwich: Common Crawl’s Impact on Generative AI’ by Stefan Baack and Mozilla Insights.

Monday, February 19, 2024

Vishal Bakshi

Paper Summary: Textbooks are All You Need I & II

paper summary

deep learning

LLM

A summary of research on the phi-1, phi-1.5 and phi-2 from the Textbook Are All You Need I and II series of publications by Microsoft Research.

Monday, February 19, 2024

Vishal Bakshi

Paddy Doctor Kaggle Competition - Part 8

deep learning

fastai

kaggle competition

paddy doctor

python

In this notebook I apply to my large ensemble Jeremy Howard’s approach in the “Scaling Up - Road to the Top, Part 3” notebook.

Monday, February 5, 2024

Vishal Bakshi

Paddy Doctor Kaggle Competition - Part 7

deep learning

fastai

kaggle competition

paddy doctor

python

In this notebook I run the code from Jeremy Howard’s “Scaling Up - Road to the Top, Part 3” notebook.

Monday, February 5, 2024

Vishal Bakshi

Paddy Doctor Kaggle Competition - Part 6

deep learning

fastai

kaggle competition

paddy doctor

python

In this notebook I work through Jeremy Howard’s Live Coding 13 video in which he finishes working on the Paddy Doctor Disease Classification Kaggle Competition.

Monday, February 5, 2024

Vishal Bakshi

Paddy Doctor Kaggle Competition - Part 5

deep learning

fastai

kaggle competition

paddy doctor

python

In this notebook I work through Jeremy Howard’s Live Coding 12 video in which he continues working on the Paddy Doctor Disease Classification Kaggle Competition.

Monday, February 5, 2024

Vishal Bakshi

Paddy Doctor Kaggle Competition - Part 4

deep learning

fastai

kaggle competition

paddy doctor

python

In this notebook I work through Jeremy Howard’s Live Coding 11 video in which he continues working on the Paddy Doctor Disease Classification Kaggle Competition.

Monday, February 5, 2024

Vishal Bakshi

Paddy Doctor Kaggle Competition - Part 3

deep learning

fastai

kaggle competition

paddy doctor

python

In this notebook I work through Jeremy Howard’s Live Coding 10 video in which he continues working on the Paddy Doctor Disease Classification Kaggle Competition.

Monday, February 5, 2024

Vishal Bakshi

Paddy Doctor Kaggle Competition - Part 2

deep learning

fastai

kaggle competition

paddy doctor

python

In this notebook I work through Jeremy Howard’s Live Coding 9 video in which he continues working on the Paddy Doctor Disease Classification Kaggle Competition.

Monday, February 5, 2024

Vishal Bakshi

Paddy Doctor Kaggle Competition - Part 1

deep learning

fastai

kaggle competition

paddy doctor

python

In this notebook I work through Jeremy Howard’s Live Coding 8 video in which he starts working on the Paddy Doctor Disease Classification Kaggle Competition.

Monday, February 5, 2024

Vishal Bakshi

Regardless, Go Birds

nfl

eagles

A reflection on the 2023 Philadelphia Eagles season.

Wednesday, January 17, 2024

Vishal Bakshi

Prompting LLMs Using Different Prompting Styles

deep learning

LLM

python

In this notebook I use 20 math reasoning dataset questions to prompt three 7B-parameter LLMs using 3 different prompting styles.

Thursday, November 2, 2023

Vishal Bakshi

Paper Summary: Plan-and-Solve Prompting

deep learning

LLM

python

In this notebook I summarize the findings from the paper “Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models” by Lei Wang, et al.

Friday, October 20, 2023

Vishal Bakshi

Understanding the fastai TabularModel Class

deep learning

python

In this notebook I walk through line-by-line the source code of the fastai TabularModel class.

Tuesday, October 10, 2023

Vishal Bakshi

Using Neural Net Embeddings to Improve a Random Forest

deep learning

python

In this notebook I replace categorical features in a dataset with corresponding neural net embedding outputs to improve the performance of a random forest.

Monday, October 9, 2023

Vishal Bakshi