Calculating the Aspect Ratio of Letters in a Text Image

python
computer vision
TypefaceClassifier
In this blog post, as I develop a non-ML baseline for image typeface classification, I use the OpenCV library to calculate the aspect ratio (width/height) of each letter in a text image.
Author

Vishal Bakshi

Published

August 14, 2024

Background

In this notebook, I’ll walk through a modified algorithm (suggested by Claude) to calculate the aspect ratio of letters in a text image. The aspect ratio of text in this case corresponds to the ratio of the width to height of a letter. Strictly speaking, in typography, the aspect ratio is defined as the ratio of the letter height to the x-height (lowercase letter height) of the font. That is not the definition I’m using here. Instead, I mean aspect ratio of a rectangle (width:height) that bounds a letter.
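To make the bounding-box definition concrete, here is a minimal sketch (using only NumPy, independent of the OpenCV pipeline below) that computes the width-to-height ratio of the smallest rectangle covering a toy letter mask:

```python
import numpy as np

# A toy 10x8 binary "letter" mask: foreground pixels occupy rows 2-8, cols 3-6
mask = np.zeros((10, 8), dtype=np.uint8)
mask[2:9, 3:7] = 1

# The bounding box spans the min/max row and column of the foreground pixels
ys, xs = np.nonzero(mask)
h = ys.max() - ys.min() + 1  # height = 7
w = xs.max() - xs.min() + 1  # width  = 4

print(w / h)  # ~0.571: a narrow "letter"
```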

This algorithm is part of my exploration of non-ML baselines to classify text images into various typeface categories (e.g., “humanist sans,” “grotesque sans,” “script,” and “display”). Once the non-ML baseline is established, I’ll train a neural network for this task. This is one of many notebooks in my TypefaceClassifier project series.

Show imports
import cv2
import numpy as np
import pandas as pd

Loading the Data

The first image I’ll use is that of a display typeface (the font Bree) with a font size of 76px.

path = 'display-76px.png'
img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
img
ndarray (512, 512) 
array([[255, 255, 255, ..., 255, 255, 255],
       [255, 255, 255, ..., 255, 255, 255],
       [255, 255, 255, ..., 255, 255, 255],
       ...,
       [255, 255, 255, ..., 255, 255, 255],
       [255, 255, 255, ..., 255, 255, 255],
       [255, 255, 255, ..., 255, 255, 255]], dtype=uint8)

The next step is to convert the image to binary data. Because we use an inverted Otsu threshold, the dark text pixels become white (255) and the light background becomes black (0).

_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
binary
ndarray (512, 512) 
array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)

Finding Letter Contours

Next, we find the contours, which the OpenCV docs define as:

a curve joining all the continuous points (along the boundary), having same color or intensity

This is why we converted the image to binary data: it makes every letter’s pixels the same intensity (white, or 255). The contours are outlines of the letters:

contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cv2.drawContours(img, contours, -1, (0,255,0), 3)
ndarray (512, 512) 
array([[255, 255, 255, ..., 255, 255, 255],
       [255, 255, 255, ..., 255, 255, 255],
       [255, 255, 255, ..., 255, 255, 255],
       ...,
       [255, 255, 255, ..., 255, 255, 255],
       [255, 255, 255, ..., 255, 255, 255],
       [255, 255, 255, ..., 255, 255, 255]], dtype=uint8)

Calculating the Aspect Ratio for Each Letter

Finally, we can calculate the aspect ratio of a letter by creating a bounding box around it. The width and height of the bounding box are used to calculate the aspect ratio \(\frac{\text{width}}{\text{height}}\):

x, y, w, h = cv2.boundingRect(contours[0]) # position, width and height of bounding box
cv2.rectangle(img,(x,y),(x+w,y+h),(0,255,0),2) # bounding box around "u" in "consequat"
ndarray (512, 512) 
array([[255, 255, 255, ..., 255, 255, 255],
       [255, 255, 255, ..., 255, 255, 255],
       [255, 255, 255, ..., 255, 255, 255],
       ...,
       [255, 255, 255, ..., 255, 255, 255],
       [255, 255, 255, ..., 255, 255, 255],
       [255, 255, 255, ..., 255, 255, 255]], dtype=uint8)
# aspect ratio of the "u" in "consequat"
w, h, w/h
(33, 40, 0.825)

Calculating the Average Aspect Ratio for All Letters

I can put the above contour-to-bounding-box code in a loop over all contours and calculate the average aspect ratio of the letters in the image:

aspect_ratios = []
for contour in contours:
  x, y, w, h = cv2.boundingRect(contour)
  if w > 5 and h > 5:  # filter out punctuation or noise
    aspect_ratios.append(w / h)
np.mean(aspect_ratios), np.median(aspect_ratios)
(0.7164317414293024, 0.7804878048780488)

Calculating Aspect Ratio for Different Images

I’ll wrap that code in a function and apply it to images of two typefaces (display and serif) across a range of font sizes.

Show the aspect ratio function
def aspect_ratio(path):
    # Read the image
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)

    # Threshold the image
    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Find contours
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    aspect_ratios = []

    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        if w > 5 and h > 5:  # Filter out very small contours
            aspect_ratios.append(w / h)

    # Guard against images with no detected letters: np.mean of an
    # empty list emits a warning and returns nan
    if not aspect_ratios:
        return 0.0, 0.0

    return np.mean(aspect_ratios), np.median(aspect_ratios)
Show aspect ratio calculations
aspect_ratios = []
font_szs = [8, 18, 24, 36, 76, 240, 330, 420]

for typeface in ['display', 'serif']:
  for sz in font_szs:
    aspect_ratios.append((typeface, sz, *aspect_ratio(f"{typeface}-{sz}px.png")))
# Create DataFrame
df = pd.DataFrame(aspect_ratios, columns=['typeface', 'font_size', 'mean_ratio', 'median_ratio'])

The median aspect ratio becomes an outlier at the very small (8px) font size for both typefaces, and at a very large size (240px) for the display typeface.

df
typeface font_size mean_ratio median_ratio
0 display 8 1.715923 1.500000
1 display 18 0.815149 0.800000
2 display 24 0.745288 0.769231
3 display 36 0.711960 0.750000
4 display 76 0.716432 0.780488
5 display 240 1.684231 1.263158
6 display 330 0.685398 0.823529
7 display 420 0.829781 0.829781
8 serif 8 2.473861 2.401786
9 serif 18 1.084338 1.000000
10 serif 24 0.919979 0.909091
11 serif 36 0.838929 0.833333
12 serif 76 0.901583 0.861111
13 serif 240 0.864006 0.850877
14 serif 330 0.920587 0.759848
15 serif 420 2.277634 0.909091

The median ratio is also more consistent (smaller standard deviation) than the mean ratio, which makes sense given that the median is more robust to outliers.
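The median’s robustness can be seen with a toy example: a single outlier (say, from two letters merged into one contour) drags the mean far from the typical letter ratios while barely moving the median. The numbers below are illustrative, not taken from the images above:

```python
import numpy as np

# Five typical letter ratios plus one merged-contour outlier
ratios = np.array([0.78, 0.80, 0.75, 0.82, 0.77, 5.0])

mean, median = np.mean(ratios), np.median(ratios)
print(mean)    # ~1.49, pulled up by the outlier
print(median)  # 0.79, stays near the typical letters
```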

df.groupby('typeface')[['mean_ratio', 'median_ratio']].describe().T
typeface display serif
mean_ratio count 8.000000 8.000000
mean 0.988020 1.285115
std 0.442387 0.679100
min 0.685398 0.838929
25% 0.715314 0.892189
50% 0.780219 0.920283
75% 1.043393 1.382662
max 1.715923 2.473861
median_ratio count 8.000000 8.000000
mean 0.939523 1.065642
std 0.281336 0.544326
min 0.750000 0.759848
25% 0.777674 0.846491
50% 0.811765 0.885101
75% 0.938125 0.931818
max 1.500000 2.401786

A tighter font size range (18px to 76px) yields a more stable median aspect ratio. Within this range, the display text consistently has narrower letters than the serif text.

df.query("font_size >= 18 and font_size <= 76").groupby('typeface')['median_ratio'].describe().T
typeface display serif
count 4.000000 4.000000
mean 0.774930 0.900884
std 0.020924 0.073112
min 0.750000 0.833333
25% 0.764423 0.854167
50% 0.774859 0.885101
75% 0.785366 0.931818
max 0.800000 1.000000
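Given that separation, one could sketch a toy single-feature rule for the eventual non-ML baseline. The 0.82 threshold below is a hypothetical cutoff read off the gap between the two typefaces’ medians in the filtered table, not a tuned value:

```python
def classify_typeface(median_ratio, threshold=0.82):
    """Toy rule: narrower letters -> 'display', wider -> 'serif'."""
    return 'display' if median_ratio < threshold else 'serif'

print(classify_typeface(0.775))  # display-like median -> 'display'
print(classify_typeface(0.901))  # serif-like median   -> 'serif'
```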

Final Thoughts

The non-ML side of computer vision, an area I’m quite new to, continues to surprise me with algorithms that fit neatly into my niche use case: classifying typefaces from text images. Just like with the x-height to cap-height ratio algorithm, calculating aspect ratios works best within a specific range of font sizes, as very small or large sizes can cause issues like cropped text or blurry binarized images. I still have several more algorithms to explore as I work toward building a (likely multi-pronged) non-ML baseline for this classification task, and I’ll be covering each in future blog posts.

I hope you enjoyed this blog post! Follow me on Twitter @vishal_learner.