Calculating the Ratio of Letter Perimeter to Area

python

computer vision

TypefaceClassifier

In this blog post I use the OpenCV library to calculate the ratio of letter (contour) perimeter to area. The serif font has a consistently larger ratio than the sans serif display font.

Author

Vishal Bakshi

Published

September 6, 2024

Background

In this notebook I’ll walk through an algorithm suggested by Claude to distinguish one typeface (like display) from another (like serif): find the contour of each letter, then divide the total contour perimeter by the total contour area. The algorithm is relatively simple because the OpenCV library does the heavy lifting.

This algorithm is part of my exploration of non-ML baselines to classify text images into various typeface categories (e.g., “humanist sans,” “grotesque sans,” “script,” “display,” etc.). Once the non-ML baseline is established, I’ll train a neural network for this task. This is one of many notebooks in my TypefaceClassifier project series.

import cv2
import numpy as np
import pandas as pd
from google.colab.patches import cv2_imshow

Load and Binarize the Image

As usual, we’ll load the image and binarize it so the text is white and the background is black.

path = 'serif-76px.png'
img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
binary

ndarray (512, 512)

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=uint8)

Calculate the Ratio of Contour Perimeter to Area

Next, we find the contours in the image:

contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

I’ll visualize the contours to show what we’re dealing with. As you can see, the contours are essentially the letter boundaries.

contour_image = np.zeros((binary.shape[0], binary.shape[1], 3), dtype=np.uint8)
cv2.drawContours(contour_image, contours, -1, (0, 255, 0), 2)
cv2_imshow(contour_image)

We then calculate the total perimeter and total area of all contours:

total_perimeter = sum(cv2.arcLength(contour, True) for contour in contours)
total_area = sum(cv2.contourArea(contour) for contour in contours)

And take the ratio of the two:

ratio = total_perimeter / total_area if total_area > 0 else 0
ratio

0.3750295759392109

Calculating Contour Ratio for Multiple Images

I’ll wrap the above functionality (except for the contour visualization) into a function and calculate the ratio for different images of different typefaces.

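def contour_ratio(image_path):
    """Return total contour perimeter divided by total contour area for an image."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if img is None:  # cv2.imread returns None instead of raising on a bad path
        raise FileNotFoundError(image_path)

    # Binarize so the text is white (255) on a black (0) background
    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Outer boundary of each connected white region
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    total_perimeter = sum(cv2.arcLength(contour, True) for contour in contours)
    total_area = sum(cv2.contourArea(contour) for contour in contours)

    # Guard against division by zero when no contour encloses any area
    ratio = total_perimeter / total_area if total_area > 0 else 0

    return ratio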

On average, images with serif fonts have a higher contour ratio (perimeter:area) than images with display fonts. This matches my intuition: serif fonts have more detailed elements (the serifs) which increase the perimeter of the shape for a given area.

szs = [8, 18, 24, 36, 76, 240, 330, 420]
ts = ['display', 'serif']
res = []

for t in ts:
    for sz in szs:
        image_path = f"{t}-{sz}px.png"
        sr = contour_ratio(image_path)
        res.append([t, sz, sr])

res = pd.DataFrame(res, columns=['typeface', 'font-size', 'contour-ratio'])
res.groupby('typeface')['contour-ratio'].agg(['mean', 'median'])

typeface	mean	median
display	0.577296	0.345773
serif	0.781032	0.548321

The trend holds at every font size in the table below: the serif text has a larger perimeter:area ratio than the sans serif display text.

res.sort_values(by='font-size')

	typeface	font-size	contour-ratio
0	display	8	2.050135
8	serif	8	2.218431
1	display	18	0.992742
9	serif	18	1.530445
2	display	24	0.730834
10	serif	24	1.106150
3	display	36	0.459297
11	serif	36	0.721613
4	display	76	0.232250
12	serif	76	0.375030
5	display	240	0.074272
13	serif	240	0.115825
6	display	330	0.046465
14	serif	330	0.084438
7	display	420	0.032374
15	serif	420	0.096324
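
The interleaved rows are easier to compare side by side. As a sketch, the snippet below re-enters the values from the table above and pivots them so each font size is one row. The names `long` and `wide` and the `serif/display` column are my own additions. On the original `res` DataFrame the same `pivot` call should work directly.

```python
import pandas as pd

# Contour ratios copied from the table above.
sizes = [8, 18, 24, 36, 76, 240, 330, 420]
display = [2.050135, 0.992742, 0.730834, 0.459297, 0.232250, 0.074272, 0.046465, 0.032374]
serif = [2.218431, 1.530445, 1.106150, 0.721613, 0.375030, 0.115825, 0.084438, 0.096324]

long = pd.DataFrame({
    'typeface': ['display'] * len(sizes) + ['serif'] * len(sizes),
    'font-size': sizes * 2,
    'contour-ratio': display + serif,
})

# One row per font size, one column per typeface.
wide = long.pivot(index='font-size', columns='typeface', values='contour-ratio')
wide['serif/display'] = wide['serif'] / wide['display']
print(wide.round(3))
```

A `serif/display` value above 1 means the serif image has the larger ratio at that size, which is the case in every row.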

Final Thoughts

Among the algorithms I’ve tested, this one demonstrates the highest consistency in differentiating typefaces, regardless of font size, making it a great candidate for a non-ML typeface classification baseline.

I’m also a huge fan of simplicity and this is one of the simplest algorithms for this task that I have implemented. A win-win!

I hope you enjoyed this blog post. Follow me on Twitter @vishal_learner.