Experimenting with os.fork

python
fastai
In this blog post I work through four examples provided by Claude to understand some key concepts related to os.fork, and observe different behaviors when using os.fork in a notebook environment or the shell.
Author

Vishal Bakshi

Published

September 30, 2024

Background

In Lesson 10 of the fastai course (Part 2) we’re introduced to os.fork, specifically in the context of random number generation. In this notebook I’ll get some more reps working with os.fork.

In the Lesson, Jeremy shows how random number generation in different libraries is handled across parent and child processes, as shown below (using seed and rand as defined in the lesson):

import os
import random
import numpy as np
import torch
import fcntl
import time
import signal
import sys
rnd_state = None
def seed(a):
    global rnd_state
    a, x = divmod(a, 30268)
    a, y = divmod(a, 30306)
    a, z = divmod(a, 30322)
    rnd_state = int(x)+1, int(y)+1, int(z)+1
seed(457428938475)
rnd_state
(4976, 20238, 499)
def rand():
    global rnd_state
    x, y, z = rnd_state
    x = (171 * x) % 30269
    y = (172 * y) % 30307
    z = (170 * z) % 30323
    rnd_state = x,y,z
    return (x/30269 + y/30307 + z/30323) % 1.0
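As a quick sanity check of the from-scratch generator (my own addition, not from the lesson; I restate the seed/rand definitions so the snippet runs standalone), reseeding reproduces the same stream:

```python
rnd_state = None

def seed(a):
    # Split the seed into three sub-seeds, one per congruential generator
    global rnd_state
    a, x = divmod(a, 30268)
    a, y = divmod(a, 30306)
    a, z = divmod(a, 30322)
    rnd_state = int(x) + 1, int(y) + 1, int(z) + 1

def rand():
    # Advance each of the three generators and combine their outputs
    global rnd_state
    x, y, z = rnd_state
    x = (171 * x) % 30269
    y = (172 * y) % 30307
    z = (170 * z) % 30323
    rnd_state = x, y, z
    return (x / 30269 + y / 30307 + z / 30323) % 1.0

seed(457428938475)
first = [rand() for _ in range(3)]
seed(457428938475)
again = [rand() for _ in range(3)]
assert first == again  # same seed, same stream
```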

The from-scratch rand function generates the same random number in both parent and child processes because the child inherits a copy of the same random state:

if os.fork(): print(f'In parent: {rand(), rnd_state}')
else:
    print(f'In child: {rand(), rnd_state}')
    os._exit(os.EX_OK)
In parent: (0.7645251082582081, (3364, 25938, 24184))
In child: (0.7645251082582081, (3364, 25938, 24184))

torch does the same:

if os.fork(): print(f'In parent: {torch.rand(1).item(), torch.get_rng_state().sum().item()}')
else:
    print(f'In child: {torch.rand(1).item(), torch.get_rng_state().sum().item()}')
    os._exit(os.EX_OK)
In parent: (0.0692816972732544, 325580)
In child: (0.0692816972732544, 325580)

As does NumPy:

if os.fork(): print(f'In parent: {np.random.rand(1)[0], np.random.get_state()[1].sum()}')
else:
    print(f'In child: {np.random.rand(1)[0], np.random.get_state()[1].sum()}')
    os._exit(os.EX_OK)
In child: (0.8234897720205184, 1375830894290)
In parent: (0.8234897720205184, 1375830894290)

The Python standard library generates different random numbers in the parent and the child, indicating that the random state has changed:

if os.fork(): print(f'In parent: {random.random(), sum(random.getstate()[1])}')
else:
    print(f'In child: {random.random(), sum(random.getstate()[1])}')
    os._exit(os.EX_OK)
In parent: (0.7978973512537335, 1327601590235)
In child: (0.5603922565589059, 1333438682830)
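As I understand it, the standard library behaves differently because, since Python 3.7, the random module registers an at-fork handler via os.register_at_fork that reseeds its generator in the child. Here is a minimal sketch of that hook (my own example, using a pipe to report back from the child):

```python
import os

messages = []
# Register a callback that runs in the child right after fork — the same
# mechanism CPython's random module uses to reseed itself post-fork.
os.register_at_fork(after_in_child=lambda: messages.append("after_in_child ran"))

r, w = os.pipe()
if os.fork():
    os.close(w)
    child_report = os.read(r, 100).decode()  # blocks until the child writes
    os.wait()  # reap the child
    print(f"child reported: {child_report!r}")
else:
    # In the child, the handler fired before fork() returned
    os.write(w, (messages[0] if messages else "handler did not run").encode())
    os._exit(os.EX_OK)
```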

Jeremy also mentioned in the video that there used to be a bug in fastai related to this os.fork behavior which resulted in incorrectly handling data augmentations across multiple processes. I poked around the fastai repo and found this issue and corresponding PR, which might have been the ones he was referring to (I’m not sure), but it did lead me down an interesting rabbit hole in the fastai repo and I learned a couple of new things that I’ll share.

In the PR, they introduce the following line:

self.store = threading.local()

self.store is referenced throughout the PR, for example:

def set_state(self):
        self.store.rand_r = random.uniform(0, 1)
        self.store.rand_c = random.uniform(0, 1)

The corresponding GitHub issue linked to this StackOverflow post which talks about threading.local(). I didn’t quite follow the post so I copy/pasted its text as a prompt to Claude and asked it to create an example to illustrate the core concepts of threading.local. It gave me the following example:

import threading
import multiprocessing
import time
import random

First, threading.local is instantiated as a global variable:

# Thread-local storage for threading module
thread_local = threading.local()

Next, we define a worker function. Claude defines a “worker” as follows (I found similar definitions with Google searches):

a unit of execution that performs a specific task or job. In the context of concurrent programming, a worker is typically implemented as either a thread or a process, depending on the chosen concurrency model.

threading_worker adds a count attribute to thread_local (if it doesn’t have it already) or increments count by 1 if it exists.

def threading_worker(worker_id):
    if not hasattr(thread_local, 'count'):
        print(f'\n\tWorker {worker_id}: instantiating `count`')
        thread_local.count = 0
    thread_local.count += 1
    print(f"Threading: Worker {worker_id}, Count: {thread_local.count}\n")
    time.sleep(random.random())

To illustrate, we create 5 threads and pass threading_worker to each one. The result is that each worker has its own “private view” of the global thread_local, as exhibited by thread_local.count having the same value of 1 for each worker_id.

Finally, Claude explains that the purpose of thread.join() is to block until that thread finishes before the main thread continues. Note that the final print statement, print("Threading example finished."), runs only after all threads finish executing.

def run_threading_example():
    threads = []
    for i in range(5):
        thread = threading.Thread(target=threading_worker, args=(i,))
        threads.append(thread)
        thread.start()

    for thread in threads:
        thread.join()

    print("Threading example finished.")

It’s interesting to note that each Worker instantiates count before adding 1 to it (as expected), but the order of each thread instantiating count (0, 1, 2, 3, 4) is not the same order of each thread adding 1 (0, 1, 3, 4, 2; which I didn’t expect).

run_threading_example()

    Worker 0: instantiating `count`
Threading: Worker 0, Count: 1


    Worker 1: instantiating `count`
Threading: Worker 1, Count: 1


    Worker 2: instantiating `count`

    Worker 3: instantiating `count`
Threading: Worker 3, Count: 1


    Worker 4: instantiating `count`
Threading: Worker 4, Count: 1

Threading: Worker 2, Count: 1

Threading example finished.
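To contrast threading.local with an ordinary shared object, here’s a small sketch of my own: each thread writes its id to both a thread-local slot and a plain dict; after a barrier ensures everyone has written, each thread still sees its own thread-local value, while the plain dict holds only the last write:

```python
import threading

local = threading.local()
plain = {"owner": None}  # ordinary object, shared by all threads
seen = {}
barrier = threading.Barrier(3)

def worker(name):
    local.owner = name        # private to this thread
    plain["owner"] = name     # visible to (and overwritten by) every thread
    barrier.wait()            # wait until all three threads have written
    seen[name] = local.owner  # each thread still sees its own value

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads: t.start()
for t in threads: t.join()

print(seen)            # {0: 0, 1: 1, 2: 2} — the thread-local values survived
print(plain["owner"])  # one of 0/1/2 — whichever thread wrote last
```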

There is much to learn when it comes to threading and multiprocessing, but I’ll exit this rabbit hole for now.

The second thing I learned was this clever way to index into a tuple using a boolean expression:

@property
def multi_processing_context(self): return (None,multiprocessing)[self.num_workers>0]

I commented about this on Twitter and Jeremy replied:

Years back when I was getting into web development, one of the patterns in JavaScript I enjoyed was the ternary operator:

a = is_true ? val_if_true : val_if_false

From what I understand, Python doesn’t have that exact ?: operator, so anytime I come across a concise way to execute logic using a boolean expression, I’m excited to see it.
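For what it’s worth, Python’s closest equivalent is the conditional expression (a if cond else b). Here’s a quick sketch of my own comparing it with the tuple-indexing trick, using a string stand-in for the multiprocessing module and a hypothetical num_workers value:

```python
num_workers = 4  # hypothetical value for illustration

# The PR's trick: the bool indexes the tuple (False → 0, True → 1)
ctx_trick = (None, "multiprocessing")[num_workers > 0]

# Python's built-in conditional expression does the same selection
ctx_ternary = "multiprocessing" if num_workers > 0 else None

print(ctx_trick, ctx_ternary)
```

One difference worth noting: the tuple trick builds both elements eagerly, while the conditional expression only evaluates the branch it takes.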

With that short interlude out of the way, I’ll now dig in to os.fork.

os.fork Experiments

I prompted Claude to give me some examples using os.fork with the following prompt:

I want to better understand what os.fork does. what’s a good set of experiments I can run to understand it’s functionality?

Claude responded with four experiments, which I’ll run through next.

Basic os.fork() example

I’ll start with a definition from the “fork” Wikipedia page:

In computing, particularly in the context of the Unix operating system and its workalikes, fork is an operation whereby a process creates a copy of itself. It is an interface which is required for compliance with the POSIX and Single UNIX Specification standards. It is usually implemented as a C standard library wrapper to the fork, clone, or other system calls of the kernel. Fork is the primary method of process creation on Unix-like operating systems.

In multitasking operating systems, processes (running programs) need a way to create new processes, e.g. to run other programs. Fork and its variants are typically the only way of doing so in Unix-like systems. For a process to start the execution of a different program, it first forks to create a copy of itself. Then, the copy, called the “child process”, calls the exec system call to overlay itself with the other program: it ceases execution of its former program in favor of the other.

Next, I’ll look at the definitions of os.getpid and os.getppid from the docs before using them. os.getpid:

Return the current process id.

And os.getppid:

Return the parent’s process id. When the parent process has exited, on Unix the id returned is the one of the init process (1), on Windows it is still the same id, which may be already reused by another process.

print(f"Main process PID: {os.getpid()}")
Main process PID: 436

Next, I’ll call os.fork:

Fork a child process. Return 0 in the child and the child’s process id in the parent. If an error occurs OSError is raised.

Note that some platforms including FreeBSD <= 6.3 and Cygwin have known issues when using fork() from a thread.

if os.fork(): print(f'In parent: {os.getpid()}')
else:
    print(f'In child: {os.getpid()}')
    os._exit(os.EX_OK)
In parent: 436
In child: 580

It’s important to note that I took the above code straight from Lesson 10’s 01_matmul.ipynb.

When I tried to run the following in Colab, the cell wouldn’t execute and would just hang:

pid = os.fork()

When I tried to run that locally on my MacBook, I got the following error:

OSError: [Errno 9] Bad file descriptor

I found this StackOverflow post which talks about similar issues, and that os.fork doesn’t play nice with Jupyter Notebooks. Claude also seemed to agree, recommending that I either use the os._exit approach from Lesson 10, or put my os.fork-related code in a separate .py script outside the notebook.

I asked Claude to rewrite the os.fork experiments using that if/else approach.

When I run the following code block, it’s interesting to note that the child process runs before the parent process. I wonder if that means os.fork returned 0? Claude says no:

The reason it might seem like the child process runs first is due to how process scheduling works in operating systems. When os.fork() is called, both the parent and child processes are ready to run, and the operating system’s scheduler decides which one to execute first. In this case, the child process got scheduled to run before the parent continued.

It adds the following context:

This behavior - where the child might run before the parent continues - is normal and expected in multi-process programming. It’s one of the reasons why synchronization mechanisms are often needed when working with multiple processes.

print(f"\nMain process PID: {os.getpid()}")

if os.fork():
    print(f"\nIn parent: {os.getpid()}")
else:
    print(f"\nIn child: {os.getpid()}, Parent PID: {os.getppid()}")
    os._exit(os.EX_OK)

print(f"\nThis will be printed only by the parent process. PID: {os.getpid()}")

Main process PID: 436

In child: 853, Parent PID: 436
Main process PID: 436

In parent: 436

This will be printed only by the parent process. PID: 436

Memory Independence Example

The following example illustrates how “forked processes have independent memory spaces and that changes to variables in one process don’t affect the other process” as Claude states it.

The child process starts with its own copy of the global shared_variable (still 0), then adds 1 to it for a final value of 1. Meanwhile, in the parent process, its final value is 2. This reminds me of the threading.local behavior.

shared_variable = 0

if os.fork():
    # Parent process
    shared_variable +=  2
    print(f"\nIn parent: {os.getpid()}, shared_variable = {shared_variable}")
else:
    # Child process
    shared_variable += 1
    print(f"\nIn child: {os.getpid()}, shared_variable = {shared_variable}")
    os._exit(os.EX_OK)

print(f"Final shared_variable in parent: {shared_variable}")

In parent: 436, shared_variable = 2
Final shared_variable in parent: 2

In child: 902, shared_variable = 1
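If you actually want both processes to see the same variable, one option (a sketch of my own, not from the lesson) is multiprocessing.Value, which allocates the int in shared memory before the fork so both sides update the same bytes:

```python
import multiprocessing
import os

# An int backed by shared memory, created *before* the fork so both
# processes inherit the same underlying buffer
shared = multiprocessing.Value('i', 0)

if os.fork():
    os.wait()  # let the child increment first
    with shared.get_lock():
        shared.value += 2
    print(f"parent sees shared.value = {shared.value}")  # both increments visible
else:
    with shared.get_lock():
        shared.value += 1
    os._exit(os.EX_OK)
```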

File Descriptor Inheritance

Claude then provided the following code to illustrate how the parent and child processes can write different data to the same file. However, this code resulted in only the parent writing to the file:

with open("test.txt", "w") as f:
    if os.fork():
        # parent process
        f.write("Written by parent\n")
    else:
        # child process
        f.write("Written by child\n")
        os._exit(os.EX_OK)

# Run this after the script to see the contents:
print(open("test.txt", "r").read())
Written by parent
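My reading of why the child’s line vanishes in the example above (an inference of mine, not from Claude): f.write goes into Python’s userspace buffer, and os._exit skips interpreter cleanup, so the child’s buffered line is never flushed. Writing through the raw file descriptor with os.write bypasses that buffer, and an os.wait makes the order deterministic:

```python
import os

# O_APPEND makes each os.write land at the current end of the file
fd = os.open("test.txt", os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_APPEND)

if os.fork():
    os.wait()  # let the child write first
    os.write(fd, b"Written by parent\n")
    os.close(fd)
    contents = open("test.txt").read()
    print(contents)
else:
    # Unbuffered write: nothing is lost even though os._exit skips cleanup
    os.write(fd, b"Written by child\n")
    os._exit(os.EX_OK)
```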

Claude then suggested using “file locking” and “flushing” to ensure each write lands in the file before its process exits, but this didn’t reliably help. Sometimes it wrote from both processes, sometimes just from one. I’ve illustrated both outcomes below:

def do_write():
  with open("test.txt", "w") as f:
      if os.fork():
          # parent process
          fcntl.flock(f, fcntl.LOCK_EX)
          f.write("Written by parent\n")
          f.flush()
          fcntl.flock(f, fcntl.LOCK_UN)
      else:
          # child process
          fcntl.flock(f, fcntl.LOCK_EX)
          f.write("Written by child\n")
          f.flush()
          fcntl.flock(f, fcntl.LOCK_UN)
          os._exit(os.EX_OK)

  # Run this after the script to see the contents:
  print(open("test.txt", "r").read())
do_write()
Written by parent
do_write()
Written by child
Written by parent

I wanted something deterministic so I prompted Claude again. It responded with the following solution where “the child writes first and then signals the parent”. A couple of things to note:

  • The child sends a SIGUSR1 signal to the parent pid. (SIGUSR1 stands for “User-defined signal 1”)
  • Inside parent_process, the file is opened in “append mode”.
def child_process(parent_pid):
  time.sleep(0.1)  # Small delay to ensure parent is waiting
  with open("test.txt", "w") as f:
    f.write("Written by child\n")
    f.flush()
  os.kill(parent_pid, signal.SIGUSR1) # this is where the child sends a signal to the parent
  os._exit(os.EX_OK)

def parent_process(signum, frame):
  with open("test.txt", "a") as f: # notice the "a" for "append mode"
      f.write("Written by parent\n")
      f.flush()

def do_write2():
  signal.signal(signal.SIGUSR1, parent_process)

  parent_pid = os.getpid()

  if os.fork() == 0:
      child_process(parent_pid)
  else:
      signal.pause()  # Wait for signal from child

  # Read and print the file contents
  with open("test.txt", "r") as f:
    res = f.read()
  return res

This works as expected! At least for the 1000 times that I ran it:

for _ in range(1000):
  res = do_write2()
  assert res == 'Written by child\nWritten by parent\n'
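A simpler alternative I sketched out myself (not Claude’s suggestion): a pipe gives the same child-first ordering without signal handlers, because a blocking os.read naturally waits for the child:

```python
import os

r, w = os.pipe()

if os.fork() == 0:
    with open("test.txt", "w") as f:  # the with-block flushes and closes
        f.write("Written by child\n")
    os.write(w, b"go")  # tell the parent the file is ready
    os._exit(os.EX_OK)
else:
    os.read(r, 2)  # blocks until the child has written
    with open("test.txt", "a") as f:  # append mode, as in parent_process
        f.write("Written by parent\n")
    os.wait()  # reap the child
    contents = open("test.txt").read()
    print(contents)
```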

I noticed that parent_process is passed signum and frame. I asked Claude to define these:

signum: This parameter represents the signal number that was caught. In this case, it will be signal.SIGUSR1, which is the signal sent by the child process to the parent. The signum allows the signal handler to identify which signal triggered it, which can be useful if the same handler is used for multiple signals.

frame: This parameter is a frame object representing the stack frame of the interrupted code when the signal was received. It contains information about the program’s execution state at the time the signal was caught, such as the current line number and local variables.

I’ll print out signum and frame to see what they look like here:

def parent_process(signum, frame):
  print(signum, frame)
  with open("test.txt", "a") as f: # notice the "a" for "append mode"
      f.write("Written by parent\n")
      f.flush()

signum has a value of 10 and frame has the additional information as Claude described.

do_write2()
10 <frame at 0x56005af01c30, file '<ipython-input-56-0f16beee5172>', line 22, code do_write2>
'Written by child\nWritten by parent\n'

Exit Status

Claude describes the following code as a way to illustrate how “the parent can wait for the child to finish and retrieve its exit status.” I added a couple of print statements to see more clearly that the parent process waits for the child process to exit.

Claude describes the -1 in os.waitpid(-1, 0) as follows:

When -1 is used as the first argument to os.waitpid(), it tells the function to wait for any child process to terminate.

The 0 in os.waitpid(-1, 0) is explained in the docs:

The semantics of the call are affected by the value of the integer options, which should be 0 for normal operation.

def do_exit():
    if os.fork():
        # Parent process
        print("Parent waiting...")
        child_pid, status = os.waitpid(-1, 0)
        print("Parent done waiting!")
        print(f"In parent: {os.getpid()}")
        print(f"Child process (PID {child_pid}) exited with status {os.WEXITSTATUS(status)}")
    else:
        # Child process
        print(f"In child: {os.getpid()}, exiting with status 5")
        os._exit(5)  # Use os._exit to avoid affecting the notebook process

    print(f"This will be printed only by the parent process. PID: {os.getpid()}")

However, when I run do_exit, the child PIDs shown reveal that two different child processes are involved (4475 printed, but 4448 was reaped):

do_exit()
In child: 4475, exiting with status 5Parent waiting...
Parent done waiting!
In parent: 436
Child process (PID 4448) exited with status 5
This will be printed only by the parent process. PID: 436

Note also that the do_exit print statements don’t always run in that order. Since os.waitpid(-1, 0) reaps any child, the parent here likely collected a leftover child from an earlier cell (PID 902, which exited with status 0) instead of the new child (1249), even though we used waitpid:

do_exit()
Parent waiting...
Parent done waiting!
In parent: 436
Child process (PID 902) exited with status 0
This will be printed only by the parent process. PID: 436
In child: 1249, exiting with status 5

When I put that code into a .py file and run it from the shell, it behaves as expected (there is only one child process created, 5221, and it runs first while the parent process waits):

!python3 do_exit.py
Parent waiting...
In child: 5221, exiting with status 5
Parent done waiting!
In parent: 5216
Child process (PID 5221) exited with status 5
This will be printed only by the parent process. PID: 5216
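One more sketch of my own before wrapping up: the status returned by os.waitpid packs several fields, and the os.WIF*/os.W* helpers pull them apart. Waiting on the specific child PID (rather than -1) avoids accidentally reaping a different child, which seems to be what happened in the notebook runs above:

```python
import os

pid = os.fork()
if pid:
    _, status = os.waitpid(pid, 0)          # wait on this specific child, not -1
    exited_normally = os.WIFEXITED(status)  # True: child exited, wasn't killed by a signal
    code = os.WEXITSTATUS(status)           # the value the child passed to os._exit
    print(exited_normally, code)
else:
    os._exit(5)
```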

Final Thoughts

Working with os.fork was tougher than I expected. I assumed it would be plug-and-play, but I encountered non-deterministic behavior, which seems to be common when working with multiple processes.

I also learned that os.fork behaves (or misbehaves) differently when running inside a notebook cell compared to running in the shell. For instance, executing pid = os.fork() in a notebook cell causes the execution to hang when trying to return the child’s process ID, or involves multiple child processes when using the if os.fork():/else: pattern.

There are some ways to make os.fork behave in a notebook environment, as we saw when synchronizing work between the child and parent by having the child signal the parent before both wrote to the same file.

Another key concept I observed was memory independence: even in a notebook environment, the parent and child processes have their own private copies of global variables, allowing you to assign different values to the same variable in each process.

Future work: I want to run a similar set of experiments with the multiprocessing library, as I see it used more often (for example, in the fastai repo).

I hope you enjoyed this blog post. Follow me on Twitter @vishal_learner.