In the previous post, I covered how the OS talks to hardware — controllers, interrupts, syscalls, strace, POSIX. Now I want to get into the part that I found most mind-bending when I was learning it: Unix processes.
Specifically: fork(), exec(), wait(), exit(), zombies, orphans, and how all of this connects into the lifecycle of every process running on your machine. Let’s get into it.
Processes — The Core Abstraction
A process is a program that’s running. Your .c file sitting on disk is just a file. Once you compile and execute it, it becomes a process — with its own memory, its own PID, its own place in the process table.
The kernel tracks each process through a PCB (Process Control Block), called task_struct in Linux. It stores the PID, process state, time slice, parent pointer, list of children, open file descriptors, memory map — everything the OS needs to manage that process.
Important distinction: the task_struct is not the process itself. It’s an abstraction that represents the process inside the kernel. The process is the actual execution happening on the CPU — code running, memory being accessed, registers being used. The task_struct is the bookkeeping that lets the OS manage it.
The process hierarchy
Every process has exactly one parent. A parent can have many children. This forms a tree, and at the top sits init (PID=1): started by the kernel at boot, it never terminates and is the ancestor of every user-space process in the system.
init (PID=1) → systemd services → terminal → shell → your program
When you open a terminal and type ls, the shell (parent) creates a child process that runs ls. That’s the hierarchy in action.
Identifying processes
getpid() → returns YOUR process ID (always a positive number)
getppid() → returns your PARENT's process ID (also always positive)
These never return 0 or negative values. They’re just an ID card — they tell you who you are. Keep that in mind, it matters later.
fork() — Cloning a Process
fork() is the syscall that creates a new process. It makes a copy of the calling process. The most confusing (and important) thing about fork():
It’s called once but returns twice — once in the parent and once in the child. Both continue from the exact same point in the code, but with different return values.
| Return value | Meaning |
|---|---|
| 0 | You are the child process |
| > 0 (child’s PID) | You are the parent process |
| < 0 | fork() failed (too many processes, no swap space, etc.) |
Think of it like a cloning machine. You walk in and press the button. Two people walk out. The original gets a paper that says “your clone is #2022”. The clone gets a paper that says “0” — meaning “I’m the clone”.
The child gets 0 not because its memory is wiped or zeroed — it’s a deliberate choice by the kernel. During fork(), the kernel copies the parent’s registers into the child’s context, but sets the return value register (eax/rax) to 0 for the child and to the child’s PID for the parent. It’s a design convention: the child can always find its parent with getppid(), but the parent has no simple “get child PID” function (a parent can have many children), so it receives the child’s PID as the return value. PID 0 is reserved for the kernel’s idle/swapper process, so it works cleanly as an indicator.
Basic example
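#include <stdio.h>
#include <stdlib.h>     // exit()
#include <unistd.h>     // fork(), getpid(), getppid()
#include <sys/wait.h>   // wait()

int main(void) {
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");   // fork() failed, no child was created
        return 1;
    } else if (pid == 0) {
        printf("I'm the child! PID: %d, Parent: %d\n", getpid(), getppid());
        exit(0);
    } else {
        printf("I'm the parent! PID: %d, Child: %d\n", getpid(), pid);
        wait(NULL);
    }
    return 0;
}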
A mistake I made on my first try
When I first tried fork(), I wrote something like this:
int pid = getpid();
if (pid != 0) { // THIS IS ALWAYS TRUE
fork();
}
The problem: getpid() always returns a positive number — it’s your real PID. So pid != 0 is true for everyone, including children. The children keep forking, their children keep forking, and you get an explosion of processes.
The fix: use the return value of fork() to distinguish parent from child, not getpid().
Copy-on-Write (CoW)
Copying the entire memory of a process on every fork() would be insanely expensive. Linux is smarter: parent and child share the same memory pages initially, marked as read-only. Only when one of them tries to write to a page does the OS create a separate copy of that specific page.
It’s like two roommates sharing a notebook. As long as nobody scribbles in it, they both read from the same one. The moment someone wants to write, they photocopy that page and each keeps their own version.
After fork(), parent and child are logically isolated — one can never see the other’s changes. But physically, they share the same pages until a write happens. The isolation is guaranteed by the CoW mechanism: when either process writes, the kernel intercepts the page fault, copies the page, and gives each process its own version.
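Here is a minimal sketch of that isolation, assuming a POSIX system. The helper name fork_isolation_demo is mine, purely for illustration: the child overwrites a variable it inherited, and the parent checks that its own copy never changed.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks, lets the child modify its copy of `value`,
 * and reports what each side ends up seeing. Returns 0 on success. */
int fork_isolation_demo(int *parent_sees, int *child_sees) {
    int value = 1;

    pid_t pid = fork();
    if (pid < 0) {
        return -1;                      /* fork failed */
    }
    if (pid == 0) {
        value = 42;                     /* the write lands in the child's private copy */
        _exit(value);                   /* report the child's view through the exit code */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status)) {
        return -1;
    }
    *child_sees = WEXITSTATUS(status);  /* 42 */
    *parent_sees = value;               /* still 1: the parent's memory was untouched */
    return 0;
}
```

The child uses _exit() rather than exit() so it doesn't flush stdio buffers it inherited from the parent.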
File Descriptors — Everything Is a File
Before we get to exec(), we need to understand file descriptors, because they’re the key to understanding why Unix separates fork() and exec() into two operations.
What is a file descriptor?
A file descriptor is just a number. When your process opens something — a file, a terminal, a network socket — the kernel assigns it the smallest available integer. That number is how your process refers to that resource from then on.
Every process starts with three file descriptors already open:
- 0 → stdin (your keyboard)
- 1 → stdout (your screen)
- 2 → stderr (also your screen, for errors)
The kernel maintains a file descriptor table for each process (stored in its task_struct). It’s essentially an array mapping numbers to resources:
Process's fd table:
fd 0 → /dev/pts/0 (terminal - keyboard input)
fd 1 → /dev/pts/0 (terminal - screen output)
fd 2 → /dev/pts/0 (terminal - error output)
The syscalls
open() asks the kernel to add an entry. The kernel finds the smallest free number and returns it:
int fd = open("data.txt", O_RDONLY); // O_RDONLY = open for reading only
// kernel adds: fd 3 → data.txt
// returns 3
close() removes an entry, freeing that number:
close(3);
// fd 3 → (free)
write(fd, data, count) sends bytes somewhere. Three arguments: where (fd number), what (the bytes), how many:
write(1, "hello", 5);
// kernel looks up: fd 1 → /dev/pts/0 (terminal)
// sends "hello" to the terminal
read(fd, buffer, count) receives bytes. Same structure: from where, put them where, how many at most:
char buffer[100];
read(3, buffer, 100);
// kernel looks up: fd 3 → data.txt
// reads up to 100 bytes from data.txt into buffer
The process doesn’t know or care what’s on the other end. It just says “write these 5 bytes to fd 1” and the kernel resolves the rest.
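The “smallest free number” rule is easy to check for yourself. A small sketch, assuming a single-threaded program (another thread opening files at the same moment could grab the number first); fd_number_is_reused is a made-up helper name:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: opens `path`, closes it, opens it again, and
 * returns 1 if the kernel handed back the same descriptor number,
 * 0 if not, and -1 on error. */
int fd_number_is_reused(const char *path) {
    int first = open(path, O_RDONLY);
    if (first < 0) return -1;
    close(first);               /* the number `first` is free again */

    int second = open(path, O_RDONLY);
    if (second < 0) return -1;
    close(second);

    return second == first;     /* the lowest free number gets picked again */
}
```

Calling it with any readable path, for example "/dev/null", should return 1. This same rule is what makes the close-then-open redirection trick later in this post work.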
“Everything is a file”
This is the Unix philosophy. Your terminal? It’s /dev/pts/0. Your hard disk? /dev/sda. Your webcam? /dev/video0. System info? /proc/cpuinfo. They’re all accessed with the same interface: open, read, write, close.
When ls wants to print output, it calls write(1, "file.txt\n", 9). The kernel looks up fd 1, sees it points to /dev/pts/0, and sends the bytes through the pseudo-terminal driver, which delivers them to your terminal emulator, which renders them on screen.
Why this matters for fork + exec
This is why Unix splits process creation into fork() and exec() instead of having a single “create process and run program” call. The gap between fork() and exec() is where the shell configures the child’s environment — especially file descriptors.
When you type ls > output.txt, here’s what actually happens:
- Shell calls fork() → child is a copy of the shell
- In the child, before exec(): close fd 1, open output.txt → it gets fd 1 (smallest free number)
- Child calls exec("ls") → ls loads, inherits the fd table as-is
- ls calls write(1, ...) thinking it’s writing to the terminal, but fd 1 now points to the file
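if (pid == 0) {
    // child: redirect stdout to file
    close(1);                                // close terminal
    open("output.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);  // gets fd 1; O_CREAT needs a mode
    execlp("ls", "ls", (char *)NULL);        // ls inherits this fd table
    _exit(127);                              // only reached if exec failed
}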
The ls program doesn’t know and doesn’t care that fd 1 changed. It always does write(1, ...). That’s the power of this design — any program works with redirection without being modified.
The critical detail: exec() replaces the program (code, memory, stack), but preserves the file descriptor table. If it didn’t, the entire redirection mechanism would break.
Pipes work the same way. When you type ls | grep txt, the shell creates a pipe (two connected fds), forks twice, connects fd 1 of ls to fd 0 of grep, and neither program knows they’re talking to each other. ls writes to fd 1, grep reads from fd 0, both think they’re using the terminal.
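To make the pipe mechanics concrete, here is a sketch of half of that setup: one child whose fd 1 is rewired to a pipe, with the parent reading the other end. It assumes a POSIX system. capture_child_stdout is a name I made up, and the child writes directly instead of calling exec() to keep the example short.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs a child whose fd 1 is the write end of a pipe
 * and copies whatever the child "prints" into buf (NUL-terminated).
 * Returns the number of bytes captured, or -1 on error. */
ssize_t capture_child_stdout(char *buf, size_t size) {
    int fds[2];                       /* fds[0] = read end, fds[1] = write end */
    if (size == 0 || pipe(fds) < 0) return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);                /* child only writes */
        dup2(fds[1], STDOUT_FILENO);  /* fd 1 now refers to the pipe */
        close(fds[1]);
        const char *msg = "hello from fd 1\n";
        /* same call a program would make if fd 1 were a terminal */
        if (write(STDOUT_FILENO, msg, strlen(msg)) < 0) _exit(1);
        _exit(0);
    }

    close(fds[1]);                    /* parent only reads; required to ever see EOF */
    size_t total = 0;
    ssize_t n;
    while (total < size - 1 &&
           (n = read(fds[0], buf + total, size - 1 - total)) > 0) {
        total += (size_t)n;
    }
    buf[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);            /* reap the child */
    return (ssize_t)total;
}
```

A shell doing ls | grep txt does the same thing twice: dup2() onto fd 1 in one child, dup2() onto fd 0 in the other, then exec() in both.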
exit() and wait() — Death and Cleanup
exit() — process termination
exit(n) terminates the calling process. It runs any atexit() handlers and flushes stdio buffers, then the kernel frees the process’s memory, closes its open files, and keeps the exit code n for the parent to collect. By convention, 0 means success and anything else means error.
Why it’s critical: without exit(0) after the child’s code, the child doesn’t die. It keeps executing whatever comes next — including the parent’s loop. That’s how you get an avalanche of unintended processes.
wait(NULL) — waiting for a child to die
wait(NULL) blocks the calling process until exactly one direct child terminates. Not grandchildren, not all children — one child, per call.
It’s like a traffic light: the process stops, waits for a child to die, then continues. It doesn’t kill the loop or end the program — it just pauses.
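wait(NULL) throws the exit code away. If you want to read it, pass a status pointer and decode it with the WIFEXITED / WEXITSTATUS macros. A minimal sketch, assuming a POSIX system, with a helper name (exit_code_seen_by_parent) that is mine:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that exits with `code` and returns
 * the code the parent observes, or -1 if something went wrong. */
int exit_code_seen_by_parent(int code) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        _exit(code);                    /* child: terminate right away */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    if (!WIFEXITED(status)) return -1;  /* e.g. the child was killed by a signal */
    return WEXITSTATUS(status);         /* only the low 8 bits of the code survive */
}
```

Because only the low 8 bits are delivered, exit codes above 255 wrap around: a child that exits with 256 looks like a success (0) to its parent.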
Practical implications
// Option 1: sequential
for (int i = 0; i < 2; i++) {
    pid_t pid = fork();
    if (pid == 0) {
        // child does work
        exit(0);    // EXIT here, otherwise it continues the loop
    }
    wait(NULL);     // parent waits before creating the next one
}

// Option 2: parallel
for (int i = 0; i < 2; i++) {
    pid_t pid = fork();
    if (pid == 0) {
        // child does work
        exit(0);
    }
}
wait(NULL);         // reaps whichever child finishes first
wait(NULL);         // reaps the other one
Both work. The difference: option 1 runs children sequentially (create, wait, create, wait). Option 2 creates all children first, then waits for all of them — they run in parallel.
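When the number of children isn't fixed, counting wait() calls by hand gets fragile. A common alternative is to loop until wait() reports that there are no children left (errno set to ECHILD). A sketch of the parallel option written that way, where spawn_and_reap is a hypothetical name:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: starts n children in parallel, then reaps all of
 * them. Returns how many were reaped, or -1 if a fork failed. */
int spawn_and_reap(int n) {
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) return -1;
        if (pid == 0) {
            /* child does work */
            _exit(0);
        }
    }

    int reaped = 0;
    for (;;) {
        pid_t done = wait(NULL);
        if (done > 0) { reaped++; continue; }
        if (errno == EINTR) continue;   /* interrupted by a signal: retry */
        break;                          /* ECHILD: no children left */
    }
    return reaped;
}
```

Note that this reaps every child of the calling process, not only the ones created in this function, so it fits best in programs where one place owns all the forking.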
The async/await analogy
If you’ve worked with C# and ASP.NET, wait(NULL) is conceptually similar to await. Both mean “stop here until something finishes”. The difference: await in C# is async — it frees the thread to handle other requests while the task completes. wait() in C is synchronous — it blocks the entire process, period. No thread pool, no reuse.
But from the perspective of that specific execution flow, they’re both sequential — the code below only runs after the awaited thing finishes.
Orphans and Zombies
Two edge cases that show up constantly in OS discussions:
Orphan process
The parent dies before the child. The child is still running but has no parent. In Unix, init (PID=1) adopts it right away (on Linux, a designated “subreaper” process can take that role instead). The child keeps running normally; it just has a new parent.
Zombie process
The child dies before the parent calls wait(). The child has finished executing, but its entry in the process table is still there — because the OS needs to keep the exit code until the parent asks for it. The child is dead but not cleaned up. That’s a zombie.
Calling wait() removes the zombie from the table. If the parent dies without ever calling wait(), init adopts the zombie and calls wait() for it.
| Situation | What happens | Name |
|---|---|---|
| Parent dies first | init (PID=1) adopts the child | Orphan |
| Child dies first | Entry stays in table until parent calls wait() | Zombie |
| Parent calls wait() | Child’s table entry is removed | Normal cleanup |
Zombies in production — real cases
This isn’t theoretical. Zombie accumulation has caused real outages in real projects:
- ClickHouse (PR #71301): zombie processes accumulated after library bridge crashes because waitpid wasn’t called correctly after child termination.
- RediSearch (Issue #8009): fork-GC child processes became zombies and weren’t reaped by redis-server, accumulating over time until hitting the PID limit.
- OpenAI Codex (Issue #4726): when Codex terminated a child process via start_kill(), it never called wait(), creating zombies that held file descriptors and prevented directory cleanup.
- incron (Issue #22): every program execution triggered by a filesystem event left a zombie. The fix was a single line: waitpid(pid, &status, 0).
The pattern is always the same: someone did fork() and forgot wait(). It works fine for hours or days, then the PID table fills up and the system stops creating new processes. The most common fixes are either calling wait()/waitpid() explicitly, or telling the kernel to auto-reap children with signal(SIGCHLD, SIG_IGN).
exec() — Becoming a Different Program
fork() creates a clone. But what if the child needs to run a completely different program? That’s what exec() does.
exec() replaces the current process’s code, data, heap, and stack with a new program. The PID stays the same — it’s not a new process, it’s a transformation. The old program is gone forever.
Critically: exec() preserves the file descriptor table. This is not an accident — it’s the entire reason fork + exec works as a design. The shell can configure redirections and pipes between fork() and exec(), and the new program inherits them without knowing.
The classic Unix pattern
This is how your shell runs every command you type:
- Shell calls fork() → creates a child (clone of the shell)
- Child calls exec("ls") → child becomes the ls program
- Shell calls wait() → waits for ls to finish
- ls calls exit() → ls terminates, shell continues
This fork + exec + wait + exit cycle is the foundation of everything in Unix.
pid_t pid = fork();
if (pid == 0) {
    // I'm the child: become 'ls' (execlp searches PATH for it)
    execlp("ls", "ls", "-la", (char *)NULL);
    // If exec succeeds, this line NEVER runs
    // because the child is now a different program
    perror("execlp");   // only reached if exec failed
    _exit(127);
} else {
    wait(NULL);
    printf("Child finished\n");
}
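Wrapped into a reusable function, the whole cycle looks like a tiny version of what system() or a shell does. This is a sketch under my own naming (run_command). Returning 127 when the program can't be executed mirrors the convention shells use.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] with the given arguments and returns
 * its exit code, 127 if the program could not be executed, or -1 on
 * fork/wait failure or if the child was killed by a signal. */
int run_command(char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        execvp(argv[0], argv);   /* only returns if it failed */
        _exit(127);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing the argument vector {"sh", "-c", "exit 3", NULL} should return 3, assuming sh is on your PATH.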
Putting It All Together
Here’s a practical example: create a process that spawns 2 children, and each child spawns 2 grandchildren. Every process prints its PID and its parent’s PID.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    for (int i = 0; i < 2; i++) {
        pid_t child_pid = fork();
        if (child_pid == 0) {
            // I'm a child: create 2 grandchildren
            for (int j = 0; j < 2; j++) {
                pid_t grandson_pid = fork();
                if (grandson_pid == 0) {
                    printf("Grandchild PID: %d, Parent: %d\n",
                           getpid(), getppid());
                    exit(0);    // grandchild dies here
                }
                wait(NULL);     // wait for THIS grandchild
            }
            printf("Child PID: %d, Parent: %d\n", getpid(), getppid());
            exit(0);            // child dies here
        }
        wait(NULL);             // parent waits for THIS child
    }
    return 0;
}
The process tree:
Parent
├── Child 1
│ ├── Grandchild 1
│ └── Grandchild 2
└── Child 2
├── Grandchild 3
└── Grandchild 4
Key things to notice:
- exit(0) is essential: without it, children don’t die and keep looping, creating unintended processes.
- wait(NULL) inside the loop: creates one child per iteration and waits for it before creating the next (sequential).
- Each wait() handles one child: if you create 2 children outside a wait-loop, you need 2 separate wait() calls.
Why This Actually Matters — Writing Better Software
This isn’t just OS theory. These concepts show up directly in real system design decisions. Here are the questions you’ll be able to answer once you actually understand this.
Why you must kill the child with exit()
Because without exit(), the child keeps executing the parent’s code after fork(). Imagine a web server that forks for each request. Without exit() in the child, it processes the request, finishes the handler, and then continues in the main loop — accepting new connections, forking again, over and over. You end up with processes that shouldn’t exist competing for resources with the original process.
// BUGGY web server - missing exit()
int server_fd = setup_server(8080);
while (1) {
    int client_fd = accept(server_fd, NULL, NULL);
    pid_t pid = fork();
    if (pid == 0) {
        handle_request(client_fd);
        close(client_fd);
        // BUG: no exit() here!
        // child goes back to the top of the while loop
        // now it's ALSO accepting connections and forking
    }
    close(client_fd);
}
With just 3 requests, you’d have the parent, plus children, plus grandchildren all competing for the same socket. It’s an accidental fork bomb.
In practice: a child branch after fork() should always end in exit() or a successful exec(). Anything else falls through into the parent’s code, and that is almost always a bug.
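You can measure the difference without a real server. The sketch below replaces accept() with a plain loop and counts how many descendant processes come into existence, with and without the exit. Every descendant writes one byte to a shared pipe before it terminates. The function name and the pipe-counting trick are mine, not part of the server above.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical experiment: run a fork loop `iterations` times and count
 * the descendants it creates. With child_exits = 1 each child leaves right
 * after its work; with 0 it falls through and keeps looping, like the
 * buggy server. Returns the count, or -1 on error. */
int count_descendants(int iterations, int child_exits) {
    int fds[2];
    if (pipe(fds) < 0) return -1;
    pid_t original = getpid();

    for (int i = 0; i < iterations; i++) {
        pid_t pid = fork();
        if (pid < 0) break;
        if (pid == 0 && child_exits) {
            if (write(fds[1], "x", 1) != 1) _exit(1);   /* "I existed" */
            _exit(0);
        }
        /* without the exit, the child continues the loop and forks too */
    }

    if (getpid() != original) {         /* a descendant that fell through */
        if (write(fds[1], "x", 1) != 1) _exit(1);
        _exit(0);
    }

    close(fds[1]);                      /* so read() sees EOF once everyone is done */
    int count = 0;
    char byte;
    while (read(fds[0], &byte, 1) == 1) count++;
    close(fds[0]);
    while (wait(NULL) > 0) { }          /* reap the direct children */
    return count;
}
```

With 3 iterations, the version with exit creates 3 descendants. The version without it creates 7, because the process count doubles on every pass through the loop (2^n - 1 descendants after n iterations).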
Why you must wait for the child with wait()
Because without wait(), the child becomes a zombie. And zombies accumulate.
Each zombie entry in the process table holds a PID, and PIDs are finite. On Linux the ceiling is kernel.pid_max, which defaults to 32768 in the kernel, although many modern distributions raise it to 4194304 on 64-bit systems. If you have a server that forks for each request and never calls wait(), eventually you hit that ceiling (or a per-user process limit before it) and the system stops creating new processes. fork() starts returning errors. Your server stops working.
This happens in production. It’s one of the most classic bugs in Unix servers written in C.
Why you can’t have zombies
The zombie itself doesn’t consume real CPU or memory — it’s already dead. The problem is the PID. Each zombie holds a PID slot, and you have a finite number of them.
But there’s a second problem: a zombie means the parent isn’t calling wait(). That means either the parent died and init hasn’t cleaned up yet, or — more concerning — the parent is alive but ignoring its children. In high-load servers, that’s a resource leak — the same way a malloc() without free() is a memory leak. Each fork() allocates a resource (a process table entry and a PID), and wait() is what releases it. Without wait(), each fork leaks a little, and under load, those leaks accumulate into an outage.
You can check this with:
ps -eo pid,ppid,stat,comm | awk '$3 ~ /^Z/'   # list zombies together with their parent's PID
If you see zombies accumulating, the problem isn’t the zombie — it’s the parent that’s not calling wait().
fork() + CoW and the impact on servers
Copy-on-Write has a concrete practical implication: fork() is cheap as long as the child doesn’t write. This is exactly why the fork + exec pattern is efficient — the child calls exec() almost immediately, and the kernel discards the shared pages without ever copying anything.
fork()
│ CoW: child points to parent's pages (no copy)
│
close(1), open("output.txt")
│ (only touches kernel's fd table, not memory pages)
│ (CoW still not triggered, zero copies)
│
exec("ls")
│ discards all shared pages, loads ls binary
│ (parent's pages were never copied)
Without CoW, fork() would copy the parent’s entire memory, and exec() would throw it all away immediately. With CoW, almost all of that cost disappears: the kernel copies page tables, not the pages themselves.
But if the child starts writing to many pages before calling exec() (or if it’s not going to call exec() at all), the copy cost shows up. This is where things get interesting in interpreted languages.
CoW and garbage collectors — the Instagram problem
Python/Ruby workers that fork() without exec() can have severe memory issues. Instagram ran into this at scale: their Django workers were forked from a master process, and memory usage grew linearly — about 600MB extra per worker after 3,000 requests.
The root cause was subtle: in CPython, merely using an object writes to it. Every new reference changes the object’s reference count (ob_refcnt), and the cyclic garbage collector writes to per-object bookkeeping fields (such as gc_refs) when it runs. Those updates count as writes, which trigger CoW, even though the worker’s own code was only reading data. Instagram called this “copy-on-read”.
After fork (CoW, memory shared):
Parent → heap pages (real, 500MB)
Worker → points to parent's pages (0MB extra)
GC runs in worker, touches every object:
Parent → heap pages (original, 500MB)
Worker → heap pages (copied, 500MB)
Total: 1GB instead of 500MB
Their solution was gc.freeze(), added to Python 3.7. It tells the garbage collector to ignore all existing objects — effectively hiding them from GC scans. The GC in child processes only tracks new objects, leaving inherited pages untouched and shared.
import gc
import os

# In parent, before fork:
gc.disable()      # stop GC
gc.freeze()       # hide all existing objects from GC
pid = os.fork()   # child inherits pages via CoW

# In child:
if pid == 0:
    gc.enable()   # GC only tracks NEW objects created by this child
Ruby had the same problem: in versions before 2.0, the GC mark phase wrote a flag directly into each object’s header, triggering CoW on every page. Ruby 2.0 fixed it by moving the mark bits to a separate bitmap structure, so GC passes no longer touched the objects themselves.
This problem is specific to the prefork model — servers like Gunicorn (Python), Unicorn (Ruby), and PHP-FPM that use fork() to create workers. Servers that use threads or async I/O (Kestrel in .NET, Node.js, Go’s goroutines, Nginx) don’t fork for concurrency, so CoW and GC interactions don’t apply.
Why shells are fast — the fork+exec pattern in practice
When you type ls in the terminal, the shell doesn’t load the ls binary into itself. It calls fork() — creating a copy of itself in microseconds (thanks to CoW) — and then the child calls exec("/bin/ls"). The kernel discards everything from the child and loads ls in its place.
The parent (shell) calls wait() and blocks. When ls finishes with exit(), the shell unblocks, prints the prompt, and is ready for the next input. The entire cycle takes milliseconds.
This model is what makes terminals so responsive. There’s no overhead of loading a new runtime or initializing a process from scratch — you clone a process that’s already warm and swap its program.
What this tells you about Python’s multiprocessing
Python generates 16.5x more syscalls than C for the same “Hello World”. That matters when you’re running serverless functions where each invocation pays for the cold start.
But there’s a more specific point: Python’s fork() is problematic in multiprocessing contexts because the interpreter has global state (GIL, memory allocator, internal threads). If you fork() after the interpreter has already initialized those things, the child inherits an inconsistent state.
The problem is that fork() in POSIX only replicates the thread that called it. All other threads in the parent simply don’t exist in the child. But the mutexes those threads were holding are copied as-is — locked. In the child, no thread will ever unlock them. It’s a permanent deadlock.
Parent (before fork):
Thread 1 (main) → calls fork()
Thread 2 (GC) → holding mutex_gc (locked)
Thread 3 (allocator) → holding mutex_alloc (locked)
Child (after fork):
Thread 1 (main) → only surviving thread
mutex_gc → locked (Thread 2 doesn't exist here)
mutex_alloc → locked (Thread 3 doesn't exist here)
→ deadlock when child tries to allocate memory or trigger GC
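The same situation can be reproduced in plain C with pthreads. In this sketch a helper thread plays the role of the runtime’s GC or allocator thread and holds a mutex while the main thread forks. All names are hypothetical, two semaphores keep the timing deterministic, and the child uses trylock so the demo reports the problem instead of hanging. On older toolchains you may need to compile with -pthread.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static sem_t lock_taken, may_release;

/* Stand-in for a runtime thread (GC, allocator) that holds a lock. */
static void *lock_holder(void *arg) {
    (void)arg;
    pthread_mutex_lock(&demo_mutex);
    sem_post(&lock_taken);              /* tell main the lock is held */
    sem_wait(&may_release);             /* keep holding it across the fork */
    pthread_mutex_unlock(&demo_mutex);
    return NULL;
}

/* Hypothetical demo: returns 1 if the child found the mutex locked even
 * though the owning thread does not exist there, 0 if not, -1 on error. */
int child_inherits_locked_mutex(void) {
    pthread_t tid;
    sem_init(&lock_taken, 0, 0);
    sem_init(&may_release, 0, 0);
    if (pthread_create(&tid, NULL, lock_holder, NULL) != 0) return -1;
    sem_wait(&lock_taken);

    pid_t pid = fork();                 /* only the calling thread is copied */
    if (pid == 0) {
        /* trylock instead of lock: a plain lock() would block forever */
        _exit(pthread_mutex_trylock(&demo_mutex) == EBUSY ? 1 : 0);
    }

    int status = 0;
    int result = -1;
    if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status)) {
        result = WEXITSTATUS(status);
    }
    sem_post(&may_release);             /* let the holder thread finish */
    pthread_join(tid, NULL);
    return result;
}
```

In the parent the lock is released a moment later. In the child nobody is left to release it, so any code path that needs it would wait forever.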
This deadlock risk is why Python 3.14 changed the default start method in multiprocessing from fork to forkserver on Linux and other POSIX systems (macOS had already moved to spawn in 3.8, and Python 3.12 started warning when fork is used in a multi-threaded process). The forkserver is a separate process started via fork() + exec(), so it begins life as a fresh, single-threaded interpreter. When you need a worker, the main process asks the forkserver to fork() one. Since the forkserver was created clean (the exec() discarded the parent’s state), its children are born in a consistent state: no orphaned mutexes, no deadlocks.
The tradeoff: forkserver children don’t inherit the parent’s memory. If the parent loaded a 500MB dataset, workers need to receive that data explicitly (via shared memory, pipes, etc.). It’s the classic safety vs. performance tradeoff. Python offers three modes:
- fork: fast, shares everything via CoW, but can deadlock
- forkserver: one exec() upfront, then cheap forks, but loses parent state
- spawn: fork() + exec() for every worker; safest but slowest
This problem is inherent to languages with heavy runtimes. In C, there’s no interpreter, no GIL, no GC threads — fork() just works. In Python, Ruby, or Java, you’re not just forking your code — you’re forking the entire runtime with all its internal machinery. That’s the cost of abstraction.
Understanding fork + exec + wait + exit gives you the vocabulary to understand why that decision was made.
Key Takeaways
- Layered abstraction is the organizing principle. Controller → Driver → OS → Application. Each layer only knows the interfaces of its neighbors.
- Interrupts, not polling, enabled multiprogramming. With interrupts, the CPU and I/O devices work in parallel. The CPU sleeps while the device works and gets notified when it’s done.
- Syscalls and interrupts are complementary. The syscall is the request (synchronous, inside-out). The interrupt is the completion notification (asynchronous, outside-in). A single read() involves both.
- fork + exec + wait + exit is the Unix process lifecycle. Every command you run in the terminal goes through this cycle. The shell forks, the child execs, the shell waits, the child exits.
- File descriptors and “everything is a file” are what make fork+exec powerful. The gap between fork() and exec() is where the shell configures redirections and pipes. exec() preserves the fd table, so any program works with redirection without modification.
- Copy-on-Write makes fork() efficient. Memory is shared until someone writes. Especially important when the child is about to exec() anyway. But garbage collectors in interpreted languages can accidentally trigger CoW by writing to objects during scans.
- The cost of abstraction is measurable. C: 34 syscalls. Python: 562 syscalls. Same output. strace makes this visible. And fork() in interpreted languages carries the weight of the entire runtime (GIL, GC threads, memory allocator), which can lead to deadlocks, memory bloat, or both.