How Databases and Servers Shut Down Without Losing Your Data — Signal Handling, Drain, and the 30-Second Clock

Every program you have ever run will be told to stop. The only question is whether it gets a polite request it can act on — or a bullet it never sees coming.

"Graceful shutdown" sounds like a nicety, a bit of operational polish. It is not. It is the difference between a deploy that rolls cleanly and one that drops half-finished payments, corrupts a write-ahead log, or leaves a thousand users staring at a spinner. And almost all of it — in your database, your web server, your container orchestrator — comes down to one small, old, beautifully simple Unix mechanism: signals.

This post builds the whole picture from the bottom up: what a signal actually is, the two fundamentally different ways a process can die, the five-stage drain that every well-behaved server runs, how PostgreSQL and Redis and a plain HTTP server each do it, how Kubernetes wires it all together with a countdown that ends in a SIGKILL, and the handful of mistakes that turn "graceful" back into "power cut."

The two ways a process dies

Start here, because everything else hangs off it. A Unix process can be terminated in two categorically different ways, and the difference is the entire subject of this post.

The first is a request. The kernel delivers a signal — almost always SIGTERM — that says "please terminate." The process has installed a signal handler: a function the kernel runs when that signal arrives. The handler gets to do work first. It can close listeners, finish in-flight requests, flush buffers to disk, and then exit on its own terms. This is the graceful path.

The second is an execution. The kernel delivers SIGKILL (signal 9). There is no handler, because SIGKILL cannot be caught, blocked, or ignored — the kernel does not even give the process the courtesy of running code. It simply removes the process from the run queue, reclaims its memory, closes its file descriptors, and it is gone. Anything that was only in RAM — an un-flushed write buffer, a half-built response, an un-acked message — vanishes.

This is why kill -9 is the move of last resort. kill -9 <pid> sends SIGKILL; plain kill <pid> sends SIGTERM. When someone tells you "just kill -9 it," they are telling you to skip every safety mechanism the program's authors built. Sometimes that is correct — the process is wedged and ignoring SIGTERM. But it is never free.

Graceful shutdown is the art of doing everything that matters in the window between "please stop" and "stop now."

A quick signals primer

A signal is the oldest IPC mechanism in Unix: a small integer the kernel delivers asynchronously to a process to tell it something happened. There are a few dozen of them. For shutdown, a handful matter:

Signal	Num	Default action	Catchable?	Typical meaning
`SIGTERM`	15	Terminate	Yes	"Please shut down." The polite default of `kill`, Docker, and Kubernetes.
`SIGINT`	2	Terminate	Yes	Ctrl-C in a terminal. Interactive "stop."
`SIGHUP`	1	Terminate	Yes	Terminal hang-up. Re-purposed by daemons to mean "reload config."
`SIGQUIT`	3	Terminate + core dump	Yes	Ctrl-\. Often wired to a harder/faster shutdown.
`SIGKILL`	9	Terminate	No	"Die now." Cannot be caught, blocked, or ignored.
`SIGSTOP`	19 (Linux x86/ARM; varies by platform)	Stop (pause)	No	Freeze the process. Also uncatchable.

When a signal arrives, the kernel interrupts whatever the process was doing and runs the handler the process registered for that signal — or, if none was registered, performs the default action from the table above. The default action for SIGTERM is "terminate," which is why a program that installs no handler still dies on SIGTERM — it just dies abruptly instead of gracefully.

Async-signal-safety. A signal handler interrupts the program at an arbitrary instruction — possibly in the middle of malloc, or while holding a lock. So a handler may only call async-signal-safe functions; calling printf, allocating memory, or taking a mutex from inside a handler can deadlock or corrupt state. This is the reason real systems keep handlers tiny: the handler does almost nothing except record "a shutdown was requested," and the actual draining happens in normal program flow. We'll see that pattern in the code section.

The graceful-shutdown lifecycle

Strip away the language and the product, and every well-behaved server runs the same five stages, in this order. The order is not arbitrary — each stage exists to make the next one safe.

1. Catch the signal. The handler does the bare minimum (flip an atomic "shutting down" flag, write to a self-pipe, or cancel a context) and returns. Real work happens in the main flow.
2. Stop accepting new work. Close the listening socket (or stop pulling from the queue). New connections get refused fast so a load balancer routes them elsewhere. Crucially, this bounds the set of in-flight work you then have to drain; if you skip it, stage 3 chases a moving target.
3. Drain in-flight work. Let the requests/queries/jobs that already started run to completion, up to a deadline. This is where "graceful" earns its name.
4. Flush durable state. Force buffered writes to disk: a database checkpoint, an fsync, a final snapshot. Until this returns, your data is only a power-cut away from gone.
5. Close and exit 0. Release connections, remove the pidfile/socket, return exit code 0 so the supervisor knows it was a clean stop.

Hold this picture in your head. Everything below is a real system filling in these five boxes.

How databases do it

Databases are where graceful shutdown matters most, because the cost of getting it wrong is durability — the one promise a database exists to keep. Each of the big ones maps signals onto the lifecycle a little differently.

PostgreSQL — three shutdown modes, three signals

Postgres is the textbook example because it makes the trade-off explicit: it offers three shutdown modes, and the way you pick one is literally which signal you send the postmaster (the supervisor process).

Mode	Signal	`pg_ctl` flag	What it does
Smart	`SIGTERM`	`-m smart`	Stop accepting new connections; wait for every existing client to disconnect on its own; then checkpoint and exit. The gentlest — and potentially the slowest.
Fast (default)	`SIGINT`	`-m fast`	Refuse new connections, actively terminate existing sessions, roll back open transactions, run a checkpoint, exit cleanly. No data loss; doesn't wait for idle clients.
Immediate	`SIGQUIT`	`-m immediate`	Tell every process to quit now without a checkpoint. Next startup must run crash recovery from the WAL. No committed data lost, but recovery takes time.

Notice the elegance: the same five-stage lifecycle, with a dial for "how much do you wait for clients vs. how fast do you go." Smart and Fast both end in a checkpoint — Postgres flushes dirty pages from shared buffers to the data files and writes a checkpoint record to the write-ahead log, so the data on disk is fully consistent. Immediate skips the checkpoint entirely; it trusts the WAL to make the database whole again on restart.

Why no committed data is ever lost, even on Immediate. Every commit is durable in the WAL before it's acknowledged (that's the "write-ahead" in write-ahead log). Immediate shutdown skips the orderly checkpoint, but the WAL still holds every committed transaction. On the next boot, crash recovery replays the WAL forward from the last checkpoint and the database is consistent again. You pay in recovery time, not in lost commits.

Kubernetes sends Postgres a SIGTERM, which is Smart mode — and that is a classic trap. If even one client holds an idle connection open, Smart mode waits for it, the pod blows past its grace period, and Kubernetes SIGKILLs the postmaster mid-flight — the worst outcome. Production Postgres images therefore override the stop signal to SIGINT (Fast mode), so the database actively closes sessions and checkpoints within the grace window.

Redis — save the snapshot, then go

Redis lives in memory, so its shutdown question is sharp: do I persist before I die? The SIGTERM handler itself only sets a flag; the main event loop notices the flag and calls prepareForShutdown(), which:

1. Stops any background save already in progress by killing the child process, so the final save starts from a clean slate.
2. If save points are configured (RDB persistence), performs a final synchronous save, writing the entire dataset to the .rdb file.
3. Removes the pidfile and unlinks the Unix socket.
4. Reports success, and the server exits 0.

// redis src/server.c: the heart of the SIGTERM path (heavily simplified;
// helper names are paraphrased)
int prepareForShutdown(int flags) {
    // ... cancel any in-progress background save ...
    if (server.saveparamslen > 0 && !(flags & SHUTDOWN_NOSAVE)) {
        // a save point is configured: flush the dataset to disk
        if (rdbSave(server.rdb_filename, NULL) != C_OK) {
            // can't save: refuse to exit and lose data, unless forced
            return C_ERR;
        }
    }
    unlinkUnixSocket();
    removePidFile();
    return C_OK; // the caller then calls exit(0)
}

The catch. That final save only happens on a graceful stop such as SIGTERM. If Redis is killed by SIGKILL, or by the Linux OOM killer when the box runs out of memory, prepareForShutdown() never runs, and every write since the last background save is gone. This is why a Redis used as a source of truth must lean on AOF (append-only file) persistence with fsync (appendfsync always, or everysec if losing about a second of writes is acceptable), not just periodic RDB snapshots: AOF is written continuously, not only at a graceful exit.

MySQL / InnoDB — the flush dial

MySQL catches SIGTERM and begins an orderly shutdown of the storage engines. For InnoDB, how thorough that shutdown is depends on one variable, innodb_fast_shutdown:

`innodb_fast_shutdown`	On shutdown	Trade-off
`0` (slow)	Full purge, change-buffer merge, and flush of every dirty page before exit.	Slowest stop; fastest, cleanest restart. Used before major upgrades.
`1` (default)	Skip the slow purge/merge; flush the redo log and dirty pages.	Balanced. No data loss; the skipped purge and merge work resumes in the background after restart.
`2` (cold)	Flush logs and exit almost immediately, as if it crashed.	Fastest stop; restart runs full crash recovery from the redo log.

Same shape as Postgres: committed data is safe because the redo log is durable; the dial only chooses how much work happens at shutdown versus at the next startup.

How servers do it

Stateless servers don't have a WAL to protect, but they have something just as user-visible: requests in flight. A graceless restart turns a successful request into a connection reset that the user sees as an error.

Connection draining

The web-server version of the five stages is "drain the connections":

1. Catch SIGTERM.
2. Stop the accept loop: close the listening socket so no new connections are accepted. Existing keep-alive connections get Connection: close on their next response so clients reconnect elsewhere.
3. Wait for in-flight requests to finish, up to a deadline (the "drain timeout").
4. There's usually nothing to flush for a stateless server, but if it buffers logs or metrics, this is where they get pushed.
5. Exit 0.

The deadline in stage 3 is essential. Without it, one slow client (or a leaked long-poll) keeps the server alive forever and the deploy hangs. With it, you finish the requests you can and forcibly close the rest when the clock runs out — trading a few reset connections for a bounded shutdown.

Kubernetes — where it all comes together

Kubernetes is where signal handling stops being academic, because it is the thing sending the signal to almost every server and database you run today. When a pod is deleted (a deploy, a scale-down, a node drain), Kubernetes runs a specific, time-boxed sequence:

1. Endpoint removal. The pod is marked Terminating and taken out of the Service's endpoint list, so load balancers and kube-proxy stop routing new traffic to it. (This is eventually consistent and runs in parallel with the steps below; see the pitfall below.)
2. The grace clock starts. terminationGracePeriodSeconds (default 30) begins counting down as soon as termination begins, so it covers the preStop hook as well as the drain.
3. preStop hook (optional). A command or HTTP call that runs before the signal. Commonly a sleep 5 to give endpoint propagation time to catch up.
4. SIGTERM to PID 1, the container's main process. Your handler runs the five-stage drain.
5. SIGKILL if the process is still alive when the clock hits zero.

The one rule that makes all of this work: your application's worst-case drain-and-flush time, plus any preStop delay, must be shorter than terminationGracePeriodSeconds. If draining can take 45 seconds under load, a 30-second grace period guarantees a SIGKILL mid-drain on your busiest pods. Measure the real number; set the grace period above it with margin.

The code pattern

The recurring shape, in every language: the signal handler does almost nothing — it just signals intent — and the real shutdown runs in normal program flow where it's safe to allocate, lock, and do I/O.

In Go, the idiom is signal.NotifyContext, which cancels a context when the signal arrives, plus http.Server.Shutdown, which does the drain for you:

package main

import (
    "context"
    "errors"
    "log"
    "net/http"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    // ctx is cancelled when SIGINT or SIGTERM arrives.
    ctx, stop := signal.NotifyContext(context.Background(),
        syscall.SIGINT, syscall.SIGTERM)
    defer stop()

    mux := http.NewServeMux() // register your handlers on mux
    srv := &http.Server{Addr: ":8080", Handler: mux}

    go func() {
        if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
            log.Fatal(err)
        }
    }()

    <-ctx.Done() // block until the signal cancels the context
    log.Println("shutting down: draining connections...")

    // Stage 2+3: stop accepting, wait for in-flight up to a deadline.
    shutdownCtx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
    defer cancel()
    if err := srv.Shutdown(shutdownCtx); err != nil {
        log.Printf("drain deadline hit, forcing close: %v", err)
        srv.Close() // Shutdown gives up without force-closing; Close drops what is left
    }
    // Stage 4: flush whatever else is durable (db.Close(), flush metrics, ...).
}

The Node.js equivalent is the same lifecycle by hand — register a handler, stop accepting, wait for the server to drain, then exit:

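// Assumes `app` (an Express-style app) and `db` (a connection pool)
// are defined elsewhere.
const server = app.listen(8080);
let shuttingDown = false;

function shutdown(signal) {
  if (shuttingDown) return; // ignore repeated signals
  shuttingDown = true;
  console.log(`${signal} received, draining`);

  // Stage 2+3: stop accepting new connections; the callback fires once
  // every connection has closed.
  server.close(async () => {
    await db.end();        // Stage 4: flush/close the pool
    process.exit(0);       // Stage 5
  });
  // Idle keep-alive sockets would otherwise hold close() open (Node 18.2+).
  server.closeIdleConnections?.();

  // Bound the drain: don't hang forever on a slow client.
  setTimeout(() => {
    console.error("drain timed out, forcing exit");
    process.exit(1);
  }, 25_000).unref();
}

process.on("SIGTERM", () => shutdown("SIGTERM"));
process.on("SIGINT",  () => shutdown("SIGINT"));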

Both share the non-negotiable: a bounded drain. The timeout is shorter than the orchestrator's grace period, so you decide how the unfinished work ends — not a SIGKILL.

The pitfalls that turn "graceful" back into "power cut"

Almost every graceful-shutdown bug is one of these five.

1. PID 1 in a container has no default signal disposition. The Linux kernel gives the init process (PID 1) special treatment: signals without an explicit handler are not applied by default. If your container's main process is PID 1 and you didn't install a SIGTERM handler, the signal is silently ignored — so the process always runs out the full grace period and dies by SIGKILL, every single deploy. Fixes: install a real handler, run with docker run --init / a tiny init like tini, or use exec in your entrypoint so your app actually becomes PID 1 with its handler attached (and not a shell that swallows the signal).

2. Doing real work inside the signal handler. Allocating memory, taking a lock, or doing blocking I/O from within the handler can deadlock, because the handler interrupted code that may already hold that lock. Keep the handler to an atomic flag / context cancel / self-pipe write, and drain in the main flow.

3. An unbounded drain. "Wait for all requests to finish" with no deadline means one stuck request hangs the shutdown until the orchestrator SIGKILLs you — converting a graceful stop into a hard kill anyway. Always cap the drain below the grace period.

4. The endpoint-removal race. In Kubernetes, "remove from endpoints" and "send SIGTERM" happen roughly in parallel and propagate at different speeds. A pod can receive SIGTERM and close its listener while a load balancer, not yet updated, is still sending it new requests — which now get connection-refused. The standard fix is a preStop: sleep 5 (or keep serving briefly after SIGTERM) so traffic stops arriving before you stop accepting it.

5. A grace period shorter than the real drain. The default 30 seconds is a guess, not a measurement. If your worst-case in-flight work — a long query, a big upload, a Redis SAVE of a large dataset — can exceed it, raise terminationGracePeriodSeconds to match reality. A grace period set below your real drain time is a scheduled data-loss event.

The lesson

SIGKILL is the kernel's job. SIGTERM is yours — and the gap between them is exactly as long as you make your countdown.

Every database checkpoint, every drained connection, every clean deploy is the same small trick: catch the polite signal, stop taking new work, finish what you started, get your bytes safely to disk, and exit on your own terms — all before the clock runs out. Get the order right and bound the wait, and "shutdown" stops being the scary part of an incident and becomes the most boring, reliable thing your system does.