Stop-the-World: When GC Freezes Everything

What garbage collection actually does, and why your app randomly pauses.

[Diagram: during a stop-the-world (STW) pause, application threads are frozen while the GC worker cleans; zero requests are served. Worst-case pause duration: 100ms to 5 sec. At 5,000 rps, a 200ms pause stalls 1,000 requests.]

The Scene: Latency Spikes With No Errors

A backend service was running fine — p50 latency at 12ms, p99 at 80ms. Then every 30-90 seconds:

  • p99 latency spiked to 800ms+
  • Health checks failed intermittently
  • The load balancer marked instances as unhealthy
  • No errors. No exceptions. No CPU spike.

The app just froze for a few hundred milliseconds and then resumed like nothing happened. No log entry. No stack trace. The culprit? Garbage Collection pauses.

Let's understand what's actually going on.

What Is Garbage Collection?

When your code creates an object — a string, a list, a request handler — it takes up memory. In languages like C, you have to free that memory yourself. Miss one? Memory leak. Free it twice? Crash.

Garbage Collection automates this. The GC periodically scans memory, finds objects your code can no longer reach, and frees them. You write code, it cleans up after you.
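As a concrete sketch, CPython's gc module makes this visible. Plain reference counting frees most garbage immediately, but a reference cycle needs the collector to notice that it has become unreachable (the Node class here is purely illustrative):

```python
import gc

class Node:
    """A toy object that can hold a reference to another object."""
    def __init__(self):
        self.other = None

# Build a cycle: a references b, and b references a. Reference
# counting alone can never free this pair, because each object
# always has at least one incoming reference.
a, b = Node(), Node()
a.other, b.other = b, a

# Drop our own references. The cycle is now unreachable from any root.
del a, b

# A collection pass traces reachability, finds the orphaned cycle,
# and frees it. collect() returns the number of unreachable objects found.
freed = gc.collect()
print(freed)
```

Every managed runtime performs some version of this reachability test; they differ mainly in when they run it and what they pause to do so.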

The lifecycle has four steps:

  1. Allocate: code creates objects in memory.
  2. Use: objects are alive because code still references them.
  3. Unreachable: no references are left; the object is garbage.
  4. GC reclaim: the GC frees the memory (and may pause the app to do it!).

Most objects die young: created during a request and garbage milliseconds later. The problem isn't GC itself; it's when the GC has to pause your entire app to do its work.

The problem comes in step 4. To find and reclaim dead objects, many GC implementations need to pause all your application threads. This is called a Stop-the-World (STW) pause. During this pause, your application is completely frozen — no requests processed, no responses sent, no timers fired. Nothing.

How Memory Is Organized: The Heap

All dynamically allocated objects live in a region of memory called the heap. Most modern GC implementations divide the heap into areas based on object age — this is called generational garbage collection.

Why generations? Because of one powerful observation: most objects die young.

Think about it. A request handler creates a DTO, serializes a response, builds a few strings. All of those are garbage within milliseconds. Only a few things — caches, connection pools, singletons — live for the lifetime of the application.

The generational heap layout:

  • Young generation (collected often, fast: "minor GC"). The nursery holds brand-new objects; the survivor space holds objects that survived 1+ GC cycles. Most objects die here and are never promoted: roughly 90-95% of all allocations become garbage in the young gen.
  • Old generation (collected rarely, expensive: "major GC"). Long-lived objects (caches, pools, configs) and objects promoted after surviving many minor GCs live here. A full GC of this space is a Stop-the-World pause, and a bigger heap means a longer pause.

Objects that survive enough minor GCs are promoted from the young generation to the old one. Young-gen GC is fast (1-10ms); old-gen GC is where the painful pauses live.
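CPython's cycle collector follows this generational design with three generations; a minimal sketch of inspecting them (the default threshold values vary by build, so treat the numbers as illustrative):

```python
import gc

# CPython's cycle collector is generational: three generations, with
# generation 0 (the youngest) collected far more often than the rest.
thresholds = gc.get_threshold()  # commonly (700, 10, 10) by default
counts = gc.get_count()          # allocations currently tracked per generation

print(thresholds)
print(counts)
```

The first threshold says how many allocations (minus deallocations) trigger a generation-0 pass; the other two say how many younger-generation passes trigger a pass over the next-older generation.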

How GC Actually Works: Mark and Sweep

The most common GC algorithm is Mark-and-Sweep. It works in three phases:

  1. Mark — starting from known "root" references (global variables, the stack, CPU registers), walk every reference chain and mark every reachable object as "alive"
  2. Sweep — scan the entire heap and free any object that wasn't marked (it's garbage)
  3. Compact (optional) — slide surviving objects together to eliminate fragmentation, updating all pointers
[Diagram: before GC, the roots reach A, B, and D; C, E, and F are unreachable. The mark phase traces from the roots and marks A, B, and D as alive. The sweep phase frees the unmarked C, E, and F, reclaiming their memory.]

Why Does This Need to Stop the World?

Imagine counting people in a building while they're walking between rooms. You'd miss people or count them twice. You need everyone to freeze for an accurate count. GC has the same problem: if threads keep moving references while it scans, it could miss a live object (crash!) or fail to collect garbage (memory leak). So the runtime pauses everything, scans the heap, and lets threads resume.
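The mark and sweep phases can be sketched as a toy collector over an explicit object graph; the heap dict, the node names, and the mark_and_sweep function are all illustrative, not a real runtime API:

```python
# A toy mark-and-sweep over a tiny "heap" modeled as a dict mapping
# each object name to the list of object names it references.

def mark_and_sweep(heap, roots):
    # MARK: walk every reference chain starting from the roots.
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(heap[obj])
    # SWEEP: keep only marked objects; everything else is freed.
    return {obj: refs for obj, refs in heap.items() if obj in marked}

# Roots reach A -> B -> D; the C -> E -> F chain is unreachable garbage.
heap = {
    "A": ["B"], "B": ["D"], "D": [],
    "C": ["E"], "E": ["F"], "F": [],
}
live = mark_and_sweep(heap, roots=["A"])
print(sorted(live))  # ['A', 'B', 'D']: C, E, and F were swept
```

The mark phase is why pausing matters: if another thread rewired "B" to point at "F" mid-scan, the collector could free an object that is actually still in use.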

Minor GC vs Major GC

Not all GC pauses are equal. The pain depends on which generation is being collected.

Minor GC — Fast and Frequent

When the young generation (nursery) fills up, the runtime triggers a minor GC. It copies the few surviving objects to the survivor space and wipes the nursery clean. Since most objects are already dead, this is very fast.

[Diagram: before a minor GC, the nursery is full (1,000 objects, 950 of them dead). After the minor GC, the nursery is empty: the 50 survivors were copied out and the 950 dead objects were freed instantly.]

Minor GC in numbers: a 1-10ms pause, running every few seconds, usually not noticeable. It only touches the young generation: a small area, a fast scan, a short pause.

Major GC (Full GC) — The Pause That Hurts

When the old generation fills up, the runtime triggers a full GC. This is the expensive one — it walks the entire old generation, marks every reachable object, sweeps the dead ones, and may compact memory. The bigger your heap, the longer this takes.

During a full GC, all application threads are suspended: no requests, no responses, no health checks, no heartbeats. Nothing moves. The collector walks the entire heap and marks every live object, sweeps the dead ones to free unreachable memory, and optionally compacts to defragment the heap. Pause time: 50ms to 5s+. The bigger the heap, the longer the pause: a 4GB heap might pause for 200ms; a 16GB heap can pause for multiple seconds.
A 200ms GC pause on a service handling 5,000 requests/sec = 1,000 requests frozen at once. Those requests either wait (adding 200ms to latency), time out, or cascade (callers retry, creating even more load).
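To see pause durations rather than guess at them, most runtimes expose hooks. In CPython, gc.callbacks (Python 3.3+) fires at the start and stop of each collection pass; a minimal sketch of timing collections with it:

```python
import gc
import time

# gc.callbacks fires a "start" and a "stop" event around every
# collection pass, which makes pause durations directly measurable.
pauses_ms = []
_started = [0.0]

def _on_gc(phase, _info):
    if phase == "start":
        _started[0] = time.perf_counter()
    elif phase == "stop":
        pauses_ms.append((time.perf_counter() - _started[0]) * 1000.0)

gc.callbacks.append(_on_gc)

# Create some cyclic garbage, then force a pass to record a pause.
cycle = []
cycle.append(cycle)
del cycle
gc.collect()

gc.callbacks.remove(_on_gc)
print(f"{len(pauses_ms)} collection(s), worst: {max(pauses_ms):.3f} ms")
```

Tracing runtimes expose the same data through logs instead of callbacks: GC logging on the JVM, or GODEBUG=gctrace=1 in Go.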

What Triggers a Full GC?

Full GC doesn't just happen randomly. These are the common triggers, regardless of language:

Common full GC triggers:

  • Old gen full: too many objects promoted from the young generation
  • Promotion failure: a young GC tries to promote survivors, but the old gen has no space left
  • Explicit GC call: System.gc(), runtime.GC(), global.gc() (avoid these!)
  • Memory leak: objects accumulate and are never released, so the heap always grows
  • Heap too small: GC runs constantly because there's never enough space
  • Large allocations: big arrays and buffers go directly to the old gen and fill it faster

The single most common cause? Memory leaks: objects that are reachable but no longer needed.

The Cascade: How One GC Pause Kills a Cluster

GC pauses don't just affect one request — they can cascade across your entire infrastructure.

How a GC pause cascades into an outage:

  1. Instance A hits a full GC and pauses for 300ms. Its health check fails, and the load balancer removes it.
  2. Instance B absorbs A's traffic. Its heap spikes, GC triggers sooner, and B pauses too.
  3. Instance C now handles all the traffic alone. Its heap explodes, and a full GC follows.
  4. Service outage.

One GC pause leads to traffic redistribution, which creates more GC pressure, which causes cascading failure.

How to Detect GC Problems

Before you can fix GC pauses, you need to see them. The symptoms have a distinctive fingerprint:

The GC pause fingerprint:

  • p50 is normal, but p99 spikes periodically. The classic GC fingerprint: not every request is slow.
  • Health checks fail intermittently. The LB removes instances, traffic shifts, and the cascade begins.
  • No errors in logs during the spikes. The app freezes silently: no exception, no error, nothing.
  • Memory usage shows a sawtooth pattern. It grows steadily, drops suddenly, then grows again.

CPU isn't maxed, the DB is fine, and there are no network issues, yet the app periodically freezes for 100-500ms.
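One low-tech, language-agnostic detector worth knowing: a heartbeat loop that sleeps for a fixed interval and records how far each wakeup overshoots. GC pauses (or any stall) show up as overshoot spikes. A sketch, with arbitrary interval and iteration counts:

```python
import time

def measure_stalls(interval_s=0.01, iterations=50):
    """Sleep repeatedly and record how late each wakeup was, in ms."""
    overshoots_ms = []
    for _ in range(iterations):
        t0 = time.perf_counter()
        time.sleep(interval_s)
        elapsed = time.perf_counter() - t0
        overshoots_ms.append(max(0.0, (elapsed - interval_s) * 1000.0))
    return overshoots_ms

stalls = measure_stalls()
print(f"worst wakeup overshoot: {max(stalls):.2f} ms")
```

Run as a background thread in production, this catches freezes that produce no log lines: if the heartbeat overshoots by 200ms, something (GC, the OS scheduler, a frozen VM) stopped your process for 200ms.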

Universal Fixes (Any Language)

Regardless of what language you're using, these principles reduce GC pressure:

  1. Reduce allocations in hot paths — every object you create is future garbage. Reuse buffers, avoid unnecessary intermediate objects, pre-allocate collections with known sizes.

  2. Fix memory leaks — if your heap keeps growing after each GC cycle, objects are reachable but no longer needed. Common culprits: unbounded caches, forgotten event listeners, closures capturing large scopes.

  3. Right-size the heap — too small means constant GC. Too large means catastrophic pauses when full GC eventually runs. Rule of thumb: 3-4x your live data set.

  4. Monitor GC in production — every language has GC logging/metrics. Enable them. You can't fix what you can't see.

  5. Avoid explicit GC calls — System.gc(), runtime.GC(), global.gc() force a full collection at the worst possible time.
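Principle 1 in miniature, using CPython's tracemalloc to compare the same computation with and without a large intermediate allocation (the sizes are illustrative):

```python
import tracemalloc

n = 100_000

# Same result, two allocation profiles. The list comprehension
# materializes all n squares at once; the generator keeps only one
# value alive at a time, creating far less intermediate garbage.
tracemalloc.start()
total_list = sum([x * x for x in range(n)])
_, peak_list = tracemalloc.get_traced_memory()
tracemalloc.stop()

tracemalloc.start()
total_gen = sum(x * x for x in range(n))
_, peak_gen = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"list peak: {peak_list} bytes, generator peak: {peak_gen} bytes")
```

Every byte of that intermediate list is future garbage the collector must eventually trace and free; in a hot path, the difference compounds on every request.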

How Different Languages Handle GC

Every managed language has a garbage collector, but they make very different tradeoffs. Here's how the major ones compare:

GC Across Languages — How Each One Works

| Language | GC algorithm | Max STW pause | Key tradeoff | Tuning knobs |
|---|---|---|---|---|
| JavaScript (V8 / Node.js) | Generational: Scavenge + Mark-Sweep-Compact | 50-300ms+ | Single-threaded event loop blocks completely | --max-old-space-size, --max-semi-space-size |
| Go (1.19+) | Concurrent tri-color mark-and-sweep | <1ms | Uses more CPU for concurrent GC work | GOGC, SetMemoryLimit, GODEBUG=gctrace=1 |
| Java (JVM) | G1 (default), ZGC, Shenandoah, Parallel, Serial | G1: 50-200ms; ZGC: <1ms | Most GC options: pick the collector for your needs | -Xmx, -XX:+UseZGC, MaxGCPauseMillis |
| C# / .NET (CLR) | Generational (Gen0/1/2), background + concurrent | Gen0/1: <1ms; Gen2: 10-100ms+ | Background GC is mostly concurrent in .NET 6+ | GCSettings.LatencyMode, Server vs. Workstation GC |
| Python (CPython) | Reference counting + cycle collector | Usually minimal | GIL limits concurrency anyway, so GC is less visible | gc.disable(), gc.set_threshold() |

Key insight: Go and Java (with ZGC) have achieved sub-millisecond pauses. If GC pauses are your bottleneck, the language and collector choice matters enormously.
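For CPython specifically, the tuning knobs listed above look like this in practice; the threshold values chosen here are illustrative, not recommendations:

```python
import gc

# set_threshold() controls how often each generation of the cycle
# collector runs; disable() turns the cycle collector off entirely
# (reference counting still frees all non-cyclic garbage).
default = gc.get_threshold()

gc.set_threshold(50_000, 20, 20)   # collect gen0 far less often
tuned = gc.get_threshold()

gc.disable()                       # no cycle collection at all from here
cycle_collector_off = not gc.isenabled()

gc.enable()                        # restore normal behavior for this sketch
gc.set_threshold(*default)
print(tuned, cycle_collector_off)
```

Disabling the cycle collector trades pause-free operation for the risk of leaking any reference cycles your code creates, so it only makes sense for short-lived processes or code audited to be cycle-free.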

Key Takeaways

  1. GC pauses are real outages. During a Stop-the-World pause, your app serves zero requests. Health checks fail, timeouts fire, cascading failures begin.

  2. Most objects die young. That's why generational GC exists — young gen collection is fast, old gen collection is painful. Keep objects short-lived whenever possible.

  3. Full GC should be rare. If you're seeing frequent full GCs, you have a heap sizing problem, a memory leak, or both.

  4. Size the heap at 3-4x your live data set. Too small = constant GC. Too large = catastrophic pauses when full GC finally runs.

  5. Fix the code before tuning the runtime. Unbounded caches, forgotten event listeners, closures capturing large scopes — these create garbage faster than any collector can clean up.

  6. Always monitor GC in production. Enable GC logging. Watch for the sawtooth memory pattern. Track p99 spikes that correlate with GC events.

  7. The best garbage is garbage never created. Reuse buffers, pre-allocate collections, avoid intermediate allocations in hot paths. The fastest GC cycle is the one that never runs.