How Redis actually deletes expired keys

Spoiler: not at TTL=0. Not even close. Three mechanisms do the work: lazy expiration (checked on access, deleted then), active expiration (a background sampler, every ~100ms), and maxmemory eviction (the last resort, when RAM is full). Two of them run all the time; one only when you're already in trouble. This is why a key at TTL=0 can sit in memory for minutes, and what that means for your cache. A senior-engineer walkthrough of how the world's most popular cache cleans up after itself.

Here's a thing that surprises people the first time their Redis bill goes up: setting a TTL on a key does not mean the key disappears at TTL=0. Redis is more pragmatic than that. The key stays in memory until either (a) someone accesses it and gets told "yeah, this is gone" while Redis silently deletes it, or (b) a background sampler happens to roll the dice on it. This blog walks through the two strategies Redis uses, why the hybrid exists, the math behind the 25% threshold, what happens on replicas and in cluster mode, and the handful of tuning knobs that actually matter.

Why this is even a question

The naïve mental model is "Redis stores a TTL with each key, runs a timer, deletes the key on expiry." That sounds clean. It's also a terrible idea at any meaningful scale.

Imagine 50 million keys, each with its own TTL. The timer-per-key model means 50 million scheduled events fighting for kernel timers, or one giant priority queue that you have to walk on every tick. CPU goes up, latency goes up, the event loop blocks. Redis is single-threaded for command processing — the one thing you absolutely cannot do is spend the whole tick scanning expiry metadata.

So the design choice was: don't try to be precise about when a key dies. Be precise about what reads return, and clean up in the background at a rate you can control. That's the whole insight. Everything else is mechanism.

The contract Redis actually gives you: "If you ask for an expired key, you'll never see its value." It does not promise "the key is removed from memory at TTL=0." Two different guarantees, one of them much cheaper to implement.

Strategy 1 — Lazy (passive) expiration

Every command that touches a key does an expiry check first. GET foo, HGET user:42 name, EXISTS session:abc — all of these go through the same lookup path. The lookup checks: does this key exist? If yes, is its TTL expired? If expired, delete it now, return as if it never existed. Otherwise, serve the value.

Diagram: lazy expiration, the key dies the moment someone touches it. A GET foo flows through: look up foo in the dict (the TTL is stored separately), compare now() against the stored expire-at (a cheap integer compare), then either return the value or DEL the key and return nil. Cost per lookup: one extra integer compare. That's it. The problem: if a key is never accessed again, lazy expiration never fires, so dead keys can sit in RAM forever, paying full price for memory you can't use. This is why "set TTL and forget" is a memory leak waiting to happen.

Lazy is essentially free — the expiry check is just an integer comparison against the stored expire-at timestamp. The cost only kicks in if a key actually is expired, in which case Redis deletes it inline (or asynchronously via UNLINK-style lazy free, depending on the operation type and config).

The catch is the obvious one: if a key never gets accessed again, lazy expiration never runs on it. One million session keys with a 30-day TTL, and then half the users churn? Half a million dead keys, sitting in RAM, taking up bytes that Redis thinks are still "valid usage" — until something touches them. Without a second mechanism, this is a slow-burning memory leak.
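The lookup path described above can be sketched as a toy Python model. This is not Redis's C implementation, just a minimal illustration of the idea: a main dict for values, a separate dict of expire-at timestamps, and an expiry check that only fires when a key is touched. The `LazyCache` class and its method names are invented for this sketch.

```python
import time

class LazyCache:
    """Toy model of lazy expiration: expiry is only checked when a key is touched."""

    def __init__(self):
        self._data = {}     # key -> value (main dict)
        self._expires = {}  # key -> absolute expire-at timestamp (separate dict, like Redis)

    def set(self, key, value, ttl=None):
        self._data[key] = value
        if ttl is not None:
            self._expires[key] = time.monotonic() + ttl
        else:
            self._expires.pop(key, None)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        expire_at = self._expires.get(key)
        if expire_at is not None and now >= expire_at:
            # Lazy expiration: one compare, delete on touch, answer "not found".
            del self._data[key]
            del self._expires[key]
            return None
        return self._data.get(key)

cache = LazyCache()
cache.set("session:abc", "alice", ttl=0.01)
time.sleep(0.05)
print(cache.get("session:abc"))       # None: deleted the moment it was touched
print("session:abc" in cache._data)   # False, but only *after* the access above
```

Until that `get` call, the dead key occupies memory exactly as the article describes.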

Strategy 2 — Active (proactive) expiration

This is the part that surprises people. Redis runs a background task — at 10 Hz by default, configurable via hz — that proactively scans for expired keys. It's not a full scan. That would be insane. It's a random sampler with a feedback loop:

Diagram: active expiration, random sample → delete → repeat if too many were dead. (1) Sample 20 keys randomly from the "keys with TTL" set. (2) Delete the expired ones, counting how many of the 20 were dead. (3) If more than 25% were expired, loop back to (1) within the same tick; otherwise stop and wait for the next tick. Why 20 keys and 25%? Probability and bounded work. Bounded CPU per cycle: each iteration touches ~20 keys, and the cycle has a hard time budget (default ~25% of one tick), so latency stays predictable. Probabilistic guarantee: in steady state, fewer than 25% of "keys with TTL" are dead but still in memory. Higher hz means a tighter bound and more CPU, and Redis 6+ adds the active-expire-effort knob (1–10) to push the threshold tighter at the cost of CPU.

The algorithm in plain words:

every 1000/hz milliseconds (default 100ms):
  start_time = now()
  loop:
    sample 20 random keys from the "keys with TTL" set
    expired_count = 0
    for each sampled key:
      if expired:
        delete it
        expired_count += 1

    if expired_count / 20 <= 25%:
      break              # mostly clean, stop for now
    if elapsed(start_time) > cycle_time_budget:
      break              # ran out of time, yield to the event loop
    # else: keep sampling — there are probably more dead keys

Why these specific numbers? The 25% threshold is the steady-state guarantee. If, statistically, the random sample shows fewer than 25% expired keys, Redis concludes "the proportion of dead keys with TTL in the keyspace is currently below ~25%, my work here is done for this tick." If it's above 25%, there's enough garbage to justify another sample-and-delete iteration. The feedback loop self-tunes to the workload — quiet keyspaces use almost no CPU; bursty TTL workloads automatically run more iterations to catch up.

"Keys with TTL" matters here. Redis maintains a separate dictionary of just the keys that have expirations set. Active expiration samples from that set, not from the entire keyspace. So a database with 100 million keys but only 1 million TTL'd keys runs the sampler against the 1 million — much cheaper.

Why both? The hybrid is the whole point

Either strategy alone fails:

  • Lazy alone: if a key is never accessed after expiry, it sits in RAM until the heat death of the universe. For workloads where many keys are written-then-forgotten (sessions, idempotency tokens, rate-limit counters, one-time tokens), this is a memory leak.
  • Active alone: would have to be aggressive enough to keep RAM clean, which means scanning a lot. CPU bomb on big keyspaces, especially when most keys aren't even close to expired.

The hybrid solves both:

  • Lazy handles the hot keys — they get touched anyway, free expiry checks come along for the ride.
  • Active handles the cold dead keys — the ones nobody is asking for. The probabilistic sampler keeps the dead-key fraction bounded without ever needing a full scan.
Diagram: the hybrid, lazy covers the hot path, active covers the cold tail. Hot keys (frequently accessed) are handled by lazy expiration on the very next GET / HGET / etc.: cost is one integer compare per lookup, latency imperceptible. Cold keys (never accessed again) are eventually caught by the active sampler, probabilistically, over many ticks: cost bounded by hz and the cycle budget, memory overshoot bounded at ~25%. Either path on its own breaks. Together they cover the whole distribution.

The hz parameter — what it actually controls

The hz config option controls how many times per second Redis runs its background tasks. Default is 10. Range is roughly 1–500. It's not just expiration — hz also drives client timeout checks, cluster bus tasks, and a few other periodic things — but expiration is the most visible one.

hz value | Effect | When to use
1–5 | Less CPU spent on background tasks. More dead keys lingering. Slower client timeout detection. | Tiny instances, very cold keyspaces, no TTLs.
10 (default) | Sweet spot. Active expiration runs every 100ms; the ~25% dead-key bound holds in steady state. | Default for almost everyone. Don't touch unless you have a reason.
50–100 | Active sampler runs more often. Tighter dead-key bound. More CPU spent in background tasks. | TTL-heavy workloads (sessions, rate limits) where memory bloat from cold keys is hurting you.
100+ | Diminishing returns. Background CPU competes with command execution. Latency p99 can creep up. | Rare. Measure before cranking it.

Redis 6+ added active-expire-effort as a separate dial (1 to 10) that scales the active-expire cycle's aggressiveness independently of the global hz. If you're trying to be tighter on dead-key memory but don't want to make every other periodic task run faster, this is the better lever.
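A back-of-envelope way to think about hz: each tick runs some number of sampler iterations, each touching ~20 keys. The sketch below computes a best-case reclamation rate; the iterations-per-cycle value is illustrative (the real cap comes from the time budget and active-expire-effort, not a fixed count), and `max_expirations_per_second` is a made-up helper name.

```python
def max_expirations_per_second(hz, iterations_per_cycle, sample_size=20):
    """Upper bound on keys the active sampler can reclaim per second,
    assuming the best case where every sampled key is expired."""
    return hz * iterations_per_cycle * sample_size

# Illustrative iteration cap of 16 per cycle -- an assumption, not a Redis constant.
for hz in (10, 50, 100):
    print(hz, max_expirations_per_second(hz, iterations_per_cycle=16))
# 10 -> 3200, 50 -> 16000, 100 -> 32000 keys/second, best case
```

The takeaway: if your write rate of short-TTL keys exceeds numbers like these, dead keys accumulate no matter what, and hz alone won't save you.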

DEL vs UNLINK — what "delete" actually means under the hood

Once Redis has decided a key is expired, the next question is how to free it. For a SET foo bar with a 5-byte value, deletion is a memcpy and a free — instantaneous. For a HSET user:1 ... with a million fields, freeing the underlying hash table is potentially milliseconds of CPU. Doing that synchronously on the event loop blocks every other client.

This is what UNLINK (Redis 4.0+) is for. UNLINK removes the key from the keyspace dictionary immediately (so reads no longer see it), but the actual memory free is queued to a background thread. The event loop unblocks instantly; the heavy reclamation runs off-thread.

Diagram: DEL vs UNLINK, same observable behavior, very different blocking. DEL bigkey (synchronous) unlinks from the dict and frees the memory immediately, on the event loop thread, blocking for O(N) where N is the key size. Million-field hash? Hello, latency spike. UNLINK bigkey (async free) unlinks from the dict now (cheap) and queues the free to a background thread; the event loop unblocks immediately. Same correctness, no latency spike.

For active expiration of big keys, Redis 4.0+ gates this behind lazyfree-lazy-expire. Note that it still defaults to no in stock redis.conf, so turn it on explicitly; with the flag set, expirations of big collections use UNLINK-style lazy free automatically. Without it, a single huge expired hash can pause your Redis for tens of milliseconds. With it, you don't notice.

If you're on a 3.x or early 4.x Redis and you have collection keys with large element counts under TTL — upgrade or set the lazy-free flags. A single active-expire cycle hitting a multi-million-element set can cause a visible event-loop pause. This used to be the most common "Redis is slow tonight" incident.
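The unlink-now, free-later pattern is easy to see in miniature. This is a toy Python analogue of UNLINK, not Redis's lazyfree machinery: popping the key from the dict is the cheap "unlink", and a background thread does the expensive reclamation. All names here are invented for the sketch.

```python
import queue
import threading

free_queue = queue.Queue()

def lazyfree_worker():
    # Background thread: actually releases the (potentially huge) value.
    while True:
        value = free_queue.get()
        value.clear()          # stand-in for the O(N) memory reclamation
        free_queue.task_done()

threading.Thread(target=lazyfree_worker, daemon=True).start()

keyspace = {"bigkey": {f"field{i}": i for i in range(100_000)}}

def unlink(key):
    """UNLINK-style delete: unlink from the dict now, free off-thread."""
    value = keyspace.pop(key)    # cheap: readers stop seeing the key immediately
    free_queue.put(value)        # expensive free happens on the worker thread

unlink("bigkey")
print("bigkey" in keyspace)  # False immediately, before the free finishes
free_queue.join()            # wait for the background reclamation (demo only)
```

The caller observes the key gone instantly; only the memory release is deferred, which is exactly the trade DEL vs UNLINK makes.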

Replicas — the part that breaks people's mental model

This is the one that usually shows up in postmortems. Replicas do not run their own active expiration on the user-facing keyspace (in the default configuration). The reasoning: if both the primary and replica independently decide to expire keys, they'll diverge — different sample order, different deletion order, the replica's view drifts from the primary's. So Redis's design is:

  • The primary runs lazy + active expiration as described.
  • When the primary deletes an expired key, it propagates a synthetic DEL to all replicas via the replication stream.
  • The replica applies the DEL just like any other write — that's when the key actually disappears from the replica's memory.
Diagram: expiration on a primary-replica pair. The primary runs lazy + active expiration and is the single source of truth: it decides "foo expired", DELs foo locally, and appends "DEL foo" to the replication stream. The replica does not run active expiry on the user keyspace; it applies the DEL when it arrives, and until then foo is still in its RAM. Could it briefly serve "expired" data on read? Modern Redis filters this on read: replicas check expiry on lookup, so clients don't see stale values even while the key is still allocated.

Two practical consequences:

  • The replica's used_memory can be higher than the primary's for short windows. The primary has already DEL'd the key locally; the replica is still waiting for the replication packet (or processing a backlog). Don't panic if the numbers don't match exactly.
  • Reads on a replica may see "expired-looking" keys if you're reading raw — but modern Redis applies an expiry check on lookup even on replicas, so client-visible reads won't return expired values. The memory footprint can lag, the user-visible answer will not.
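Both consequences fall out of one small model: a replica that filters expired values on read but only frees memory when the primary's DEL lands. The `Replica` class below is a toy illustration of that behavior, with invented method names.

```python
class Replica:
    """Toy replica: never expires keys itself; frees memory only when the
    primary's replicated DEL arrives, but filters expired values on read."""

    def __init__(self):
        self.data = {}
        self.expires = {}

    def get(self, key, now):
        expire_at = self.expires.get(key)
        if expire_at is not None and now >= expire_at:
            return None            # logically expired: hide it from clients...
        return self.data.get(key)  # ...but keep it allocated until DEL arrives

    def apply_repl_del(self, key):
        # The replication stream delivered "DEL key": now the memory is freed.
        self.data.pop(key, None)
        self.expires.pop(key, None)

r = Replica()
r.data["foo"] = "bar"
r.expires["foo"] = 100            # expire-at t=100 (arbitrary clock units)

print(r.get("foo", now=150))      # None: the read is filtered
print("foo" in r.data)            # True: still occupying RAM on the replica
r.apply_repl_del("foo")
print("foo" in r.data)            # False: freed when the primary's DEL lands
```

This is why replica used_memory can lag the primary while client-visible reads never return stale values.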

Cluster mode — same mechanism, sharded

In Redis Cluster, each node owns a subset of hash slots. Active expiration runs per node, on the keys that node is the primary for. Replicas of those slots follow the same DEL-propagation rule above. There's no global expiration coordinator — that would be a scalability disaster. Each node sweeps its own slots, period.

Practical implication: your "dead key" memory pressure is per-node, not cluster-wide. A skewed workload that puts most TTL'd keys on one slot will mostly tax that one node's expiration sampler.
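To see how keys land on nodes in the first place: the Redis Cluster spec assigns each key to one of 16384 slots via CRC16 (the XMODEM variant) mod 16384, with hash tags letting you pin related keys to one slot. A minimal sketch, following the published spec:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM), the variant the Redis Cluster spec uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: bytes) -> int:
    """Map a key to its cluster slot, honoring {hash tag} rules:
    if the key contains a non-empty {...}, only that part is hashed."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384

print(hex(crc16_xmodem(b"123456789")))  # 0x31c3, the spec's reference value
# Same hash tag -> same slot -> same node's expiration sampler:
print(hash_slot(b"{user:1000}.following") == hash_slot(b"{user:1000}.followers"))  # True
```

This is also why skew happens: every key sharing a hash tag lands on one node, and that node's sampler sweeps all of them.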

Where TTL stops being enough — maxmemory + eviction

Here's the subtle thing senior engineers get bitten by: TTL is not a memory budget. It's a hint about when a value becomes invalid. If your write rate exceeds your expiration rate (lazy + active combined), memory grows. When memory hits maxmemory, TTL expiration keeps running, but it's no longer what keeps you under the cap: eviction under your configured policy takes over as the cleanup mechanism.

maxmemory-policy | What it does | When to use
noeviction | Reject writes with an OOM error. No keys deleted. | Source of truth, never want silent data loss.
allkeys-lru | Evict least-recently-used across all keys. | Pure cache. TTLs are bonus, not load-bearing.
allkeys-lfu | Evict least-frequently-used across all keys. | Cache where access frequency matters more than recency.
volatile-lru | Evict LRU only among keys with a TTL set. | Mixed workload: persistent + expiring keys, want to protect persistent.
volatile-ttl | Evict the key with the soonest TTL first. | You want "things closer to dying go first." Subtle pitfalls — measure.
volatile-random | Random victim from the TTL set. | Rarely the right answer. Cheap, predictable bad behavior.
The classic gotcha: you set TTLs on every key, set maxmemory-policy noeviction "to be safe," and then your traffic doubles. Active expiration can't keep up, memory hits the cap, writes start failing with OOM errors, and the dashboard shows you have plenty of "expired" keys still in RAM. The keys had TTL. Redis just hadn't gotten around to deleting them yet. Your cache was, in effect, a wall.
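An allkeys-lru policy in miniature: a toy cache that evicts the least-recently-used key when it hits its budget, TTL or not. This is a deliberately simplified model; real Redis approximates LRU by sampling candidates and budgets bytes via maxmemory, not a key count. The `LruCache` class is invented for this sketch.

```python
from collections import OrderedDict

class LruCache:
    """Toy allkeys-lru: when the key budget is hit, evict the
    least-recently-used key, whether or not it has a TTL."""

    def __init__(self, maxkeys):
        self.maxkeys = maxkeys
        self.data = OrderedDict()   # insertion order doubles as recency order

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # touch = now most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        while len(self.data) > self.maxkeys:
            self.data.popitem(last=False)  # evict the LRU victim

c = LruCache(maxkeys=3)
for k in ("a", "b", "c"):
    c.set(k, k.upper())
c.get("a")            # touch "a", so "b" becomes the LRU victim
c.set("d", "D")       # over budget: "b" is evicted
print(sorted(c.data))  # ['a', 'c', 'd']
```

Unlike noeviction, this cache keeps accepting writes under pressure, which is exactly the trade the gotcha above is about.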

What you actually monitor and tune

Metric (from INFO) | What it tells you | What to do
expired_keys | Total keys deleted by expiration since startup. | Watch the rate; sudden drops can signal active-expire stalling.
used_memory vs used_memory_dataset | Total RAM vs the part holding actual data. | Big gap = fragmentation or dead-key buildup. Investigate.
db0:keys vs db0:expires | Total keys, and how many have a TTL set. | Sudden growth in expires with flat expired_keys = sampler is behind.
Replication lag | Replica is processing DELs late. | Replicas hold expired-but-not-yet-DEL'd keys longer; expect a transient memory difference.
Slow log entries | Big DEL or expire-time freeing causing slowdowns. | Confirm lazyfree-lazy-expire yes. Check for unusually large keys.
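INFO output is plain `key:value` lines, which makes these metrics easy to scrape. A minimal sketch of pulling the numbers above out of a dump; `parse_info` is a made-up helper and the sample values below are illustrative, not from a real server.

```python
def parse_info(raw: str) -> dict:
    """Parse the key:value lines of a Redis INFO dump into a dict,
    skipping section headers like '# Stats'."""
    stats = {}
    for line in raw.splitlines():
        if line and not line.startswith("#") and ":" in line:
            key, _, value = line.partition(":")
            stats[key] = value
    return stats

# A trimmed INFO excerpt with illustrative values.
raw = """# Stats
expired_keys:104232
# Memory
used_memory:1073741824
used_memory_dataset:734003200
"""
info = parse_info(raw)
# The "big gap" signal from the table: RAM held vs RAM holding data.
gap = int(info["used_memory"]) - int(info["used_memory_dataset"])
print(info["expired_keys"], gap)  # 104232 339738624
```

In practice you'd track the expired_keys rate over time (a counter since startup) rather than its absolute value.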

The handful of config flags that matter:

# redis.conf — the expiration-relevant knobs

hz 10                               # default. crank to 50–100 for TTL-heavy workloads.
                                    # affects all background tasks, not just expiration.

active-expire-effort 1              # Redis 6+. range 1–10. higher = tighter dead-key bound,
                                    # more CPU. independent of hz.

lazyfree-lazy-expire yes            # async free for expired big keys. should be ON in modern
                                    # configs. without it, big-collection expiry blocks the loop.

lazyfree-lazy-eviction yes          # same, but for eviction (when maxmemory hits).

lazyfree-lazy-server-del yes        # async free for explicit DEL of big keys.

maxmemory 8gb                       # the actual safety net. set this.
maxmemory-policy allkeys-lru        # what happens when you hit it. pick deliberately.

Closing take

The mental model worth keeping is this: a Redis TTL is a contract about when reads start returning nil. It is not a contract about when memory is freed. The lazy + active hybrid makes the trade-off explicit — Redis spends almost no CPU on expiration in steady state, and accepts a bounded amount of memory overhead from dead keys that haven't been swept yet.

Where this matters: when you're sizing memory, when you're tuning hz for TTL-heavy workloads, when you're reading replicas and the numbers don't match, when you're staring at a "Redis is full but most of these keys should be expired" incident at 2 AM. Every one of those gets clearer once you internalize that the deletion is opportunistic, not punctual.

Set the right maxmemory, pick an eviction policy that matches the role (cache vs source-of-truth), enable lazy-free for big collections, and let the sampler do its job. That's the whole pattern.

If you remember one thing: a TTL of 60 seconds doesn't mean the key is gone in 60 seconds. It means nobody will see the value after 60 seconds. The actual memory free happens lazily, on access or sample, whichever comes first. Build your capacity model on that, not on the wall clock.