Caching is not "add Redis"
A layer-by-layer tour of the cache stack — CPU to CDN

[Diagram: the cache stack, top to bottom — CPU L1/L2/L3 (~1–30 ns), TLB (virt → phys), OS page cache (~100 ns), DB buffer pool (~µs), app in-process (~µs), Redis / Memcached (~0.5–1 ms), reverse proxy (Varnish / nginx), HTTP / CDN (~ms–10s of ms), browser cache (disk + memory), DNS resolver (recursive cache), query plan cache (prepared stmts), and more at every hop.]

Twelve places your data is cached before "Redis" is one of them. Each layer has a different speed, capacity, invalidation model, and failure mode. If the only tool in your cache toolbox is Redis, every problem looks like a key-value lookup.

Here's the conversation that happens too often. "The page is slow." "Add Redis." Six weeks later, the page is still slow, the Redis bill is real, and the actual bottleneck was a CPU cache miss on a 50-MB object you were memcpy'ing on every request. Caching is one of those words that means a dozen things and gets used as if it means one. This post walks the whole stack, from L1 cache to CDN, and tries to give a senior-engineer mental model of what each layer is actually for.

Why this matters: the latency budget tells the whole story

The numbers below are approximate, vary by hardware/network, and have been cribbed from a thousand "latency numbers every programmer should know" lists. Internalize the order of magnitude, not the exact figure.

Operation | Approx. latency | What it tells you
--- | --- | ---
CPU register / L1 cache hit | ~1 ns | Free. You can do a billion of these per second per core.
L2 cache hit | ~3–5 ns | Still essentially free.
L3 cache hit | ~10–30 ns | Cheap, but already 30× slower than L1.
Main memory (DRAM) | ~100 ns | 100× slower than L1. This is where cache-friendly code matters.
Branch misprediction | ~5–15 ns | Free until you do millions per second.
OS page cache hit | ~100 ns – 1 µs | Reading a "hot" file is just a memcpy from kernel RAM.
SSD random read | ~50–150 µs | 500× slower than RAM. Every cache miss to disk hurts.
In-process map lookup (Java/Go) | ~50–500 ns | Closest you get to L1 for application data.
Same-DC network round-trip | ~0.5–1 ms | This is what Redis actually costs you.
Cross-region round-trip | ~30–100 ms | Don't put your cache here unless you mean it.
CDN edge → end user | ~5–30 ms | Hard to beat for static / cacheable content.
The cheapest cache hit is the one that never leaves the CPU. The next cheapest is the one that never leaves the process. The next is the one that never leaves the host. Going to Redis is the fourth cheapest option, not the first.

Layer 1 — CPU caches (L1, L2, L3)

CPU cache hierarchy: where the actual fast caches live.
  • L1 — ~1 ns, ~32 KB
  • L2 — per core — ~3–5 ns, ~256 KB–1 MB
  • L3 — shared across cores — ~10–30 ns, ~MBs
  • Main memory (DRAM) — ~100 ns, ~GBs
Cache lines are typically 64 bytes — load one byte, get all 64 for free. Code that respects this is 5–10× faster.

Every CPU has a tiny pyramid of caches sitting between the cores and main memory. Reads and writes fetch and flush in cache lines — usually 64 bytes — so when you load one byte of an array, you actually get the next 63 for free. This is why iterating an array of structs is dramatically faster than iterating a linked list of nodes scattered across the heap. Same logical operation; cache-friendliness is the difference.

The kind of bug that bites at this layer:

  • False sharing. Two threads write to two different fields that happen to live in the same cache line. The cache-coherence protocol bounces the line between cores on every write. Throughput craters with no lock in sight. Padding the struct fixes it.
  • Pointer-chasing. Linked structures scattered across RAM force a fresh cache miss for every node. A flat array with the same data is often 10× faster despite "the same algorithm."
  • Working set bigger than L3. When your hot data exceeds the last-level cache, every iteration goes to DRAM. Sometimes the right cache fix is "make the data smaller," not "add a service."
Real-world version: a hot loop processing one field of a 200-byte struct. Switching to a struct-of-arrays layout (so the hot field is contiguous in memory) made the same code 4× faster. Zero new infrastructure, zero new services, zero new failure modes. That's a cache fix.
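
Here's a minimal Go sketch of that idea (not the original code, just the shape of it): summing one hot field out of a fat struct, array-of-structs style, versus summing the same values from a flat slice, struct-of-arrays style. The struct layout, sizes, and field names are invented for illustration; exact timings depend on your hardware, but the contiguous version typically wins by a wide margin.

```go
package main

import (
	"fmt"
	"time"
)

// Array-of-structs: each Price is surrounded by ~192 bytes of cold data,
// so every 64-byte cache line we pull in is mostly wasted.
type Order struct {
	Price float64
	Cold  [24]int64 // stand-in for the rest of a ~200-byte struct
}

func sumAoS(orders []Order) (s float64) {
	for i := range orders {
		s += orders[i].Price
	}
	return
}

// Struct-of-arrays: the hot field lives in its own contiguous slice,
// so every cache line carries eight useful float64s.
func sumSoA(prices []float64) (s float64) {
	for _, p := range prices {
		s += p
	}
	return
}

func main() {
	const n = 10_000_000
	orders := make([]Order, n)
	prices := make([]float64, n)
	for i := range orders {
		orders[i].Price = float64(i)
		prices[i] = float64(i)
	}

	t := time.Now()
	a := sumAoS(orders)
	fmt.Println("array-of-structs:", time.Since(t), a)

	t = time.Now()
	b := sumSoA(prices)
	fmt.Println("struct-of-arrays:", time.Since(t), b)
}
```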

Layer 2 — TLB (Translation Lookaside Buffer)

Often forgotten. The TLB caches virtual-to-physical address translations. Every memory access goes through it. A TLB miss means walking the page table — which itself can miss in the data cache, which means a real DRAM read just to figure out where another DRAM read should go.

You don't tune the TLB directly, but you affect it. Workloads with huge working sets and small (4 KB) pages thrash the TLB. Huge pages (2 MB or 1 GB pages) drastically reduce TLB pressure for memory-heavy databases — which is why Postgres, Redis, and MongoDB all have docs about transparent huge pages. (Spoiler: Redis tells you to disable THP because of latency spikes, while Postgres can benefit. The right answer depends on the workload.)

Layer 3 — OS page cache

This is the layer most people forget exists. The kernel keeps recently-read file pages in RAM. When you read(fd, ...) a file, the kernel checks the page cache first; if the pages are there, it's just a memcpy from kernel memory to your buffer — no disk I/O.

OS page cache: free RAM is not "wasted" — it's caching your hot files.
  • Your app calls read(fd, …); the kernel checks the page cache first (indexed by inode + offset, LRU-ish eviction).
  • Hit → memcpy from kernel RAM, ~100 ns – 1 µs total.
  • Miss → disk read, then cache: ~50 µs on SSD, ~10 ms on HDD.
"My app doesn't do disk caching" — yes it does, the kernel does it for you. SQLite, log readers, mmap'd files, static-asset servers — all riding the page cache. This is why a "warm" database server reads files in microseconds and a cold one stalls.

This is why a server that's been up for hours has microsecond reads on the same file that took milliseconds right after boot — the second read is a memcpy, the first was actual disk I/O. It's also why Linux happily shows "all RAM used" on a healthy server: the kernel filled it with cached file pages, and will free them the moment any process needs more.
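
You can watch this from userspace with nothing but a stopwatch. A rough Go sketch: the path below is a placeholder for any reasonably large local file, and whether the first read is actually cold depends on what the kernel already has cached.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"time"
)

// timedRead reads the whole file and returns how long it took. If the file
// wasn't already in the page cache, the first call pays for real disk I/O;
// the second call is a memcpy out of kernel RAM and is typically orders of
// magnitude faster.
func timedRead(path string) time.Duration {
	start := time.Now()
	f, err := os.Open(path)
	if err != nil {
		panic(err)
	}
	defer f.Close()
	if _, err := io.Copy(io.Discard, f); err != nil {
		panic(err)
	}
	return time.Since(start)
}

func main() {
	const path = "/var/log/syslog" // placeholder: any large-ish local file
	fmt.Println("first read: ", timedRead(path))
	fmt.Println("second read:", timedRead(path))
}
```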

The page cache is invisible to your app — you don't allocate it, you don't manage it, you just benefit from it. But it shapes performance dramatically:

  • Database files, log files, static assets — all benefit automatically.
  • Memory-mapped (mmap) files give you a pointer that's backed by the page cache. Read it like memory; the kernel pages from disk on demand.
  • If you do O_DIRECT I/O, you bypass the page cache entirely. Sometimes that's what you want (databases manage their own buffer pool); usually it's a footgun.

Layer 4 — Database buffer pool

Databases don't usually trust the OS page cache alone. They run their own internal cache of hot data pages — InnoDB sizes it with innodb_buffer_pool_size, Postgres with shared_buffers, SQL Server just calls it the buffer pool. The DB knows things the kernel doesn't (which pages are part of the same B-tree, which indexes are hot, which queries are scanning what), so it caches with smarter eviction.

This is why every "Postgres is slow" investigation starts with: does your working set fit in shared_buffers? Same query against the same table, hot vs cold buffer pool, can be 100× different. The cache is doing its job; it just hasn't been warmed.
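
Postgres will tell you how it's doing. A small Go sketch that asks pg_stat_database for the buffer-cache hit ratio; the connection string is a placeholder, and note that blks_read only means "not in shared_buffers" (those reads may still have been served by the OS page cache).

```go
package main

import (
	"database/sql"
	"fmt"

	_ "github.com/lib/pq" // assumption: the pq driver; any Postgres driver works
)

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/app?sslmode=disable") // placeholder DSN
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Fraction of block reads served from shared_buffers. A ratio well below
	// ~0.99 for a hot OLTP workload suggests the working set doesn't fit.
	var ratio sql.NullFloat64
	err = db.QueryRow(`
		SELECT sum(blks_hit)::float8 / nullif(sum(blks_hit) + sum(blks_read), 0)
		FROM pg_stat_database`).Scan(&ratio)
	if err != nil {
		panic(err)
	}
	fmt.Printf("shared_buffers hit ratio: %.4f\n", ratio.Float64)
}
```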

The two-cache problem: the same data can live in both the DB buffer pool and the OS page cache, duplicating pages in RAM. Most relational DBs default to "let both happen, they're both LRU-ish, RAM is cheap." Some embedded engines (RocksDB) prefer to manage their own block cache and discourage the OS from caching the same blocks twice.

Layer 5 — In-process application cache

This is the cache your code holds inside the same process. A HashMap<K,V> wrapped in something that evicts. Caffeine in Java, lru-cache in Node, functools.lru_cache in Python, Go's sync.Map with a custom eviction layer.

In-process cache: zero network, microsecond lookups, but it lives and dies with the process.
Wins:
  • ~100 ns – 1 µs lookup
  • zero network, zero serialization
  • no Redis bill
  • cache-friendly for hot configs, feature flags, lookup tables
Costs:
  • one cache per replica
  • cold restart = empty cache
  • no cross-host invalidation
  • memory eats heap → GC pressure
  • stale data possible per host if you don't plan invalidation

This is the layer that's underused. People reach for Redis when a 1 MB lookup table that changes once a day could just be a Caffeine cache. A network round-trip versus an in-process map lookup is roughly a 1000× difference in latency.

The trade-off: every replica has its own cache, so invalidation across replicas requires either short TTLs, a pub/sub channel ("the data changed, drop your cache"), or a versioned key approach. The right choice depends on freshness requirements.
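
For scale, a serviceable in-process TTL cache is a few dozen lines of Go. This is a sketch, not a library (no size bound, no background eviction); reach for something like Caffeine or Ristretto when you need those.

```go
package ttlcache

import (
	"sync"
	"time"
)

type entry[V any] struct {
	val       V
	expiresAt time.Time
}

// TTLCache is a minimal in-process cache: a map guarded by an RWMutex,
// with per-entry expiry checked lazily on read.
type TTLCache[K comparable, V any] struct {
	mu  sync.RWMutex
	m   map[K]entry[V]
	ttl time.Duration
}

func New[K comparable, V any](ttl time.Duration) *TTLCache[K, V] {
	return &TTLCache[K, V]{m: make(map[K]entry[V]), ttl: ttl}
}

func (c *TTLCache[K, V]) Get(k K) (V, bool) {
	c.mu.RLock()
	e, ok := c.m[k]
	c.mu.RUnlock()
	if !ok || time.Now().After(e.expiresAt) {
		var zero V
		return zero, false // expired entries simply read as misses
	}
	return e.val, true
}

func (c *TTLCache[K, V]) Set(k K, v V) {
	c.mu.Lock()
	c.m[k] = entry[V]{val: v, expiresAt: time.Now().Add(c.ttl)}
	c.mu.Unlock()
}
```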

Layer 6 — Distributed cache (Redis, Memcached)

Now we get to the layer everyone calls "the cache." Distributed caches solve a different problem than in-process caches: shared state across many app instances. Session data, rate-limit counters, and computed results expensive enough that once any host has paid for them, no other host should have to pay again.

The win: every replica sees the same cache. Set a key on host A, read it from host B. Cache hit ratios stay high because the cache is shared, not duplicated.

The cost: every lookup is a network call. ~0.5–1 ms in the same data center. That's 1000× slower than an in-process map. So the rule of thumb is: cache things in Redis when (a) the underlying compute is much more expensive than 1 ms, or (b) you need cross-host visibility. Don't cache a string concatenation in Redis. You'd be paying network latency to save nanoseconds of CPU.

The Redis-as-default trap. "Cache it in Redis" is a reflex, not a decision. If the underlying operation takes 200 µs and Redis adds 800 µs of round-trip, you've made things slower. Always benchmark before promoting. And remember: an in-process cache in front of Redis is often the right answer — local cache for the 99% case, Redis for the 1% miss.
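
For concreteness, here's what cache-aside around Redis usually looks like in Go: a sketch using the go-redis client, with a placeholder key scheme, TTL, and loader function.

```go
package productcache

import (
	"context"
	"encoding/json"
	"time"

	"github.com/redis/go-redis/v9" // assumption: the go-redis v9 client
)

type Product struct {
	ID    string  `json:"id"`
	Price float64 `json:"price"`
}

// GetProduct is classic cache-aside: try Redis, on miss load from the DB,
// then fill the cache with a TTL so stale entries age out on their own.
func GetProduct(ctx context.Context, rdb *redis.Client, id string,
	loadFromDB func(context.Context, string) (Product, error)) (Product, error) {

	key := "product:" + id

	if raw, err := rdb.Get(ctx, key).Bytes(); err == nil {
		var p Product
		if json.Unmarshal(raw, &p) == nil {
			return p, nil // hit: one same-DC round-trip, ~0.5–1 ms
		}
	}

	p, err := loadFromDB(ctx, id) // the expensive path we're trying to avoid
	if err != nil {
		return Product{}, err
	}

	if raw, err := json.Marshal(p); err == nil {
		// Best-effort fill; a failed Set just means the next read misses too.
		rdb.Set(ctx, key, raw, 5*time.Minute)
	}
	return p, nil
}
```

Put an in-process cache (like the TTL sketch above) in front of this function and you have the two-tier setup: local hits for the 99% case, one shared Redis path for the misses.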

Layer 7 — HTTP caching

This is the layer most backend engineers underuse. HTTP has had built-in caching semantics for thirty years. Every browser, every proxy, every CDN, every cache-aware library understands them. Cache-Control, ETag, If-None-Match, Last-Modified, If-Modified-Since — these are not legacy. They are why a properly-configured asset endpoint never hits your origin twice for the same client.

HTTP caching: standard headers, free coverage at every hop.
Response headers (the ones that matter):
  • Cache-Control: public, max-age=3600 — cache for 1 h, anywhere
  • Cache-Control: private, max-age=60 — only the end-user cache, not shared proxies
  • Cache-Control: no-cache — revalidate every time, but allow caching
  • Cache-Control: no-store — don't cache anywhere, ever
  • Cache-Control: stale-while-revalidate=30 — serve stale, refresh in background
  • ETag: "v3-d41d8cd" — opaque content version
  • Last-Modified: Sat, 01 May 2026 12:00:00 GMT — fallback validator
Conditional request: the client sends If-None-Match → the server returns 304 Not Modified (no body).

What you get for free when you set these headers correctly:

  • Browser cache for repeat visits (zero network).
  • Reverse proxy / CDN cache for everyone in the same region.
  • Conditional revalidation (304 Not Modified) — full RTT but tiny body.
  • Stale-while-revalidate — clients get instant responses while the cache refreshes in the background. This is criminally underused.
Senior-engineer move: before reaching for any application-level caching, ask "what would HTTP caching do here?" Static assets, immutable resources (versioned URLs), seldom-changing API responses — these can all be solved with headers, no Redis, no app cache, no code.
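
In code, the conditional-request dance is small. A Go net/http sketch: the ETag value and max-age are placeholders, and a real handler would derive the ETag from a content hash.

```go
package main

import (
	"fmt"
	"net/http"
)

// serveConfig sets caching headers and honours If-None-Match: if the client
// already holds the current version, answer 304 with no body.
func serveConfig(w http.ResponseWriter, r *http.Request) {
	const etag = `"v3-d41d8cd"` // placeholder: derive from a content hash in real code

	w.Header().Set("Cache-Control", "public, max-age=60, stale-while-revalidate=30")
	w.Header().Set("ETag", etag)

	if r.Header.Get("If-None-Match") == etag {
		w.WriteHeader(http.StatusNotModified) // full round-trip, near-zero body
		return
	}

	w.Header().Set("Content-Type", "application/json")
	fmt.Fprint(w, `{"feature_x": true}`)
}

func main() {
	http.HandleFunc("/config", serveConfig)
	http.ListenAndServe(":8080", nil)
}
```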

Layer 8 — CDN / edge cache

A CDN is an HTTP cache geographically distributed close to your users. CloudFront, Cloudflare, Fastly, Akamai. The math is simple: a user in Mumbai talking to your origin in Virginia is paying 200 ms of round-trip per request. Same user hitting a CDN POP in Mumbai pays 5 ms. The cache layer that physically moves the data closer to the user is doing more work than any application-level cache can.

What actually goes on a CDN:

  • Static assets — JS, CSS, images. Obvious.
  • API responses with cache-friendly headers — surprisingly often, GET endpoints can be safely edge-cached for seconds to minutes. Stale-while-revalidate makes this nearly invisible to users.
  • Whole pages — for content sites, edge-render or edge-cache the HTML. Conditional requests (If-None-Match) from the edge to the origin make revalidation cheap.
  • Programmable edge logic — Cloudflare Workers, Lambda@Edge — cache + transform at the edge. Powerful and easy to misuse.

Layer 9 — Browser cache

The cache closest to the user. Disk cache, memory cache, service-worker cache. Driven entirely by HTTP headers (and, for SPAs, the Cache Storage API via service workers). The right Cache-Control on your bundle is the difference between "page loads instantly on second visit" and "user redownloads 2 MB every time."

Pattern that works: serve assets at versioned URLs (app.7f2c91.js) with Cache-Control: public, max-age=31536000, immutable. The URL changes when the asset changes; the browser caches the old one essentially forever. No invalidation needed — different content, different URL.
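
The server side of that pattern is one header in front of a static file server. A Go sketch, assuming a build step that writes fingerprinted filenames into ./dist:

```go
package main

import "net/http"

func main() {
	assets := http.FileServer(http.Dir("./dist")) // assumption: build output with hashed filenames

	http.Handle("/assets/", http.StripPrefix("/assets/",
		http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			// Safe only because the filename changes whenever the content does.
			w.Header().Set("Cache-Control", "public, max-age=31536000, immutable")
			assets.ServeHTTP(w, r)
		})))

	http.ListenAndServe(":8080", nil)
}
```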

Layer 10 — DNS resolver caching

Don't laugh. DNS is a cache. Every getaddrinfo goes through layers of caching: the local stub resolver, the OS-level cache, your ISP's recursive resolver; only the authoritative server gives a fresh answer. TTLs control how long an answer is reusable. This is also where caching becomes a liability — long DNS TTLs mean failover after an outage takes minutes to propagate. Short TTLs mean more recursive lookups but faster recovery.

If your DR plan is "we'll change the DNS record," you'd better know what your TTL is, and what your downstream resolvers are doing with it.

Layer 11 — Reverse proxies (Varnish, nginx, Envoy)

Sitting between your CDN and your application is often a proxy that can also cache. nginx with proxy_cache, Varnish dedicated to it, Envoy with HTTP filters. This layer is useful for caching things that aren't quite static (so they don't go to the CDN) but are too expensive to recompute on every request — search-results pages, expensive aggregations, dashboard tiles.

The trade-off vs a CDN: a reverse proxy sits next to your origin, so it cuts origin load and origin time-to-first-byte rather than the user's network distance; it's under your operational control (you can purge, pre-warm, write VCL); but it doesn't move data closer to users.

Layer 12 — Database query / plan cache

The database itself caches a few things you can't see from your app:

  • Query plan cache. Parsing and planning a SQL query is non-trivial. Prepared statements let the DB cache the plan and reuse it across executions (a minimal sketch follows this list).
  • Result cache. Some DBs (older MySQL, some commercial) cache query results. Mostly out of fashion — invalidation hell, didn't scale to write-heavy workloads. MySQL removed it in 8.0.
  • Connection-level state. Session caches, temporary tables, cached metadata. Reusing a connection (via a pool) keeps these warm.
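
The prepared-statement version of this in Go's database/sql looks like the following sketch, with a placeholder DSN and table. The point is that the statement is parsed once on the server and reused across executions instead of re-parsing the SQL text per call.

```go
package main

import (
	"database/sql"
	"fmt"

	_ "github.com/lib/pq" // assumption: Postgres via the pq driver
)

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/app?sslmode=disable") // placeholder DSN
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Prepared once: repeated executions reuse the server-side parse work
	// (and, after a few runs, Postgres may settle on a generic plan).
	stmt, err := db.Prepare(`SELECT price FROM products WHERE id = $1`)
	if err != nil {
		panic(err)
	}
	defer stmt.Close()

	for _, id := range []string{"a1", "b2", "c3"} {
		var price float64
		if err := stmt.QueryRow(id).Scan(&price); err != nil {
			panic(err)
		}
		fmt.Println(id, price)
	}
}
```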

Other places caches hide

  • ORM identity map / first-level cache. Hibernate, EF, ActiveRecord all cache loaded objects within a session/transaction. Same row twice = one DB round-trip.
  • ORM second-level cache. Cross-session cache, often backed by Caffeine or Redis. Powerful, dangerous if invalidation isn't tight.
  • Connection pools. Caching the expensive thing (TCP + auth + TLS handshake) so each request doesn't pay it.
  • Compiled regex cache. Most languages cache compiled regexes for you (Python's re module, for instance); Java doesn't — String.matches recompiles the pattern on every call. Bug or feature, opinions vary.
  • Network stack caches. ARP cache, route cache, conntrack table — invisible until they fill up.
  • Build / compiler caches. ccache, sccache, Bazel remote cache, Turborepo. Not runtime, but the same ideas.

The hard part is invalidation

"There are only two hard things in computer science: cache invalidation and naming things." It's a joke and a warning. Invalidation is the part that makes caching dangerous. Every layer above has its own model:

Layer | Invalidation model | What goes wrong
--- | --- | ---
CPU caches | Cache-coherence protocol (MESI/MOESI) | False sharing, write-amp on shared lines
OS page cache | LRU + writes invalidate pages | Rare; mostly invisible
DB buffer pool | Page-level, integrated with the engine | Cold cache after restart, eviction storms
In-process cache | TTL, manual invalidate(), pub/sub | Stale data per replica after writes
Distributed cache | TTL, write-through, manual delete on update | Race between cache delete and DB write
HTTP / CDN | TTL via Cache-Control, ETag revalidation, purge API | Stale content after deploy if purge missed
Browser | TTL + URL versioning | Users on old bundles after release; Ctrl+Shift+R ritual
DNS | TTL only | Slow failover, "DNS is cached somewhere we don't control"

The pattern: every cache is a bet that data won't change before the cache expires. The more layers you stack, the more bets you're making in parallel. When the data changes, you have to invalidate every layer that has a copy — or accept stale reads at every one of them. This is why cache invalidation is hard: it's a distributed problem with no global clock.

The patterns you should know by name

Cache patterns: how reads and writes flow.
  • Cache-aside (lazy-loading). Read: cache → on miss, DB → fill cache. Write: DB → invalidate cache. Most flexible, most common; there's a race window between write and invalidate.
  • Read-through. The cache layer fetches from the DB; the app talks only to the cache. Cleaner abstraction, opaque cache; the cache becomes a critical path.
  • Write-through. Write: cache + DB synchronously. The cache always has the fresh value; simple correctness, but writes cost cache + DB latency.
  • Write-behind (write-back). Write: cache only, DB async later. Batches and amortizes DB writes for huge write-throughput wins; data loss risk if the cache dies.
  • Stale-while-revalidate. Return the stale value immediately, refresh in the background. Great UX for read-heavy workloads where brief staleness is acceptable.
  • Request coalescing / single-flight. N concurrent misses for the same key → only 1 origin call, the others wait. Prevents a thundering herd on miss; complexity at the cache client.
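
Single-flight in particular is nearly free to adopt in Go via golang.org/x/sync/singleflight. A sketch, where loadFromDB and the key scheme are placeholders:

```go
package coalesce

import (
	"context"

	"golang.org/x/sync/singleflight"
)

var group singleflight.Group

// GetWithCoalescing collapses N concurrent misses for the same key into a
// single origin call; the other callers block and share the result.
func GetWithCoalescing(ctx context.Context, key string,
	loadFromDB func(context.Context, string) (string, error)) (string, error) {

	v, err, _ := group.Do(key, func() (interface{}, error) {
		return loadFromDB(ctx, key) // only one goroutine per key runs this
	})
	if err != nil {
		return "", err
	}
	return v.(string), nil
}
```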

Failure modes worth a name

  • Thundering herd / dogpile. Cached value expires, 1000 concurrent requests miss simultaneously, all 1000 hit the origin. Mitigation: single-flight, jittered TTLs, stale-while-revalidate.
  • Hot key. One key gets disproportionate traffic (a celebrity, a popular product). Distributed cache nodes that own that key get crushed. Mitigation: replicate hot keys across nodes, in-process front-cache, sharding by content rather than key.
  • Cache stampede on cold start. Deploy / restart, every cache empty, every request hits the DB. Mitigation: pre-warm caches, gradual deploy, request coalescing.
  • Negative-cache stampede. A query returns "not found." If you don't cache that, every miss re-queries forever. Cache the negative answer with a short TTL.
  • Stale fan-out on deploy. Deploy new code that writes a different shape; old replicas' caches still have old shape. Mitigation: version your cache keys, invalidate aggressively on deploy.
  • Write-skew between cache and DB. Update DB, then delete cache key — but the read got there first and re-populated cache with the old value. Mitigation: delete cache after DB write, with double-delete after a short delay; or use write-through.
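
The last item deserves a sketch: delayed double delete in Go with go-redis. The 500 ms delay, key scheme, and SQL are placeholders; pick the delay to cover your read path's worst-case latency.

```go
package invalidate

import (
	"context"
	"database/sql"
	"time"

	"github.com/redis/go-redis/v9"
)

// UpdatePrice writes the DB, invalidates the cache, and schedules a second
// delete to catch a reader that raced the first delete and re-filled the
// key with the pre-write value.
func UpdatePrice(ctx context.Context, db *sql.DB, rdb *redis.Client, id string, price float64) error {
	if _, err := db.ExecContext(ctx,
		`UPDATE products SET price = $1 WHERE id = $2`, price, id); err != nil {
		return err
	}

	key := "product:" + id
	rdb.Del(ctx, key) // first delete, immediately after the write

	time.AfterFunc(500*time.Millisecond, func() {
		rdb.Del(context.Background(), key) // second delete, after the race window
	})
	return nil
}
```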

A decision framework: which layer to cache at

If the bottleneck is… | The right cache layer is… | Why
--- | --- | ---
Hot loop touching memory | CPU cache (data layout) | Don't add a service. Make the data fit a cache line.
Reading the same file repeatedly | OS page cache (already happening) | Just give the server enough RAM.
Same DB query, same row, repeatedly | DB buffer pool, then in-process | Buffer pool is free. In-process beats Redis if the row is hot per replica.
Same query result across replicas | Distributed cache (Redis / Memcached) | Shared state is the whole point of going over the network.
Static asset to many users | HTTP cache + CDN | Move the bytes geographically. Free, standard, scales infinitely.
Expensive computation per request | In-process or distributed, depending on size + freshness | Memoize the expensive function. TTL it.
Repeat reads from the same client | Browser cache + ETag | The client already has it. Just say "still fresh."
Slow DNS resolution / TLS handshake | Connection pool / DNS pinning | Cache the connection, not the answer.
The order to think in: can I avoid the work entirely (HTTP caching, ETag)? Can the OS or DB do this for me already (page cache, buffer pool)? Can I keep this in-process (Caffeine, sync.Map)? Only then: do I need a distributed cache (Redis)? Reaching for Redis first skips three layers that are free, faster, and don't add a network dependency.

What "good caching" looks like in practice

A typical well-cached web stack might look like this for a single user request to a product page:

  1. Browser cache hit on JS/CSS/images — zero network for assets. (HTTP cache.)
  2. HTML served from CDN edge — 5–20 ms TTFB from a nearby POP. (CDN cache.)
  3. API call from page → reverse proxy with short-lived cache for popular endpoints. (Proxy cache.)
  4. Application server: in-process Caffeine cache for feature flags, config, lookup tables. (In-process.)
  5. Application server: Redis for shared session, rate limits, computed query results. (Distributed.)
  6. If Redis miss: query Postgres, where the index pages are in shared_buffers and the file pages are in OS page cache. (DB + OS.)
  7. Postgres returns rows; CPU L1/L2 keep the hot tuples warm during processing. (CPU.)

The user sees a fast page. Behind it, a dozen caches did their job. None of them were "the cache." They were a cache system.

Closing take

The next time someone says "we need a cache," ask: at which layer. The answer "Redis" is sometimes right and often a habit. The cheaper answer is usually one or two layers up from where the discussion started.

Caching is plumbing. The real skill is knowing which pipe to add and which to leave alone, knowing how each one fails, and remembering that every cache is a small lie about freshness that you've decided to live with. The lies that work are the ones whose invalidation you've thought through. The ones that bite you are always the layer you forgot was even caching.

If you remember one thing: caching is layered, not singular. Each layer has different latency, capacity, and invalidation. The cheapest hit is the one that never leaves the CPU; the next is the one that never leaves the process; the next is the one that never leaves the host. Redis is somewhere on that list — not at the top of it.