You shipped a hash ring. The deck looked perfect. PagerDuty disagreed. The ring only stopped you from re-mapping every key when N twitches; it does not hand you even load, agreement on who is alive, or range scans for free. Below: the failure modes, three case studies with before/after numbers, a deeper comparison with range-based partitioning, and a table for choosing your pain.
Why your consistent hashing is failing anyway
The algorithm is a placement policy, not a luck charm. These three are what show up in real incidents when the whiteboard was “correct.”
Hotspots and uneven load
Uniform hash spread is a statistical story. In production, a viral ID, a default tenant, or a shared prefix maps so much traffic to one physical host that your dashboard looks like a binary star. What breaks: p99 latency, CPU pegged on one node, “fair” autoscaling that adds replicas where load is not the problem. What actually helps: more virtual nodes, per-tenant rings, admission limits, and admitting that a ring is not a load generator.
Successor overload when a node dies
A failed node’s keys walk clockwise to the next host. If that host was already full, the outage is not “one box” — the successor takes a double serving and may fall over, domino-style.
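A minimal sketch of the domino — one point per host (no virtual nodes), hypothetical node names, and MD5 used only because it gives stable positions across runs. Kill a node and every key it owned lands on exactly one survivor:

```python
import bisect
import hashlib

def pos(s: str) -> int:
    # Stable position on the ring; md5 here only for run-to-run determinism.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def owner(ring, key):
    # Walk clockwise: first ring point at or after hash(key), wrapping at the top.
    points = [p for p, _ in ring]
    i = bisect.bisect_right(points, pos(key)) % len(ring)
    return ring[i][1]

ring = sorted((pos(n), n) for n in ["node-a", "node-b", "node-c"])
keys = [f"session:{i}" for i in range(10_000)]
before = {k: owner(ring, k) for k in keys}

dead = "node-b"
survivors = [(p, n) for p, n in ring if n != dead]
after = {k: owner(survivors, k) for k in keys}

# Everything the dead node owned moved — and all of it to one host.
absorbers = {after[k] for k in keys if before[k] == dead}
print(absorbers)  # a single host: the clockwise successor
```

Virtual nodes blunt this by scattering each host’s points around the circle, so a dead host’s arc fans out across many successors instead of one.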
Stale ring views and “split” routing
Process A still routes with ring generation 41. Process B uses 42 (new node added). For minutes, the same key can land in two different places. That is a hash ring failure in the “people disagree on truth” sense — duplicate work, 409s, or silent divergence.
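One way to turn that silent divergence into loud errors: tag every routed request with the ring generation the client used, and have the target reject mismatches. A sketch — `StaleRing` and the class shape are hypothetical, and mapping the rejection to your 409 is up to the transport:

```python
class StaleRing(Exception):
    """Client routed with a ring generation the node no longer honors."""

class Node:
    def __init__(self, generation: int):
        self.generation = generation  # bumped atomically on every map change

    def handle(self, key: str, client_generation: int) -> str:
        if client_generation != self.generation:
            # Fail loudly (your 409) instead of doing duplicate or divergent work.
            raise StaleRing(f"client gen {client_generation} != node gen {self.generation}")
        return f"ok:{key}"

node = Node(generation=42)
try:
    node.handle("user:7", client_generation=41)  # process A, still on gen 41
except StaleRing as e:
    print("rejected:", e)
print(node.handle("user:7", client_generation=42))  # process B, current map
```

The point is not the exception type; it is that both sides name the generation, so “minutes of split routing” becomes a counter you can graph instead of a reconciliation surprise.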
Hash ring 101 — enough to compare with ranges
Map hash output to a circle, put nodes on it (classically many virtual nodes per box), hash each key, and walk clockwise to the first node — that is the owner. `hash(key) % N` re-homes most keys when N changes; a ring re-homes only the keys near the add or remove. The waiter-quit analogy: you re-seat one section, not the whole restaurant.
```python
# Modulo: resize N → almost every key moves
node = abs(hash(key)) % N

# Consistent: sorted ring positions; binary-search the clockwise successor;
#             virtual nodes = many positions per physical host
# Range: which interval [lo, hi) contains key? split/move intervals to scale
```
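The consistent line above, expanded into a runnable sketch — the host names, 64 virtual nodes, and MD5 positions are illustrative choices, not a recommendation:

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, hosts, vnodes=64):
        self.vnodes = vnodes
        self.ring = []  # sorted (position, host) pairs
        for h in hosts:
            self.add(h)

    def _pos(self, s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def add(self, host):
        # Virtual nodes: many positions per physical host, for smoother spread.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._pos(f"{host}#{i}"), host))

    def owner(self, key):
        points = [p for p, _ in self.ring]
        i = bisect.bisect_right(points, self._pos(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing([f"cache-{i}" for i in range(10)])
keys = [f"session:{i}" for i in range(10_000)]
before = {k: ring.owner(k) for k in keys}

ring.add("cache-10")  # scale out by one host
moved = sum(1 for k in keys if ring.owner(k) != before[k])
print(f"{moved / len(keys):.1%} moved")  # roughly 1/11 of keys — not ~100% like modulo
```

With modulo, going from 10 to 11 hosts would re-home about 10/11 of the keys; here only the arcs claimed by the new host’s 64 points move.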
What the ring is not: a linearizability layer, a fix for SELECT … BETWEEN without a query plan, or a substitute for agreed membership. For ordered primary keys and big scans, read on.
Case studies — when the diagram matched reality (and when it did not)
Composite stories from production-style incidents — numbers are illustrative, not a single named company. They are useful because they show the metric shape of each failure class.
Case 1: Viral product key on a large session cache (hash ring, Redis-like)
Shape: 18-node cache, consistent hashing in front, JSON blobs keyed by session:{id} and shared read-through to product:{productId} for a flash sale. Trigger: one product:4421 went viral. Observed: one primary shard CPU 91%, others 18–30%; p99 get from 4 ms → 180 ms for unrelated keys co-located on that node’s responsibility arc. Root cause: not N or the hash — application key skew. The ring was fair; the business was not.
Case 2: Rolling deploy with two ring generations (7 minutes of “ghost” node)
Shape: 40 edge nodes each embedding a 2 MB cluster map. Trigger: canary on 5 nodes got map v412; rest still v411 with one host removed. Observed: 0.08% of writes duplicated or retried to wrong target; reconciliation job depth +3×. Root cause: clients not atomically switching maps at the same generation.
Case 3: Time-series + monthly ranges — backfill made January “the fat shard”
Shape: metrics DB sharded by month on (tenant_id, t) PK. Trigger: 6-day historical backfill for one tenant. Observed: January partition 4× the read QPS of February; p99 on that range 2.1s vs 120 ms elsewhere. Root cause: hot range — a range partition problem, not a ring problem.
Range-based partitioning — deeper (why teams still love it)
You carve the sortable keyspace into [start, end) intervals. A shard (tablet, region, “partition”) answers every key that sorts in that half-open range. Range-based partitioning is how Bigtable, HBase, Cockroach, Spanner-style systems, and many SQL shard routers (Vitess, Citus, etc.) think — because the storage engine already orders keys.
Lookup: binary search a sorted list of ranges (or a tree), often cached. Rebalance: split a range, move a subrange, or merge. Failure modes mirror the ring’s but on a line: hot range (one interval gets all the traffic), bad cut (split that does not split load), stale range map (two routers disagree on boundaries).
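A sketch of that lookup plus an explicit split — the shard names and month boundaries are made up, and real systems keep this map versioned and replicated rather than in a list:

```python
import bisect

# owners[i] serves the half-open range [splits[i], splits[i+1]);
# the last split is a sentinel upper bound.
splits = ["2024-01", "2024-02", "2024-03", "2024-04"]
owners = ["shard-1", "shard-2", "shard-3"]

def route(key: str) -> str:
    # Binary-search the sorted boundaries to find the owning interval.
    i = bisect.bisect_right(splits, key) - 1
    if i < 0 or i >= len(owners):
        raise KeyError(f"{key!r} outside the partitioned keyspace")
    return owners[i]

print(route("2024-01-15"))  # shard-1

# Splitting the hot range is an explicit map edit (plus the data copy it implies):
splits.insert(1, "2024-01-16")  # cut January where the backfill traffic starts
owners.insert(1, "shard-1b")    # new shard takes the back half of January
print(route("2024-01-20"))  # shard-1b — the hot subrange now has its own home
```

Note the on-call ergonomics: the fix is a named boundary edit a human can read back, which is exactly the “this interval is the fire” advantage.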
What you gain vs a pure hash ring
- Range and prefix scans that can stay on one (or a few) shards — time windows, WHERE pk BETWEEN, “all of tenant 7’s rows” with a well-chosen key.
- Operational handles: “this interval is the fire” beats “this arc after 0x9f2…” for humans on call.
- Alignment with time-ordered PKs so you can plan where new writes land.
What you pay
- Hot ranges and bad boundaries — same class as hot spots, different axis (sort order).
- Rebalancing is explicit work — copy, verify, cut over; automation helps but is not free.
- Every client needs the range map — same discipline as a ring: versions, health, tests.
When to use a hash ring vs range-based partitions
Use the table when you are picking a default for a new system — not when you are cargo-culting the last project.
| Factor | Prefer consistent hashing | Prefer range partitions |
|---|---|---|
| Key shape | Opaque IDs, session keys — no business sort in the key. | Sortable PK, time, tenant prefix you rely on in queries. |
| Queries | Point gets/sets, cache semantics. | Scans, feeds, BETWEEN, time-series rollups. |
| Elasticity | Nodes in/out often; want small remaps. | More static cluster, or ops-driven splits. |
| Control | Virtual nodes, many rings, app-level sharding of hot keys. | Named intervals, move a tenant or month by moving a range. |
| Stack | Caches, Dynamo-style K/V, many pub/sub consumer maps. | Wide-column, distributed SQL, tablet stores. |
Combos that are not hypocrisy: ring at the cache (elastic, opaque) + range in the database (ordered, queryable). Hash to bucket then range inside is also common — two layers, two invariants, document both.
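The hash-then-range layering, sketched. A fixed bucket count hashed from the tenant (buckets migrate between hosts, so the modulo here is safe — the count never changes), then time ranges inside each bucket; all names and boundaries are hypothetical:

```python
import bisect
import hashlib

# Layer 1 invariant: fixed bucket count; buckets move between hosts,
# keys never re-bucket, so modulo is safe at this layer.
N_BUCKETS = 16

def bucket(tenant_id: str) -> int:
    return int(hashlib.md5(tenant_id.encode()).hexdigest(), 16) % N_BUCKETS

# Layer 2 invariant: inside a bucket, rows stay sorted by time so scans hit one range.
splits = ["2024-01", "2024-02", "2024-03"]

def placement(tenant_id: str, month_key: str) -> tuple[int, int]:
    r = bisect.bisect_right(splits, month_key) - 1
    return bucket(tenant_id), r  # (which bucket, which range inside it)

b, r = placement("tenant-7", "2024-02-10")
print(b, r)  # bucket is stable for the tenant; r is 1, the February range
```

Two layers, two invariants — the bucket answers “where does this tenant live,” the range answers “where does this month live,” and neither doc should pretend to answer the other’s question.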
Closing
Consistent hashing is still the right default for a lot of elastic, point-read-heavy, opaque-key infrastructure. Range-based partitioning is the right default when the product is about order and ranges. The case studies above share one theme: the incident report starts with metrics and ownership, not with “we need more vnodes” as the only knob. Map your failure modes, pick the pain you can run in operations, then draw the pretty picture on top — not the other way around.