Life Beyond Distributed Transactions — Why Hyperscale Apps Quietly Abandoned 2PC, and the Two Words That Replaced It


In 2007 Pat Helland — a man who had spent a large chunk of his career building systems that promised global serializability — wrote a paper confessing that he had stopped believing in them. He called it “an Apostate’s Opinion.” Almost twenty years later it reads less like a confession and more like a blueprint for how every large system you use is actually built.

This post walks through that paper in plain language, connects each idea to the stack you ship today (sharded databases, Kafka, microservices, idempotency keys, sagas), and uses the paper’s own vocabulary wherever it coined a term. The two words worth remembering are entity and activity. Everything else falls out of them.

The Maginot Line

Helland opens with a metaphor. The Maginot Line was a vast, expensive fortification France built along its German border between the two World Wars. It worked exactly as designed: the German army never crossed it directly. In 1940 they simply drove around it, through Belgium.

His claim is that distributed-transaction protocols — two-phase commit (2PC), and the quorum/consensus machinery around it — are the Maginot Line of our industry. They are technically magnificent. They give the application programmer a beautiful façade: global serializability, the illusion that the entire system is one big database where any set of rows on any set of machines can be updated atomically.

And in practice, builders of very large systems route around them. Not out of ignorance — out of natural selection. When teams try to lean on distributed transactions at scale, the projects tend to founder on two rocks: performance (every cross-node commit pays a coordination tax) and fragility (2PC blocks when a participant or the coordinator is unavailable, and availability is the thing you were scaling for in the first place).

The paper's scope, stated honestly. Helland sets high availability aside and focuses purely on scalability — what falls out if you simply assume you cannot have large-scale distributed transactions. That single assumption drives the entire argument.

The thought experiment: almost-infinite scaling

The frame for the whole paper is what Helland calls almost-infinite scaling. Imagine the number of customers, orders, shipments, accounts, patients, devices — every business noun your app manages — growing without bound. Tens of thousands, hundreds of thousands of machines. Too many to pretend they are one big machine.

The key observation: the individual things usually don’t get bigger. One order is still one order. You just get more of them. So the system spreads what used to fit on a few machines across a great many machines, and we want it to scale roughly linearly with load — both data and computation.

This forces a precise question that small systems never have to ask:

When can I know that two pieces of data live on the same machine — and what do I do when I can’t guarantee they do?

That question is the hinge of everything that follows.

Two layers: scale-agnostic and scale-aware

Helland’s first assumption is that a scalable application has (at least) two layers that perceive scaling differently.

The lower layer is scale-aware. It knows machines get added, it owns the mapping from logical data to physical locations, and it does repartitioning as load grows.
The upper layer is scale-agnostic. It contains business logic and is written as if scaling weren’t happening at all. It never names a machine.

The lower layer provides a scale-agnostic programming abstraction to the upper layer. The paper’s 2007 example was Google’s MapReduce; today you’d point at a sharded datastore’s client, a service mesh, or a partitioned stream. The bet — borne out since — is that this lower layer eventually crystallizes into platforms and middleware, the way the messy terminal-multiplexing code of the 1970s eventually became TP-monitors and let programmers just write business logic.

Why this matters for you. The instant your business code hardcodes “this row is on the same box as that row,” it stops being scale-agnostic. The first deployment may prove you right. The repartition six months later will prove you wrong, in production, at 3 AM.

Disjoint scopes of serializability

The second assumption is the technical heart of the paper. Helland asks you to stop imagining one global transaction scope and instead assume many disjoint scopes of serializability. Picture each machine (or tight cluster) as a separate island where ACID transactions are real and local. Between islands, there is no atomic transaction. That’s what makes them disjoint.

Each piece of data lives in exactly one island. A transaction may touch any data within a single island. It may never atomically span two islands. Replication for high availability doesn’t change this — it just keeps copies; it doesn’t merge the islands.

If that sounds abstract, it’s exactly the world of a sharded database: a transaction inside one shard is ACID; a transaction across two shards is either impossible or expensive-and-fragile (2PC), so you avoid it.
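
To make the islands concrete, here is a minimal sketch that uses two in-memory SQLite databases as stand-ins for two shards. The account names, the naive_transfer helper, and the simulated crash are all illustrative assumptions, not anything from the paper. Each local transaction is atomic, but nothing couples the two commits.

```python
import sqlite3


def make_island():
    # Each connection is its own scope of serializability (a hypothetical shard).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    return db


island_a, island_b = make_island(), make_island()
with island_a:
    island_a.execute("INSERT INTO account VALUES ('alice', 100)")
with island_b:
    island_b.execute("INSERT INTO account VALUES ('bob', 0)")


def balance(db, account_id):
    row = db.execute("SELECT balance FROM account WHERE id = ?", (account_id,)).fetchone()
    return row[0]


class Crash(Exception):
    """Simulates the process dying between two local commits."""


def naive_transfer(amount, crash_between=False):
    # Two separate local transactions. Each is ACID on its own island,
    # but there is no atomicity across them.
    with island_a:
        island_a.execute(
            "UPDATE account SET balance = balance - ? WHERE id = 'alice'", (amount,)
        )
    if crash_between:
        raise Crash("died after the first commit")
    with island_b:
        island_b.execute(
            "UPDATE account SET balance = balance + ? WHERE id = 'bob'", (amount,)
        )


try:
    naive_transfer(30, crash_between=True)
except Crash:
    pass

# The debit committed and the credit never happened: 30 units are missing.
total = balance(island_a, "alice") + balance(island_b, "bob")
```

The rest of the post is about closing that gap with messages and workflow rather than with a cross-island commit.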

At-least-once messaging is the water we swim in

The third assumption is about how islands talk. Consider the everyday job of consuming a message and durably writing some data:

1. You receive the message (not yet acknowledged).
2. You update the database.
3. You acknowledge the message.

The message broker and the database are two different durable systems, and the acknowledgement is not atomically coupled to the database write — only your application code stitches them together. So there is always a failure window. Crash between the write and the ack, restart, and the message gets processed again.

The broker behaves this way on purpose. Its only alternative is to occasionally lose messages (“at-most-once”), which is worse. So the industry standard is at-least-once delivery, and the bill lands on the application developer: you must tolerate message retries and out-of-order arrival.
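
The failure window can be reproduced in a few lines. This is a toy sketch: the Broker class, the list standing in for a database, and the crash_before_ack flag are assumptions made for illustration and do not model any real broker's API.

```python
class Broker:
    """Toy at-least-once broker: a message stays queued until it is acked."""

    def __init__(self, messages):
        self.pending = list(messages)

    def receive(self):
        # Hand out the oldest unacknowledged message without removing it.
        return self.pending[0] if self.pending else None

    def ack(self, message):
        self.pending.remove(message)


class Crash(Exception):
    """Simulates the consumer dying between the write and the ack."""


database = []  # stands in for a durable table


def consume_one(broker, crash_before_ack=False):
    message = broker.receive()      # step 1: receive, not yet acknowledged
    database.append(message)        # step 2: durable write
    if crash_before_ack:
        raise Crash("died after the write, before the ack")
    broker.ack(message)             # step 3: acknowledge


broker = Broker(["charge order-1"])
try:
    consume_one(broker, crash_before_ack=True)
except Crash:
    pass

# After the restart the broker redelivers, because it never saw an ack.
consume_one(broker)
```

After this run the write has been applied twice for a single message, which is exactly the duplicate the application has to be built to absorb.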

This is why “exactly-once” is a half-truth. Exactly-once delivery survives only inside a single component. Cross a boundary (broker to DB, DB to another service) and you’re back to at-least-once. The patterns in this paper are precisely how you live there safely.

Entities: the unit of atomicity

Now the first big idea. If atomic transactions can’t span islands, the upper layer needs a unit it can reason about — a chunk of data it is allowed to update atomically. Helland names it the entity.

An entity is a collection of data with three properties:

It has a unique key (an entity-key) — a customer-id, an order-id, an account number, a device serial.
It is disjoint — its data never overlaps another entity’s data. Every datum lives in exactly one entity.
It lives within a single scope of serializability — so you can always do an atomic transaction inside one entity, and never across two.

There’s no constraint on how an entity is stored: SQL rows across several tables (whose primary key begins with the entity-key), a JSON document, a blob, files — whatever fits. An “order-processing” application is a sea of order entities. To be scalable, one order’s data must be disjoint from every other order’s.
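
As one possible shape for the SQL case, the sketch below stores an order entity across two tables whose primary keys both begin with the entity-key. The table names, column names, and the add_line helper are hypothetical. SQLite stands in for whatever store holds the entity.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE order_header (
        order_id TEXT PRIMARY KEY,          -- the entity-key
        status   TEXT NOT NULL
    );
    CREATE TABLE order_line (
        order_id TEXT NOT NULL,             -- primary key begins with the entity-key
        line_no  INTEGER NOT NULL,
        sku      TEXT NOT NULL,
        qty      INTEGER NOT NULL,
        PRIMARY KEY (order_id, line_no)
    );
    """
)


def add_line(order_id, sku, qty):
    # One local transaction that touches only rows belonging to one entity.
    with db:
        db.execute("INSERT OR IGNORE INTO order_header VALUES (?, 'open')", (order_id,))
        next_no = db.execute(
            "SELECT COALESCE(MAX(line_no), 0) + 1 FROM order_line WHERE order_id = ?",
            (order_id,),
        ).fetchone()[0]
        db.execute(
            "INSERT INTO order_line VALUES (?, ?, ?, ?)", (order_id, next_no, sku, qty)
        )
        return next_no


first_line = add_line("o-1", "widget", 2)
second_line = add_line("o-1", "gadget", 1)
other_order_line = add_line("o-2", "widget", 5)

lines_in_o1 = db.execute(
    "SELECT COUNT(*) FROM order_line WHERE order_id = 'o-1'"
).fetchone()[0]
```

Because every row carries the order_id prefix, the whole entity can be moved to another partition as a unit, and no transaction here ever needs a row from a different order.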

Entity vs object. Why a new word? Because objects may or may not share a transaction scope, and many object frameworks happily let a transaction span objects. Entities never share a scope, because repartitioning can move them to different machines at any time. An object is only an entity if its data is strictly disjoint and is never atomically updated with any other data. If your “objects” lean on materialized views or alternate indices kept transactionally consistent, they will stop being entities the day you scale.

Repartitioning means you can never assume co-location

Because the scale-aware lower layer owns placement, the location of any entity can change as the deployment evolves. Hash partitioning: two different keys land together only by luck. Range partitioning: adjacent keys usually share a machine — until one day they don’t.

This produces the nastiest class of latent bug in the paper. A test that relies on two neighboring keys being atomically updatable will pass in the test deployment, because the neighbors happened to be co-located. The bug only surfaces months later when repartitioning splits them across scopes and the atomicity silently disappears.

You can never count on different entity-key values residing in the same place.

The only way to know two pieces of data can be updated atomically is if a single unique key unifies them — and at that point they’re just one entity. So a scale-agnostic abstraction must treat the entity as the hard boundary of atomicity.
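
The range-partitioning trap is easy to demonstrate. In this sketch the integer keys, the split points, and the partition_for helper are made up for illustration; the only claim is that a later split can separate neighbors that used to share a partition.

```python
from bisect import bisect_right


def partition_for(key, split_points):
    # Range partitioning: the partition index is the number of split points <= key.
    return bisect_right(split_points, key)


split_points_at_launch = [1000, 2000]
split_points_after_growth = [1000, 1500, 2000]  # the busy middle range was split

# Two neighboring entity-keys.
key_a, key_b = 1499, 1500

colocated_at_launch = (
    partition_for(key_a, split_points_at_launch)
    == partition_for(key_b, split_points_at_launch)
)
colocated_after_growth = (
    partition_for(key_a, split_points_after_growth)
    == partition_for(key_b, split_points_after_growth)
)
```

Here colocated_at_launch is True and colocated_after_growth is False. Any code that atomically updated both keys worked on day one and lost its atomicity after the split, without a single line of business logic changing.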

The alternate-index tax

One consequence stings. We’re used to looking up a customer by id, by email, by credit-card number, by address — multiple indices, all kept perfectly consistent by the database. Under almost-infinite scaling, the entity sits in one scope of serializability, but the alternate index built over a different key must be assumed to live in a different scope.

You can’t atomically update an entity and its alternate index. The classic two-step lookup still works — resolve the alternate key to the entity-key, then fetch the entity by entity-key — but the two indices can’t be kept transactionally in sync. What the database used to do for free now becomes the application’s job: alternate indices are maintained asynchronously, via messaging, and readers must accept that an alternate index may be momentarily out of sync with the entity it points at.

Modern translation. This is exactly why your search index (Elasticsearch), your read-model (CQRS projection), and your denormalized lookup table lag the source of truth by a few hundred milliseconds. It isn’t sloppiness — it’s the alternate-index tax of disjoint scopes, and pretending otherwise is how you ship “user updated their email but login still fails” bugs.
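
A minimal sketch of that lag, assuming an in-process queue in place of real messaging. The dictionaries, the function names, and the re-check in find_by_email are illustrative choices, not something the paper prescribes.

```python
from collections import deque

customers = {"c-1": {"email": "old@example.com"}}   # entities, keyed by entity-key
email_index = {"old@example.com": "c-1"}            # alternate index, separate scope
index_updates = deque()                             # stands in for async messaging


def change_email(customer_id, new_email):
    # Only the entity is updated here; the index learns about it later.
    old_email = customers[customer_id]["email"]
    customers[customer_id]["email"] = new_email
    index_updates.append((old_email, new_email, customer_id))


def apply_index_updates():
    while index_updates:
        old_email, new_email, customer_id = index_updates.popleft()
        email_index.pop(old_email, None)
        email_index[new_email] = customer_id


def find_by_email(email):
    # Two-step lookup: alternate key -> entity-key -> entity.
    customer_id = email_index.get(email)
    if customer_id is None:
        return None
    customer = customers[customer_id]
    # The index may be stale, so verify against the entity itself.
    return customer if customer["email"] == email else None


change_email("c-1", "new@example.com")
lookup_before_sync = find_by_email("new@example.com")   # index has not caught up
stale_hit = find_by_email("old@example.com")            # index entry is out of date
apply_index_updates()
lookup_after_sync = find_by_email("new@example.com")
```

Both lookups made before the queue drains return None. The re-check against the entity is one way for a reader to notice a stale index entry instead of trusting it.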

Messaging across entities

If you can’t update two entities in one transaction, you update them in two transactions and connect them with a message. Three properties make this work.

Messages are asynchronous with respect to the sending transaction

The decision to send a message lives in the sending entity; the destination lives in another entity, which by definition can’t be updated atomically with the sender. So the message must not become visible at the destination until after the sender’s transaction commits.

Imagine the alternative: you send a message mid-transaction, the message escapes, and then your transaction aborts. Now the world has acted on something you have no memory of causing. To prevent this, the message must be enqueued transactionally — committed atomically with the state change that decided to send it.

You already build this. “Transactionally enqueue a message alongside the state change” is the transactional outbox pattern in all but name. Write the business row and an outbox row in the same local transaction; a separate poller relays outbox rows to the broker at-least-once. The paper described the idea in 2007; the catchy name came later.
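
Here is a small sketch of transactional enqueue using SQLite. The schema, the place_order and relay functions, and the publish callback are assumptions for illustration; a production relay would also need batching, retries, and cleanup of sent rows.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        id              INTEGER PRIMARY KEY AUTOINCREMENT,
        destination_key TEXT NOT NULL,      -- an entity-key, not a machine
        payload         TEXT NOT NULL,
        sent            INTEGER NOT NULL DEFAULT 0
    );
    """
)


def place_order(order_id, item_key, fail=False):
    # The state change and the outgoing message commit or roll back together.
    with db:
        db.execute("INSERT INTO orders VALUES (?, 'placed')", (order_id,))
        db.execute(
            "INSERT INTO outbox (destination_key, payload) VALUES (?, ?)",
            (item_key, json.dumps({"type": "reserve", "order_id": order_id})),
        )
        if fail:
            raise RuntimeError("abort before commit")


def relay(publish):
    # Publish first, mark as sent second. A crash in between means the row is
    # published again on the next run, so delivery is at-least-once.
    rows = db.execute(
        "SELECT id, destination_key, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, destination_key, payload in rows:
        publish(destination_key, payload)
        with db:
            db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))


published = []


def publish(destination_key, payload):
    published.append((destination_key, json.loads(payload)))


try:
    place_order("o-1", "item-42", fail=True)   # aborted: no order, no message
except RuntimeError:
    pass
relay(publish)
published_after_abort = list(published)

place_order("o-2", "item-42")                  # committed: order and message
relay(publish)
order_count = db.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The aborted order leaves nothing in the outbox, so no message escapes. The committed order produces exactly one outbox row, which the relay forwards after the commit.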

Messages are addressed to entity-keys, not machines

The scale-agnostic sender knows the destination’s entity-key, not its location. It is the scale-aware layer’s job to correlate the key to a physical location and deliver the message. As systems grow, the set of entities outgrows a single store; the lower layer routes by partitioning scheme, and the upper layer stays blissfully ignorant. The destination of a message is, effectively, an entity-key.
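
A toy version of that division of labor might look like the following. The Router class, its placement map, and the node names are hypothetical; the sketch only shows that the sender's code stays the same when the entity moves.

```python
class Router:
    """Scale-aware layer: owns the mapping from entity-key to location."""

    def __init__(self, placement):
        self.placement = dict(placement)   # entity-key -> node name
        self.inboxes = {}                  # node name -> delivered messages

    def send(self, entity_key, message):
        # The caller supplies only an entity-key; the router resolves the node.
        node = self.placement[entity_key]
        self.inboxes.setdefault(node, []).append((entity_key, message))
        return node

    def move(self, entity_key, new_node):
        # Repartitioning changes placement without involving the upper layer.
        self.placement[entity_key] = new_node


router = Router({"item-42": "node-a"})

# Scale-agnostic code: identical calls before and after the move.
first_delivery = router.send("item-42", "reserve 1")
router.move("item-42", "node-b")
second_delivery = router.send("item-42", "reserve 2")
```

The two sends are textually the same, yet the first lands on node-a and the second on node-b.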

Repartitioning makes delivery messy

Entities move while messages are in flight. A message chases an entity to its old home, finds it gone, and has to follow. The clean FIFO illusion between a sender and a destination breaks: messages repeat, later messages arrive before earlier ones. This is why scale-agnostic applications converge on idempotent processing of every application-visible message, and tolerate reordering. There is no opting out.

Activities: what one entity remembers about a partner

Here is the paper’s second coinage, and the one most people miss. Since every message can arrive more than once and out of order, a recipient entity must keep state to defend itself — and that state is organized per partner. Helland calls this per-partner state an activity.

An entity that talks to many partners has many activities — one per partner entity. An activity is, plainly, “the stuff I remember about that partner.” It tracks which messages have been processed (for idempotence) and how far the relationship with that partner has advanced.

The paper’s running example: an order with many line items. There’s an entity for the order and a separate entity for each item in the warehouse — and transactions can’t be assumed across them. Inside the order entity, the per-item state (the reservation protocol, what’s been confirmed, what’s outstanding) is an activity. This pattern already exists in large apps; it just usually goes unnamed until a duplicate-message bug forces someone to build it deliberately.

Idempotence, defined carefully

Processing a message is idempotent if running it again causes no substantive change to the entity. “Substantive” is deliberately application-defined — writing a read-audit log record isn’t substantive; charging a card again is. Messages split into two buckets:

Naturally idempotent: reads, or changes that don’t alter substantive state. Safe to replay as-is.
Not naturally idempotent: substantive changes. Here the entity must remember that it already processed this message, so a retry makes no further change. And if a reply was expected, it must re-send the same reply, because it can’t know whether the original sender ever received it.

That memory is state, and it accumulates in the activity, per partner. This is the conceptual parent of every processed_events / idempotency-key table you’ve ever written. Helland notes the obvious: you could push duplicate elimination down into the plumbing, but the dedup state would have to migrate with the entity during repartitioning, and that machinery is rarely available — so the scale-agnostic application ends up owning idempotence itself.
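
One way to picture an activity is as a per-partner record of handled message ids and the replies that went out. The sketch below assumes message ids are unique per partner; the class names, fields, and reply format are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    """What this entity remembers about one partner."""
    replies: dict = field(default_factory=dict)   # message_id -> reply already sent


@dataclass
class InventoryItem:
    key: str
    on_hand: int
    activities: dict = field(default_factory=dict)   # partner entity-key -> Activity

    def handle_reserve(self, partner_key, message_id, qty):
        activity = self.activities.setdefault(partner_key, Activity())
        if message_id in activity.replies:
            # Duplicate: make no further change and re-send the same reply.
            return activity.replies[message_id]
        if qty <= self.on_hand:
            self.on_hand -= qty
            reply = {"message_id": message_id, "status": "reserved", "qty": qty}
        else:
            reply = {"message_id": message_id, "status": "rejected", "qty": 0}
        # In a real store this write commits in the same local transaction
        # as the stock change above.
        activity.replies[message_id] = reply
        return reply


item = InventoryItem("item-42", on_hand=5)
first = item.handle_reserve("order-1", "m-1", 2)
retry = item.handle_reserve("order-1", "m-1", 2)    # redelivered message
other = item.handle_reserve("order-2", "m-1", 2)    # same id, different partner
```

The retry returns the original reply and leaves stock untouched, while the message from order-2 is treated as new because dedup state is kept per partner. Stock ends at 1, not at -1.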

Coping without atomicity: uncertainty and workflow

The final move is the deepest. Without distributed transactions, how do multiple entities agree on something — reserve inventory, confirm an order, close a deal?

With 2PC, the uncertainty of an in-flight decision is held in locks, managed by a transaction manager. Take that away and the uncertainty doesn’t vanish — it moves into your business logic. An entity reserves inventory “without yet knowing if it will be used.” That is accepting uncertainty, in the open, as application state. Helland is blunt about what this is:

This is simply workflow. Nothing magic — just that we can’t use distributed transactions, so we use workflow.

The mechanism is the tentative operation: one entity asks another to commit-with-a-right-to-cancel. Every tentative operation eventually resolves into either a confirming operation (the right to cancel is released) or a cancelling operation. Uncertainty rises as tentative operations are accepted and falls as confirmations and cancellations arrive — exactly how business actually works, with reservations, holds, and cancellation clauses.
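
The rise and fall of uncertainty can be sketched with a single inventory entity. The class, its method names, and the reservation ids are assumptions for illustration, and the sketch skips persistence and messaging entirely.

```python
class InventoryItem:
    def __init__(self, on_hand):
        self.available = on_hand
        self.held = {}        # reservation_id -> qty held tentatively
        self.committed = 0

    def reserve(self, reservation_id, qty):
        # Tentative operation: take the stock, keep the right to cancel.
        if reservation_id in self.held:
            return True       # repeated request for a hold that still exists
        if qty > self.available:
            return False
        self.available -= qty
        self.held[reservation_id] = qty
        return True

    def confirm(self, reservation_id):
        # Confirming operation: the right to cancel is released.
        self.committed += self.held.pop(reservation_id, 0)

    def cancel(self, reservation_id):
        # Cancelling operation: the held stock returns to the pool.
        self.available += self.held.pop(reservation_id, 0)

    def uncertainty(self):
        # Stock whose fate is not yet known.
        return sum(self.held.values())


item = InventoryItem(on_hand=10)
item.reserve("order-1", 4)
item.reserve("order-2", 3)
peak_uncertainty = item.uncertainty()

item.confirm("order-1")
item.cancel("order-2")
item.cancel("order-2")            # a duplicate cancel changes nothing
rejected = item.reserve("order-3", 7)
```

Uncertainty peaks at 7 units and drops to 0 once both reservations resolve, leaving 6 available and 4 committed. This version forgets a reservation once it resolves, so a fuller implementation would also record resolved ids in order to reject a late duplicate reserve.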

The escrow example

Buying a house: the buyer, seller, mortgage company, and others each enter a two-party agreement of trust with one escrow company. Nobody knows the outcome until escrow closes; everyone accepts uncertainty meanwhile; only the escrow company decides. It’s a hub-and-spoke set of two-party relationships that gets many parties to agree without a distributed transaction across all of them.

That’s the scaling insight: large agreements are composed from fine-grained, two-party tentative/confirm/cancel relationships, knit together with entity-keys as the links and activities to track each partner’s known state. You don’t need a global lock; you need a web of small, durable, per-partner promises.

Modern translation. Tentative → confirm/cancel composed across entities is the saga pattern, and the cancelling operation is what sagas call a compensating transaction. Reserve-then-confirm inventory and authorize-then-capture a payment are both tentative operations whose uncertainty lives in business state, exactly as the paper describes.

Mapping the paper to your stack

The vocabulary is from 2007, but every term has a name you already use:

Helland's concept	What it means	What you call it today
Disjoint scopes of serializability	Atomicity is local to one island	Database shards / partitions
Entity	Keyed, disjoint, single-scope unit of atomicity	Aggregate (DDD) · shard key · partition key
Activity	Per-partner state an entity remembers	Idempotency table · saga state · per-conversation state
Transactional enqueue	Commit the message with the state change	Transactional outbox
Idempotent message processing	Replays cause no substantive change	Idempotency keys · upsert / ON CONFLICT
Tentative / confirm / cancel	Agreement without distributed locks	Saga · compensating transaction · reserve-then-confirm
Alternate index in another scope	Secondary lookups can't be atomic with the entity	CQRS read model · search index lag

The honest summary

If you remember nothing else, remember the chain of reasoning, because each step follows from the one before:

At real scale you cannot rely on distributed transactions — they’re too slow and too fragile.
So atomicity is confined to disjoint islands; each piece of data lives in exactly one.
The unit you can update atomically is the entity, identified by a key, living in one island, never atomically joined to another.
Entities talk only by messages, enqueued transactionally and delivered at-least-once, possibly out of order.
So every recipient must be idempotent, remembering per-partner state in an activity.
And multi-entity agreement is reached by fine-grained workflow — tentative operations that confirm or cancel — not by a global lock.

What makes the paper enduring is its modesty. Helland isn’t inventing these patterns; he’s naming what good engineers were already doing implicitly — and arguing that naming them lets us apply them consistently, instead of rediscovering each one through a production incident. He even predicted the rest: just as the terminal-handling chaos of the 1970s hardened into TP-monitors, this hand-crafted scaling logic would harden into middleware. It did — event-streaming platforms, workflow engines, durable-execution frameworks, and CQRS toolkits are exactly that.

The goal was never exactly-once everywhere or one global transaction. It’s to know precisely what guarantee you have at every boundary, design each entity to be safe under it, and stop telling yourself the Maginot Line will hold.

Primary source. Pat Helland, “Life beyond Distributed Transactions: an Apostate’s Opinion,” CIDR 2007. Worth reading in full — it’s ten pages and remarkably free of jargon for how foundational it turned out to be.