The lie: push it all on the database
Every system starts with the same lie. And it's a good lie — it's correct, it's clean, it's honest.
Version one ships with the database as the single source of truth. Every constraint lives there. Foreign keys to prevent orphans. UNIQUE to prevent duplicates. NOT NULL to prevent bad data. CHECK constraints for rules. Cascading deletes to clean up children when a parent dies. The application is a thin client — it just reads and writes. No business logic in the app, all in the database. It's the right design.
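Version one, concretely, looks something like this (a minimal Postgres sketch; the table and column names are illustrative and reused in the examples below):

```sql
-- Version one: every invariant lives in the schema.
CREATE TABLE customers (
  id    bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  email text NOT NULL UNIQUE                -- no duplicates, no bad data
);

CREATE TABLE orders (
  id           bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  customer_id  bigint NOT NULL
               REFERENCES customers (id)
               ON DELETE CASCADE,           -- no orphans; children die with the parent
  amount_cents bigint NOT NULL
               CHECK (amount_cents > 0)     -- business rule
);
```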
And it works for months. Staging runs at 50 writes/sec. Everything is fast. The team ships. Customers start using it.
Then production traffic climbs. 50 becomes 500. 500 becomes 5,000. Same schema. Same queries. Same code. Nothing changed. But p99 latency goes from 5 ms to 50 ms to 500 ms. Oncall is getting paged at 2 AM. The database is getting slower. Except it's not.
The lie breaks the moment load reveals it: a constraint that's free at 50 RPS is a bottleneck at 5,000 RPS.
Your database is doing exactly what you asked. For every FK, it looks up the parent row and takes a shared lock on it, held until your transaction ends. For every UNIQUE, it scans the index. For every CASCADE, it deletes thousands of children synchronously, holding locks for every delete. For every write, it updates every index. For every commit, it fsyncs to disk. Correct operations. Necessary for consistency. Invisible at low load. The entire system at high load.
What actually happens on one write
On one INSERT, the engine roughly does the following (the exact order varies by engine, but the shape is the same):

1. Parse the statement and check NOT NULL and CHECK constraints.
2. For each foreign key, look up the parent row and take a shared lock on it.
3. Hold that lock until the transaction commits.
4. Scan every UNIQUE index for duplicates and update every index on the table.
5. Write the change to the WAL.
6. fsync the WAL to make the commit durable.

At 50 writes/sec, steps 1–6 all happen in 5 ms and nobody cares. At 5,000 writes/sec on a hot parent row, steps 2 and 3 alone turn into a queue.
The ten places where the lie breaks
1. Too many indexes — Every index is a separate B-tree walk. 5 indexes = roughly 6× the write work. Unused indexes are pure tax on every INSERT, UPDATE, DELETE.
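To find the pure-tax indexes in Postgres, pg_stat_user_indexes shows which ones have never served a read since the last stats reset:

```sql
-- Never used for a read, still updated on every write.
-- Careful: indexes backing UNIQUE constraints may show zero
-- scans but are still doing their job.
SELECT schemaname, relname AS table_name,
       indexrelname AS index_name, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname;
```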
2. Per-row writes — One row at a time: 1,000 INSERTs = 1,000 round trips + 1,000 fsyncs. Batched: 1 INSERT with 1,000 rows = 1 round trip + 1 fsync. The difference is 50–200×. Most teams don't see this until they look at the app logs.
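The fix is mechanical (a sketch; the events table is hypothetical):

```sql
-- Per-row: 1,000 round trips and 1,000 fsyncs under autocommit.
INSERT INTO events (user_id, kind) VALUES (1, 'click');
-- ...repeated 1,000 times from an application loop...

-- Batched: 1 round trip, 1 fsync.
INSERT INTO events (user_id, kind) VALUES
  (1, 'click'),
  (2, 'view'),
  (3, 'click');  -- ...up to 1,000 rows in one statement
```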
3. Foreign keys on hot parents — A FK to a cold table costs nothing. A FK to a row referenced by 5,000 children/sec becomes the bottleneck. When you INSERT an order with a customer_id, the database takes a shared lock on the customer row — held until your transaction commits. The moment any transaction tries to UPDATE the customer, it queues behind every in-flight order. That's where p99 goes to 500 ms.
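You can reproduce the queue with two sessions (a sketch against the illustrative schema above; in Postgres the blocked statement must want a full row lock, such as a delete, a key update, or SELECT ... FOR UPDATE, while InnoDB blocks any update to the locked row):

```sql
-- Session A: the FK check takes a share lock on customer 42,
-- held until COMMIT.
BEGIN;
INSERT INTO orders (customer_id, amount_cents) VALUES (42, 1999);
-- ...transaction still open, doing other work...

-- Session B: wants an exclusive lock on the same row, so it
-- queues behind every in-flight order for customer 42.
SELECT * FROM customers WHERE id = 42 FOR UPDATE;  -- blocks
```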
4. Cascading deletes on hot rows — DELETE one parent with CASCADE = delete thousands of children synchronously inside the same transaction. That transaction holds its locks for the entire multi-millisecond delete. Every other writer touching those children queues behind you.
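A common escape hatch is to delete children in small batches, each in its own short transaction, then remove the parent (a sketch; the ctid trick is Postgres-specific):

```sql
-- Each batch holds its locks only for a moment.
DELETE FROM orders
WHERE ctid IN (
  SELECT ctid FROM orders
  WHERE customer_id = 42
  LIMIT 1000
);
-- Repeat until zero rows are deleted, then:
DELETE FROM customers WHERE id = 42;
```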
5. fsync latency — A modern NVMe can read/write gigabytes/sec. But fsync — the operation that forces buffered data to durable storage — is orders of magnitude slower. If fsync takes 10 ms, your single-threaded commit rate is 100/sec, period. This is usually 90% of commit latency. Most teams don't realize it until they benchmark the disk with pg_test_fsync.
6. Lock contention — If you hold a row lock during an HTTP call, every other transaction touching that row waits for the network roundtrip. One slow transaction holds up hundreds. Never do this.
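The shape of the mistake, sketched in commented SQL (the payment call stands in for any network I/O):

```sql
-- Anti-pattern: row lock held across a network call.
BEGIN;
UPDATE accounts SET status = 'charging' WHERE id = 7;  -- takes the row lock
-- <application calls the payment provider over HTTP: 200-2,000 ms>
UPDATE accounts SET status = 'charged' WHERE id = 7;
COMMIT;  -- every writer on account 7 waited the whole time

-- Fix: make the network call first, outside any transaction,
-- then record the result in one short transaction.
```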
7. Buffer pool too small — When your working set exceeds the buffer pool, every write includes a synchronous read from disk (one 16 KB page in InnoDB, 8 KB in Postgres). That's 10–30 ms of latency you didn't expect. Size InnoDB's buffer pool to ~70% of system RAM; Postgres's shared_buffers is usually set far lower (~25%) because it leans on the OS page cache.
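Two quick checks, one per engine (the thresholds are rough rules of thumb):

```sql
-- MySQL: how big is the InnoDB buffer pool?
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- PostgreSQL: cache hit ratio across all databases.
-- Below ~0.99 on an OLTP workload, the working set is spilling to disk.
SELECT sum(blks_hit)::float / NULLIF(sum(blks_hit) + sum(blks_read), 0) AS hit_ratio
FROM pg_stat_database;
```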
8. WAL flush dominates latency — The Write-Ahead Log is sequential so throughput is fine. But every commit waits for one fsync of the WAL. If that's 8 ms, your p99 is at least 8 ms. This usually dominates the other costs combined.
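Postgres exposes group commit directly: delay each flush slightly so concurrent commits share one fsync (a sketch; tune the numbers against your own workload):

```sql
-- Wait up to 200 microseconds before flushing the WAL, but only
-- when at least 5 other transactions are in flight to share the fsync.
ALTER SYSTEM SET commit_delay = 200;
ALTER SYSTEM SET commit_siblings = 5;
SELECT pg_reload_conf();
```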
9. Synchronous replication — Waiting for a replica to acknowledge adds one RTT plus one remote fsync to every commit. That's 2–3× commit latency for "no data loss on failover." Only worth it if your business genuinely requires it. Cross-region sync replication is usually a mistake.
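In Postgres you can keep synchronous replication as the default and opt out per transaction for writes that can tolerate loss on failover (the page_views table is illustrative):

```sql
-- Cluster default: every commit waits for the synchronous standby.
-- This transaction alone skips the remote wait.
BEGIN;
SET LOCAL synchronous_commit = local;
INSERT INTO page_views (user_id, url) VALUES (1, '/pricing');
COMMIT;
```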
10. ORM N+1 writes — The ORM makes per-row writes feel natural. A developer loops over a list and the framework sends N separate queries, often as N separate autocommitted transactions. The database is fine. The app is the bottleneck.
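pg_stat_statements makes the N+1 visible: the same normalized statement with an enormous call count (assumes the extension is installed):

```sql
-- Statements by how often they run. A single-row INSERT with
-- millions of calls is usually an ORM loop.
SELECT calls, rows, query
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;
```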
Why staging never catches this
Your staging environment runs at 50 writes/sec. Production runs at 5,000. That's not a performance problem, it's a *load problem*. The same SQL is correct at both loads. The same code is correct at both loads. But the constraints you threw on the database are paid by every write — flat cost at low load, contention curve at high load.
You need load testing. Not the kind where you throw a load generator at the system for 10 minutes. Real load testing: sustained concurrency on the hot rows. What happens when 1,000 threads all try to insert orders for the same merchant, simultaneously? That's when the lie breaks.
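With Postgres, pgbench runs arbitrary scripts, so the hot-row test is a few lines (a sketch; merchant 42 and the table are illustrative, and 1,000 clients needs max_connections raised to match):

```sql
-- hot_insert.sql: every client hammers the same parent row.
INSERT INTO orders (customer_id, amount_cents)
VALUES (42, 1 + (random() * 10000)::int);

-- Run it and read the latency report:
--   pgbench -c 1000 -j 16 -T 60 -f hot_insert.sql mydb
```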
Which layer should actually pay?
Not every constraint belongs in the database. Some belong there forever: primary-key uniqueness, transactional atomicity and isolation, durability. These are hard to move, and the database is optimized for them.
Others only belong in the database if the parent isn't hot. A FK to a cold table (low write volume) costs nothing. A FK to a hot row becomes the system.
Senior teams know the real question is "which layer should enforce this invariant?", not "where does the contract live?"
High-RPS systems move hot FKs to the application: validate the parent exists in-memory or via a cache hit, skip the database lock entirely, and run a reconciliation job every few minutes to find any references that became invalid. They trade a database guarantee for operational discipline elsewhere.
The pattern is simple: enforcer + checker.
- The enforcer is the application validator running on the hot path. It checks the FK before insert.
- The checker is a background job that periodically finds rows that violate the invariants the enforcer should have caught.
Without the checker, you have a guarantee on paper and corruption in production.
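The checker can be a single anti-join on a schedule (a sketch against the illustrative schema; in production you would bound it to recently written rows):

```sql
-- Orphan scan: orders whose customer no longer exists.
-- Run every few minutes; alert on or repair anything it finds.
SELECT o.id, o.customer_id
FROM orders o
LEFT JOIN customers c ON c.id = o.customer_id
WHERE c.id IS NULL;
```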
How to find which one is biting you
Nine times out of ten, if write p99 is dominated by wait time (not CPU), the answer is fsync, locks, or buffer-pool misses. Each has a simple diagnostic:
```sql
-- PostgreSQL: what are commits waiting on right now?
SELECT pid, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event IS NOT NULL;

-- MySQL: which transactions hold locks, and which are waiting on them
SELECT * FROM performance_schema.data_locks;
SELECT * FROM performance_schema.data_lock_waits;

-- Shell, not SQL: benchmark fsync latency on the WAL disk
-- (pg_test_fsync ships with PostgreSQL)
pg_test_fsync

-- PostgreSQL: oldest open transactions, i.e. the ones holding
-- FK and row locks that everyone else queues behind
SELECT pid, now() - xact_start AS age, state, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start LIMIT 10;
```
Ten minutes of SQL tells you which of the ten is yours.
Key takeaways
- The lie works until it doesn't. At 50 RPS, throw everything on the database. At 5,000 RPS, you're paying for every constraint on every write.
- Staging and production are different systems. The same SQL is correct at 50 writes/sec and impossible at 5,000. You need real load testing.
- A constraint that costs 0.3 ms in isolation costs 65 ms under contention. Measure under your actual concurrency, not staging.
- Batch everything you can. 1,000 rows in one statement beats 1,000 statements by 50–200×. This usually fixes more outages than database tuning.
- fsync is the biggest single cost. If storage can't fsync in 1–2 ms, no query tuning saves you. Move the WAL to a faster disk. Enable group commit.
- For hot parents, ask: who pays? Database pays with queued commits. Application pays with validation logic and reconciliation jobs. Pick where it's cheaper.
- Cascading deletes on hot rows are expensive. DELETE one parent with CASCADE to 10,000 children = 50+ ms transaction holding locks. Consider soft deletes instead.
- Measure before you fix. Most teams "fix" the wrong bottleneck because they skipped the diagnostic step.
Slow writes aren't a mystery. They're the cost of decisions you made when load was low. The trick is measuring early enough to change your mind before production pays the price.