The Scene: A Simple DELETE That Broke Everything

Friday evening. A backend engineer deploys a cleanup job to production:

-- "Just cleaning up cancelled orders, what could go wrong?"
DELETE FROM orders
WHERE status = 'cancelled'
  AND created_at < '2024-01-01';

Within minutes:

  • API response times jump from 45ms to 3,200ms
  • Connection pool saturates — new requests start queuing
  • Replica lag spikes to 38 seconds
  • Alerts fire across the dashboard

The query deleted 120,000 rows. On a small table, this would take milliseconds. But the orders table had 22M rows, 5 secondary indexes, and 400 concurrent transactions. That simple DELETE turned into a system-wide incident.

Let's understand exactly why.

What DELETE Actually Does Under the Hood

When you run DELETE FROM orders WHERE id = 42, here's what the database engine (InnoDB) actually does:

WHAT A SINGLE DELETE DOES INTERNALLY
─────────────────────────────────────────────────────────────
Step 1: Find the row in the clustered index (PRIMARY KEY)
        → Traverse B-tree, locate page, find row

Step 2: Acquire EXCLUSIVE row lock (X lock)
        → No other transaction can read or write this row

Step 3: Write the old row to the UNDO LOG
        → For rollback and MVCC snapshots

Step 4: Mark row as deleted in clustered index
        → Page may need reorganization if now underfilled

Step 5: Remove entry from EVERY secondary index
        → idx_status:       remove (status='cancelled', pk=42)
        → idx_created_at:   remove (created_at='2023-06-15', pk=42)
        → idx_customer_id:  remove (customer_id=1087, pk=42)
        → idx_total_amount: remove (total=299.99, pk=42)
        → idx_region:       remove (region='us-east', pk=42)
        Each removal = B-tree traversal + potential page merge

Step 6: Write to REDO LOG (WAL)
        → For crash recovery

Step 7: Release locks
        → Only AFTER all index operations complete

Total I/O operations for 1 row with 5 indexes:
  1 clustered index delete
+ 5 secondary index deletes
+ 6 undo log writes
+ 6 redo log writes
= ~18 I/O ops
─────────────────────────────────────────────────────────────

That's one row. Now multiply by 120,000 rows. And each secondary index removal involves a full B-tree traversal — seek to the leaf page, remove the entry, and potentially merge underfilled pages.
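The multiplication is worth making explicit. A quick back-of-envelope calculation (using the rough per-row estimates from the breakdown above, not measurements) shows where the totals come from:

```python
# Back-of-envelope I/O arithmetic for the bulk DELETE above.
# Per-row figures are the rough estimates from the text, not measurements.
ROWS = 120_000

io_per_row_low = 16    # best case: B-tree touches mostly hit cached pages
io_per_row_high = 21   # worst case: extra page reads plus merges

total_low = ROWS * io_per_row_low    # 1,920,000 I/O ops
total_high = ROWS * io_per_row_high  # 2,520,000 I/O ops ("up to 2.5M")

print(f"{total_low:,} to {total_high:,} I/O ops for one DELETE statement")
```

Even the optimistic end of that range is roughly two million I/O operations triggered by a single statement.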

The Index Tax: Why DELETE Is So Expensive

This is the part most engineers miss. The row itself is cheap to remove. The indexes are where the real cost lives.

THE INDEX TAX OF DELETE
─────────────────────────────────────────────────────────────
orders table with 5 secondary indexes:

DELETE one row:
┌──────────────────────────┬───────────┬──────────────────┐
│ Operation                │ I/O Ops   │ Lock Scope       │
├──────────────────────────┼───────────┼──────────────────┤
│ Clustered index delete   │ 2-3       │ Row X-lock       │
│ idx_status removal       │ 2-3       │ Index page latch │
│ idx_created_at removal   │ 2-3       │ Index page latch │
│ idx_customer_id removal  │ 2-3       │ Index page latch │
│ idx_total_amount removal │ 2-3       │ Index page latch │
│ idx_region removal       │ 2-3       │ Index page latch │
│ Undo + Redo log writes   │ 4-6       │ Log buffer       │
├──────────────────────────┼───────────┼──────────────────┤
│ TOTAL per row            │ 16-21     │ 7 lock points    │
└──────────────────────────┴───────────┴──────────────────┘

For 120,000 rows:
  → Up to 2.5M I/O operations
  → 840,000 index page visits
  → Potentially thousands of B-tree page merges
─────────────────────────────────────────────────────────────

B-tree Page Merges: The Hidden Killer

When you delete entries from a B-tree index, leaf pages become underfilled. When a page drops below ~50% capacity, InnoDB merges it with a neighbor. This merge operation:

  • Locks both pages
  • Copies entries from one page to another
  • Updates parent pointers
  • May cascade upward if the parent becomes underfilled
B-TREE PAGE MERGE AFTER DELETE
─────────────────────────────────────────────────────────────
Before DELETE (healthy B-tree leaf level):

┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│   Page A    │  │   Page B    │  │   Page C    │
│ ████████░░  │  │ ██████████  │  │ ████████░░  │
│  80% full   │  │  100% full  │  │  75% full   │
└─────────────┘  └─────────────┘  └─────────────┘

After deleting entries from Page A:

┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│   Page A    │  │   Page B    │  │   Page C    │
│ ███░░░░░░░  │  │ ██████████  │  │ ████████░░  │
│  30% full   │  │  100% full  │  │  75% full   │
└─────────────┘  └─────────────┘  └─────────────┘
       ↓
MERGE triggered (below 50% threshold)

After merge:

┌──────────────────────────┐  ┌─────────────┐
│   Page A + C (merged)    │  │   Page B    │
│ ███████████████░░░░░░░░  │  │ ██████████  │
│ ~105% → may split again! │  │  100% full  │
└──────────────────────────┘  └─────────────┘
       ↑
Expensive: copy + relink + parent update

This happens on EACH of the 5 secondary indexes.
─────────────────────────────────────────────────────────────

With bulk deletes, these merges cascade across the entire index tree. This is why your I/O spikes during mass deletions.

Why findOne + UPDATE (Soft Delete) Is Dramatically Cheaper

Now let's look at what happens with a soft delete:

-- Instead of DELETE:
UPDATE orders
SET deleted_at = NOW()
WHERE id = 42;

Here's what the engine does:

WHAT A SOFT DELETE (UPDATE) DOES INTERNALLY
─────────────────────────────────────────────────────────────
Step 1: Find the row in clustered index
        → Same as DELETE — traverse B-tree

Step 2: Acquire EXCLUSIVE row lock (X lock)
        → Same as DELETE

Step 3: Write old value to UNDO LOG
        → Only the deleted_at column's old value (NULL, 8 bytes)

Step 4: Update deleted_at IN-PLACE on the same page
        → No page reorganization
        → No row movement
        → No page merge

Step 5: Secondary indexes?
        → deleted_at is NOT in any secondary index
        → ZERO secondary index operations
        → ZERO page latches on index pages

Step 6: Write to REDO LOG
        → Only the single column change

Step 7: Release lock
        → Almost immediately

Total I/O operations for 1 row with 5 indexes:
  1 clustered index update
+ 0 secondary index changes
+ 1 undo log write
+ 1 redo log write
= ~3 I/O ops

Compare: DELETE = ~18 I/O ops  vs  UPDATE = ~3 I/O ops
─────────────────────────────────────────────────────────────

The key insight: UPDATE on a non-indexed column doesn't touch any secondary index. The deleted_at column exists only in the clustered index row. InnoDB modifies it in-place — same page, same slot. No B-tree traversals on secondary indexes. No page merges. No cascading rebalancing.
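The visibility semantics are easy to demonstrate end to end. The sketch below uses SQLite purely for portability (SQLite has no InnoDB-style cost model, so this shows only that soft-deleted rows stay physically present but disappear from filtered reads); the table and column names mirror the article's examples:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER,
    status      TEXT,
    deleted_at  TEXT DEFAULT NULL)""")
conn.executemany(
    "INSERT INTO orders (id, customer_id, status) VALUES (?, ?, ?)",
    [(42, 1087, "cancelled"), (43, 1087, "paid")])

# Soft delete: a single-column UPDATE, no row removal
conn.execute("UPDATE orders SET deleted_at = datetime('now') WHERE id = 42")

# Reads must filter on deleted_at IS NULL to hide soft-deleted rows
visible = conn.execute(
    "SELECT COUNT(*) FROM orders "
    "WHERE customer_id = 1087 AND deleted_at IS NULL").fetchone()[0]
total = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer_id = 1087").fetchone()[0]

print(visible, total)  # 1 2 — one row visible, two physically present
```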

Side-by-Side: DELETE vs Soft Delete Cost

┌─────────────────────────────┬───────────────────────────────┬─────────────────────────────┐
│ Metric                      │ Hard DELETE                   │ Soft DELETE (UPDATE)        │
├─────────────────────────────┼───────────────────────────────┼─────────────────────────────┤
│ Clustered index operation   │ Remove row, reorganize page   │ Modify column in-place      │
│ Secondary index operations  │ 1 removal per index (5 total) │ 0 (column not indexed)      │
│ B-tree page merges          │ Likely on every index         │ None                        │
│ Undo log size               │ Full row × all indexes        │ Single column old value     │
│ Lock hold time              │ Long (all index ops)          │ Short (single page update)  │
│ I/O ops per row (5 indexes) │ ~18                           │ ~3                          │
│ Replication bytes           │ Row image + all index changes │ Single column change        │
│ Foreign key cascade check   │ Yes (for every FK)            │ No                          │
└─────────────────────────────┴───────────────────────────────┴─────────────────────────────┘

The Concurrency Impact: Locks Under Load

This is where the difference becomes critical in production.

LOCK CONTENTION: DELETE vs UPDATE
─────────────────────────────────────────────────────────────
Timeline with 3 concurrent transactions:

Hard DELETE (holds locks across all index operations):
─────────────────────────────────────────────────────────
Tx1: [===LOCK=== clustered ── idx1 ── idx2 ── idx3 ── idx4 ── idx5 ===UNLOCK===]
Tx2:      [WAITING............][===LOCK===]
Tx3:        [WAITING..............][===]
              ↑ Lock wait = 12ms

Soft DELETE UPDATE (short lock, no index work):
─────────────────────────────────────────────────────────
Tx1: [=LOCK= update =UNLOCK=]
Tx2:   [=LOCK= update =UNLOCK=]
Tx3:     [=LOCK= update =UNLOCK=]
              ↑ Lock wait = 0.2ms

At 400 concurrent connections:
  DELETE: lock waits cascade → connection pool exhaustion
  UPDATE: transactions fly through → no visible contention
─────────────────────────────────────────────────────────────

DELETE holds the exclusive row lock for the entire duration of all index operations. With 5 secondary indexes, that lock is held 5-6x longer than needed for a simple UPDATE. Under high concurrency, this means:

  • Lock wait chains — transactions pile up waiting for the row lock
  • Connection pool saturation — waiting transactions hold connections idle
  • Replica lag — the replication thread must replay all index operations sequentially
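The lock-wait pile-up can be approximated with a toy FIFO queueing model. This is deliberately simplistic (transactions waiting in a single chain for one hot row; hold times are the illustrative figures from the timeline, not benchmarks), but it shows how total wait grows with lock hold time:

```python
# Toy model: n_txns transactions queue FIFO for the same row lock.
# Hold times are the illustrative figures from the timeline above.
def total_lock_wait(n_txns: int, hold_ms: float) -> float:
    """Sum of wait times when each txn waits for all earlier lock holders."""
    return sum(i * hold_ms for i in range(n_txns))

delete_hold_ms = 12.0   # lock held across all 5 secondary-index removals
update_hold_ms = 0.2    # lock held for a single in-place column update

print(total_lock_wait(400, delete_hold_ms))  # cumulative wait, hard DELETE
print(total_lock_wait(400, update_hold_ms))  # cumulative wait, soft delete
```

The cumulative wait scales linearly with hold time, so a lock held 60x longer produces 60x the total queueing delay on a hot row.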

Benchmark: Real Numbers

Test setup: MySQL 8.0, orders table with 10M rows, 5 secondary indexes, 100 concurrent connections, deleting/soft-deleting 50,000 rows.

┌──────────────────────────┬─────────────┬──────────────────────┬─────────────┐
│ Metric                   │ Hard DELETE │ Soft DELETE (UPDATE) │ Ratio       │
├──────────────────────────┼─────────────┼──────────────────────┼─────────────┤
│ Total execution time     │ 14.2s       │ 2.1s                 │ 6.8x faster │
│ Avg lock wait per row    │ 8.4ms       │ 0.9ms                │ 9.3x less   │
│ Rows scanned in indexes  │ 300K        │ 0                    │ -           │
│ Redo log bytes written   │ 248MB       │ 38MB                 │ 6.5x less   │
│ Replica lag (peak)       │ 22s         │ 3s                   │ 7.3x less   │
│ InnoDB page merges       │ 4,200       │ 0                    │ -           │
│ Deadlocks during test    │ 12          │ 0                    │ -           │
└──────────────────────────┴─────────────┴──────────────────────┴─────────────┘

The numbers speak for themselves. DELETE is not just slower — it creates cascading effects (replica lag, deadlocks, page merges) that impact every other query running on the system.

The Implementation Pattern

Here's how to implement soft delete correctly:

-- Step 1: Add the soft delete column
ALTER TABLE orders ADD COLUMN deleted_at DATETIME NULL DEFAULT NULL;

-- Step 2: Add a partial index (PostgreSQL) or generated column index (MySQL)
-- PostgreSQL:
CREATE INDEX idx_orders_active ON orders (id) WHERE deleted_at IS NULL;

-- MySQL (no partial indexes — use a generated column):
ALTER TABLE orders
  ADD COLUMN is_active TINYINT GENERATED ALWAYS AS (IF(deleted_at IS NULL, 1, NULL)) STORED;
CREATE INDEX idx_orders_active ON orders (is_active);

-- Step 3: Your queries now filter on deleted_at
-- Before (hard delete world — no filter needed):
SELECT * FROM orders WHERE customer_id = 1087;

-- After (soft delete — add the filter):
SELECT * FROM orders WHERE customer_id = 1087 AND deleted_at IS NULL;

-- Step 4: Application-level helper (Node.js / Prisma example)
-- In your ORM or query builder, add a default scope:

-- Prisma middleware example:
-- prisma.$use(async (params, next) => {
--   if (params.action === 'findMany' || params.action === 'findFirst') {
--     params.args.where = { ...params.args.where, deleted_at: null };
--   }
--   return next(params);
-- });

-- Step 5: Periodic hard-delete purge (off-peak hours)
-- Run during low-traffic windows (e.g., 3 AM Sunday)
DELETE FROM orders
WHERE deleted_at IS NOT NULL
  AND deleted_at < NOW() - INTERVAL 90 DAY
LIMIT 1000;
-- Use LIMIT to batch — never bulk-delete in one shot
-- Run in a loop with 500ms sleep between batches
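Step 5's batching loop might look like the sketch below. It uses SQLite so the example is self-contained; note that stock SQLite builds typically lack DELETE ... LIMIT, so the batch is selected via a rowid subquery, whereas on MySQL you would write DELETE ... LIMIT 1000 directly. The sleep between batches is what gives replicas room to catch up:

```python
import sqlite3
import time

def purge_soft_deleted(conn, batch_size=1000, sleep_s=0.5):
    """Hard-delete soft-deleted rows older than 90 days, in small batches."""
    total = 0
    while True:
        # Stock SQLite has no DELETE ... LIMIT; pick the batch by rowid.
        cur = conn.execute(
            "DELETE FROM orders WHERE rowid IN ("
            "  SELECT rowid FROM orders"
            "  WHERE deleted_at IS NOT NULL"
            "    AND deleted_at < datetime('now', '-90 days')"
            "  LIMIT ?)",
            (batch_size,))
        conn.commit()
        if cur.rowcount == 0:
            return total          # nothing left to purge
        total += cur.rowcount
        time.sleep(sleep_s)       # breathe between batches; eases replica lag

# Demo on an in-memory table with 2,500 long-soft-deleted rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, deleted_at TEXT)")
conn.executemany("INSERT INTO orders (id, deleted_at) VALUES (?, ?)",
                 [(i, "2020-01-01 00:00:00") for i in range(2500)])
deleted = purge_soft_deleted(conn, batch_size=1000, sleep_s=0)
print(deleted)  # 2500, purged in batches of 1000, 1000, 500
```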
SOFT DELETE LIFECYCLE
─────────────────────────────────────────────────────────────
User clicks "Delete order":
┌─────────────────────────────────────────────────────────┐
│ UPDATE orders SET deleted_at = NOW() WHERE id = 42;     │
│ → 1 column update, ~3 I/O ops, lock held < 1ms          │
│ → Row still exists, still in all indexes                │
│ → But filtered out by WHERE deleted_at IS NULL          │
└─────────────────────────────────────────────────────────┘
        ↓
Application sees it as "deleted" immediately
        ↓
90 days later (Sunday 3 AM batch job):
┌─────────────────────────────────────────────────────────┐
│ DELETE FROM orders WHERE deleted_at < 90_days_ago       │
│ LIMIT 1000;  -- batched, with sleep between batches     │
│ → Real DELETE happens during low traffic                │
│ → No user-facing impact                                 │
└─────────────────────────────────────────────────────────┘

Best of both worlds:
  ✅ Fast "delete" during peak hours (UPDATE)
  ✅ Actual data cleanup during off-peak (batched DELETE)
  ✅ Recovery window — undo accidental deletes within 90 days
─────────────────────────────────────────────────────────────

When Hard DELETE IS the Right Choice

Soft delete isn't always the answer. Use hard DELETE when:

┌───────────────────────────────────┬───────────────────────────────────────────────────┐
│ Scenario                          │ Why Hard DELETE                                   │
├───────────────────────────────────┼───────────────────────────────────────────────────┤
│ GDPR / Right to Erasure           │ Legal requirement — data must be truly gone,      │
│                                   │ not just hidden                                   │
│ Storage-critical systems          │ Soft delete means rows never shrink. If disk is   │
│                                   │ the bottleneck, you need real deletion            │
│ Tables with 0-1 indexes           │ The index tax is the main cost. No indexes =      │
│                                   │ DELETE is cheap                                   │
│ Staging / dev environments        │ No concurrency pressure — just delete it          │
│ Log / event tables (append-only)  │ Usually partitioned — drop the partition instead  │
│                                   │ of row-by-row DELETE                              │
└───────────────────────────────────┴───────────────────────────────────────────────────┘

For GDPR, the pattern is: soft delete immediately (fast), then run a compliance purge job that does the hard DELETE in batches during off-peak hours. You get fast user-facing response AND true erasure within your SLA.

Common Mistakes Engineers Make

-- MISTAKE 1: Indexing the deleted_at column unnecessarily
CREATE INDEX idx_deleted ON orders (deleted_at);
-- ❌ This defeats the purpose — now UPDATE touches this index too

-- CORRECT: Use partial index or filter in composite indexes
-- PostgreSQL:
CREATE INDEX idx_orders_customer ON orders (customer_id) WHERE deleted_at IS NULL;

-- MISTAKE 2: Bulk DELETE without LIMIT
DELETE FROM orders WHERE status = 'cancelled';
-- ❌ Locks millions of rows in one transaction

-- CORRECT: Batch with LIMIT
DELETE FROM orders WHERE status = 'cancelled' LIMIT 1000;
-- Run in a loop with delay between batches

-- MISTAKE 3: Forgetting to filter soft-deleted rows
SELECT COUNT(*) FROM orders WHERE customer_id = 1087;
-- ❌ Includes soft-deleted rows — wrong count!

-- CORRECT: Always include the filter
SELECT COUNT(*) FROM orders
WHERE customer_id = 1087 AND deleted_at IS NULL;

The Decision Framework

SHOULD YOU USE SOFT DELETE?
─────────────────────────────────────────────────────────────
Is your table under high concurrent write load?
├── YES → Soft delete. Hard DELETE will cause lock contention.
│
└── NO → Does the table have 3+ secondary indexes?
         ├── YES → Soft delete. Index tax makes DELETE expensive.
         │
         └── NO → Is the table > 1M rows?
                  ├── YES → Soft delete. Safer under growth.
                  │
                  └── NO → Hard DELETE is fine.
                           (But still batch with LIMIT!)
─────────────────────────────────────────────────────────────
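The decision tree above can be encoded as a small helper for code review checklists or lint rules. The thresholds are the article's rules of thumb, not universal constants:

```python
def delete_strategy(high_write_concurrency: bool,
                    secondary_indexes: int,
                    row_count: int) -> str:
    """Encode the decision tree above: returns 'soft' or 'hard'."""
    if high_write_concurrency:
        return "soft"    # hard DELETE would cause lock contention
    if secondary_indexes >= 3:
        return "soft"    # index tax makes DELETE expensive
    if row_count > 1_000_000:
        return "soft"    # safer under growth
    return "hard"        # fine — but still batch with LIMIT

print(delete_strategy(True, 5, 22_000_000))   # soft (the incident's orders table)
print(delete_strategy(False, 1, 50_000))      # hard (small, lightly indexed)
```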

The Bottom Line

DELETE is the most expensive DML operation in a relational database — not because removing one row is hard, but because the index maintenance cost scales with the number of secondary indexes. Every additional index adds another round of B-tree traversal, entry removal, and potential page merges. In a high-concurrency system, that extra work translates directly into longer lock hold times, more contention, and cascading failures. Soft delete with UPDATE ... SET deleted_at = NOW() sidesteps the entire index tax. Use it as your default. Batch your real deletes for off-peak hours. Your production database will thank you.