Hard DELETE vs Soft DELETE Concurrency, Locking & Index Cost in Production Databases DELETE Hard delete — row removal ~18 I/O ops per row 5 index removals + page merges Long lock hold time B-tree rebalancing on every index UPDATE Soft delete — set deleted_at ~3 I/O ops per row 0 index operations Sub-millisecond lock hold In-place column update only 6.8x faster 0 index ops 0 deadlocks

The Scene: A Simple DELETE That Broke Everything

Friday evening. A backend engineer deploys a cleanup job to production:

-- "Just cleaning up cancelled orders, what could go wrong?"
DELETE FROM orders
WHERE status = 'cancelled'
  AND created_at < '2024-01-01';

Within minutes:

  • API response times jump from 45ms to 3,200ms
  • Connection pool saturates — new requests start queuing
  • Replica lag spikes to 38 seconds
  • Alerts fire across the dashboard

The query deleted 120,000 rows. On a small table, this would take milliseconds. But the orders table had 22M rows, 5 secondary indexes, and 400 concurrent transactions. That simple DELETE turned into a system-wide incident.

Let's understand exactly why.

What DELETE Actually Does Under the Hood

When you run DELETE FROM orders WHERE id = 42, here's what the database engine (InnoDB) actually does:

What a Single DELETE Does Internally 1 Find row in clustered index (PK) Traverse B-tree, locate page, find row 2 Acquire EXCLUSIVE row lock (X lock) No other transaction can read or write this row 3 Write old row to UNDO LOG For rollback and MVCC snapshots 4 Mark row deleted in clustered index Page may need reorganization if underfilled 5 Remove entry from EVERY secondary index idx_status: remove (status='cancelled', pk=42) idx_created_at: remove (created_at='2023-06-15', pk=42) idx_customer_id: remove (customer_id=1087, pk=42) idx_total_amount: remove (total=299.99, pk=42) idx_region: remove (region='us-east', pk=42) Each = B-tree traversal + potential page merge 6 Write to REDO LOG (WAL) for crash recovery 7 Release locks (AFTER all index ops) Total: 1 clustered + 5 secondary deletes + 6 undo + 6 redo = ~18 I/O ops per row

That's one row. Now multiply by 120,000 rows. And each secondary index removal involves a full B-tree traversal — seek to the leaf page, remove the entry, and potentially merge underfilled pages.

The Index Tax: Why DELETE Is So Expensive

This is the part most engineers miss. The row itself is cheap to remove. The indexes are where the real cost lives.

The Index Tax of DELETE orders table with 5 secondary indexes Operation I/O Ops Lock Scope Clustered index delete 2-3 Row X-lock idx_status removal 2-3 Index page latch idx_created_at removal 2-3 Index page latch idx_customer_id removal 2-3 Index page latch idx_total_amount removal 2-3 Index page latch idx_region removal 2-3 Index page latch Undo + Redo log writes 4-6 Log buffer TOTAL per row 16-21 7 lock points For 120,000 rows: Up to 2.5M I/O ops 840,000 index page visits Thousands of page merges

B-tree Page Merges: The Hidden Killer

When you delete entries from a B-tree index, leaf pages become underfilled. When a page drops below ~50% capacity, InnoDB merges it with a neighbor. This merge operation:

  • Locks both pages
  • Copies entries from one page to another
  • Updates parent pointers
  • May cascade upward if the parent becomes underfilled
B-tree Page Merge After DELETE Before DELETE (healthy B-tree leaf level): Page A 80% full Page B 100% full Page C 75% full After deleting entries from Page A: Page A 30% full Page B 100% full Page C 75% full MERGE triggered (below 50% threshold) After merge: Page A + C (merged) ~105% -- may split again! Page B 100% full Expensive: copy + relink + parent update. Happens on EACH of the 5 secondary indexes.

With bulk deletes, these merges cascade across the entire index tree. This is why your I/O spikes during mass deletions.

Why findOne + UPDATE (Soft Delete) Is Dramatically Cheaper

Now let's look at what happens with a soft delete:

-- Instead of DELETE:
UPDATE orders
SET deleted_at = NOW()
WHERE id = 42;

Here's what the engine does:

What a Soft DELETE (UPDATE) Does Internally 1 Find row in clustered index Same as DELETE -- traverse B-tree 2 Acquire EXCLUSIVE row lock (X lock) Same as DELETE 3 Write old value to UNDO LOG Only deleted_at old value (NULL, 8 bytes) 4 Update deleted_at IN-PLACE No page reorg, no row movement, no merge Same page, same slot 5 Secondary indexes? ZERO operations -- deleted_at is NOT in any secondary index No page latches on index pages 6 Write to REDO LOG (single column change) 7 Release lock -- almost immediately UPDATE = ~3 I/O ops DELETE = ~18 I/O ops vs

The key insight: UPDATE on a non-indexed column doesn't touch any secondary index. The deleted_at column exists only in the clustered index row. InnoDB modifies it in-place — same page, same slot. No B-tree traversals on secondary indexes. No page merges. No cascading rebalancing.

Side-by-Side: DELETE vs Soft Delete Cost

MetricHard DELETESoft DELETE (UPDATE)
Clustered index operationRemove row, reorganize pageModify column in-place
Secondary index operations1 removal per index (5 total)0 (column not indexed)
B-tree page mergesLikely on every indexNone
Undo log sizeFull row × all indexesSingle column old value
Lock hold timeLong (all index ops)Short (single page update)
I/O ops per row (5 indexes)~18~3
Replication bytesRow image + all index changesSingle column change
Foreign key cascade checkYes (for every FK)No

The Concurrency Impact: Locks Under Load

This is where the difference becomes critical in production.

Lock Contention: DELETE vs UPDATE Timeline with 3 concurrent transactions Hard DELETE (holds locks across all index operations) Tx1 LOCK clustered idx1 idx2 idx3 idx4 idx5 UNLOCK Tx2 WAITING... LOCK...UNLOCK Tx3 WAITING............ Lock wait = 12ms Soft DELETE (UPDATE) (short lock, no index work) Tx1 LOCK UPD UNLOCK Tx2 LOCK UPD UNLOCK Tx3 LOCK UPD UNLOCK Lock wait = 0.2ms DELETE at 400 conn: pool exhaustion UPDATE at 400 conn: no contention

DELETE holds the exclusive row lock for the entire duration of all index operations. With 5 secondary indexes, that lock is held 5-6x longer than needed for a simple UPDATE. Under high concurrency, this means:

  • Lock wait chains — transactions pile up waiting for the row lock
  • Connection pool saturation — waiting transactions hold connections idle
  • Replica lag — the replication thread must replay all index operations sequentially
Benchmark Comparison — 50K Rows, 5 Indexes, 100 Connections Execution Time DELETE 14.2s UPDATE 2.1s Avg Lock Wait / Row DELETE 8.4ms UPDATE 0.9ms Peak Replica Lag DELETE 22s UPDATE 3s

Benchmark: Real Numbers

Test setup: MySQL 8.0, orders table with 10M rows, 5 secondary indexes, 100 concurrent connections, deleting/soft-deleting 50,000 rows.

MetricHard DELETESoft DELETE (UPDATE)Ratio
Total execution time14.2s2.1s6.8x faster
Avg lock wait per row8.4ms0.9ms9.3x less
Rows scanned in indexes300K0-
Redo log bytes written248MB38MB6.5x less
Replica lag (peak)22s3s7.3x less
InnoDB page merges4,2000-
Deadlocks during test120-

The numbers speak for themselves. DELETE is not just slower — it creates cascading effects (replica lag, deadlocks, page merges) that impact every other query running on the system.

The Implementation Pattern

Here's how to implement soft delete correctly:

-- Step 1: Add the soft delete column
ALTER TABLE orders ADD COLUMN deleted_at DATETIME NULL DEFAULT NULL;

-- Step 2: Add a partial index (PostgreSQL) or generated column index (MySQL)
-- PostgreSQL:
CREATE INDEX idx_orders_active ON orders (id) WHERE deleted_at IS NULL;

-- MySQL (no partial indexes — use a generated column):
ALTER TABLE orders
  ADD COLUMN is_active TINYINT GENERATED ALWAYS AS (IF(deleted_at IS NULL, 1, NULL)) STORED;
CREATE INDEX idx_orders_active ON orders (is_active);
-- Step 3: Your queries now filter on deleted_at
-- Before (hard delete world — no filter needed):
SELECT * FROM orders WHERE customer_id = 1087;

-- After (soft delete — add the filter):
SELECT * FROM orders WHERE customer_id = 1087 AND deleted_at IS NULL;
-- Step 4: Application-level helper (Node.js / Prisma example)
-- In your ORM or query builder, add a default scope:

-- Prisma middleware example:
-- prisma.$use(async (params, next) => {
--   if (params.action === 'findMany' || params.action === 'findFirst') {
--     params.args.where = { ...params.args.where, deleted_at: null };
--   }
--   return next(params);
-- });
-- Step 5: Periodic hard-delete purge (off-peak hours)
-- Run during low-traffic windows (e.g., 3 AM Sunday)
DELETE FROM orders
WHERE deleted_at IS NOT NULL
  AND deleted_at < NOW() - INTERVAL 90 DAY
LIMIT 1000;
-- Use LIMIT to batch — never bulk-delete in one shot
-- Run in a loop with 500ms sleep between batches
Soft Delete Lifecycle User clicks "Delete order" UPDATE orders SET deleted_at = NOW() WHERE id = 42; 1 column update, ~3 I/O ops, lock held < 1ms Row still exists in all indexes, filtered by WHERE deleted_at IS NULL Application sees it as "deleted" immediately 90 days later (Sunday 3 AM batch job) DELETE FROM orders WHERE deleted_at < 90_days_ago LIMIT 1000; Batched with sleep between batches Real DELETE during low traffic -- no user-facing impact Fast "delete" at peak (UPDATE) Cleanup at off-peak (batched DELETE) 90-day recovery window (undo accidental deletes) Decision Framework: Soft Delete or Hard Delete? High concurrent write load? (400+ active connections) YES Soft Delete NO Table has 3+ secondary indexes? (index tax multiplies DELETE cost) YES Soft Delete NO Table has 1M+ rows? (safer to plan for growth) YES Soft Delete NO Hard DELETE is fine (but still batch with LIMIT!)

When Hard DELETE IS the Right Choice

Soft delete isn't always the answer. Use hard DELETE when:

ScenarioWhy Hard DELETE
GDPR / Right to ErasureLegal requirement — data must be truly gone, not just hidden
Storage-critical systemsSoft delete means rows never shrink. If disk is the bottleneck, you need real deletion
Tables with 0-1 indexesThe index tax is the main cost. No indexes = DELETE is cheap
Staging / dev environmentsNo concurrency pressure — just delete it
Log / event tables (append-only)Usually partitioned — drop the partition instead of row-by-row DELETE

For GDPR, the pattern is: soft delete immediately (fast), then run a compliance purge job that does the hard DELETE in batches during off-peak hours. You get fast user-facing response AND true erasure within your SLA.

Common Mistakes Engineers Make

-- MISTAKE 1: Indexing the deleted_at column unnecessarily
CREATE INDEX idx_deleted ON orders (deleted_at);
-- ❌ This defeats the purpose — now UPDATE touches this index too

-- CORRECT: Use partial index or filter in composite indexes
-- PostgreSQL:
CREATE INDEX idx_orders_customer ON orders (customer_id) WHERE deleted_at IS NULL;

-- MISTAKE 2: Bulk DELETE without LIMIT
DELETE FROM orders WHERE status = 'cancelled';
-- ❌ Locks millions of rows in one transaction

-- CORRECT: Batch with LIMIT
DELETE FROM orders WHERE status = 'cancelled' LIMIT 1000;
-- Run in a loop with delay between batches

-- MISTAKE 3: Forgetting to filter soft-deleted rows
SELECT COUNT(*) FROM orders WHERE customer_id = 1087;
-- ❌ Includes soft-deleted rows — wrong count!

-- CORRECT: Always include the filter
SELECT COUNT(*) FROM orders
WHERE customer_id = 1087 AND deleted_at IS NULL;

The Decision Framework

Should You Use Soft Delete? High concurrent write load? (400+ active connections) YES Soft Delete NO Table has 3+ secondary indexes? (index tax multiplies DELETE cost) YES Soft Delete NO Table has 1M+ rows? (safer to plan for growth) YES Soft Delete NO Hard DELETE is fine (but still batch with LIMIT!)

The bottom line

DELETE is the most expensive DML operation in a relational database — not because removing one row is hard, but because the index maintenance cost scales with the number of secondary indexes. Every index doubles the work. In a high-concurrency system, that extra work translates directly into longer lock hold times, more contention, and cascading failures. Soft delete with UPDATE SET deleted_at = NOW() sidesteps the entire index tax. Use it as your default. Batch your real deletes for off-peak hours. Your production database will thank you.