The Scene: A Simple DELETE That Broke Everything

Friday evening. A backend engineer deploys a cleanup job to production:

-- "Just cleaning up cancelled orders, what could go wrong?"
DELETE FROM orders
WHERE status = 'cancelled'
  AND created_at < '2024-01-01';

Within minutes:

  • API response times jump from 45ms to 3,200ms
  • Connection pool saturates — new requests start queuing
  • Replica lag spikes to 38 seconds
  • Alerts fire across the dashboard

The query deleted 120,000 rows. On a small table, this would take milliseconds. But the orders table had 22M rows, 5 secondary indexes, and 400 concurrent transactions. That simple DELETE turned into a system-wide incident.

Let's understand exactly why.

What DELETE Actually Does Under the Hood

When you run DELETE FROM orders WHERE id = 42, here's what the database engine (InnoDB) actually does:

WHAT A SINGLE DELETE DOES INTERNALLY
─────────────────────────────────────────────────────────────
Step 1: Find the row in the clustered index (PRIMARY KEY)
        → Traverse B-tree, locate page, find row

Step 2: Acquire EXCLUSIVE row lock (X lock)
        → No other transaction can read or write this row

Step 3: Write the old row to the UNDO LOG
        → For rollback and MVCC snapshots

Step 4: Mark row as deleted in clustered index
        → Page may need reorganization if now underfilled

Step 5: Remove entry from EVERY secondary index
        → idx_status:       remove (status='cancelled', pk=42)
        → idx_created_at:   remove (created_at='2023-06-15', pk=42)
        → idx_customer_id:  remove (customer_id=1087, pk=42)
        → idx_total_amount: remove (total=299.99, pk=42)
        → idx_region:       remove (region='us-east', pk=42)
        Each removal = B-tree traversal + potential page merge

Step 6: Write to REDO LOG (WAL)
        → For crash recovery

Step 7: Release locks
        → Only AFTER all index operations complete

Total I/O operations for 1 row with 5 indexes:
  1 clustered index delete
+ 5 secondary index deletes
+ 6 undo log writes
+ 6 redo log writes
= ~18 I/O ops
─────────────────────────────────────────────────────────────

That's one row. Now multiply by 120,000 rows. And each secondary index removal involves a full B-tree traversal — seek to the leaf page, remove the entry, and potentially merge underfilled pages.
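The multiplication is worth making explicit. A quick back-of-envelope calculation (using the rough per-row estimates from the breakdown above, not measurements) shows where the totals come from:

```python
# Back-of-envelope I/O arithmetic for the bulk DELETE above.
# Per-row figures are the rough estimates from the text, not measurements.
ROWS = 120_000

io_per_row_low = 16    # best case: B-tree touches mostly hit cached pages
io_per_row_high = 21   # worst case: extra page reads plus merges

total_low = ROWS * io_per_row_low    # 1,920,000 I/O ops
total_high = ROWS * io_per_row_high  # 2,520,000 I/O ops ("up to 2.5M")

print(f"{total_low:,} to {total_high:,} I/O ops for one DELETE statement")
```

Even the optimistic end of that range is roughly two million I/O operations triggered by a single statement.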

The Index Tax: Why DELETE Is So Expensive

This is the part most engineers miss. The row itself is cheap to remove. The indexes are where the real cost lives.

THE INDEX TAX OF DELETE
─────────────────────────────────────────────────────────────
orders table with 5 secondary indexes:

DELETE one row:
┌──────────────────────────┬───────────┬──────────────────┐
│ Operation                │ I/O Ops   │ Lock Scope       │
├──────────────────────────┼───────────┼──────────────────┤
│ Clustered index delete   │ 2-3       │ Row X-lock       │
│ idx_status removal       │ 2-3       │ Index page latch │
│ idx_created_at removal   │ 2-3       │ Index page latch │
│ idx_customer_id removal  │ 2-3       │ Index page latch │
│ idx_total_amount removal │ 2-3       │ Index page latch │
│ idx_region removal       │ 2-3       │ Index page latch │
│ Undo + Redo log writes   │ 4-6       │ Log buffer       │
├──────────────────────────┼───────────┼──────────────────┤
│ TOTAL per row            │ 16-21     │ 7 lock points    │
└──────────────────────────┴───────────┴──────────────────┘

For 120,000 rows:
  → Up to 2.5M I/O operations
  → 840,000 index page visits
  → Potentially thousands of B-tree page merges
─────────────────────────────────────────────────────────────

B-tree Page Merges: The Hidden Killer

When you delete entries from a B-tree index, leaf pages become underfilled. When a page drops below ~50% capacity, InnoDB merges it with a neighbor. This merge operation:

  • Locks both pages
  • Copies entries from one page to another
  • Updates parent pointers
  • May cascade upward if the parent becomes underfilled
B-TREE PAGE MERGE AFTER DELETE
─────────────────────────────────────────────────────────────
Before DELETE (healthy B-tree leaf level):

┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│   Page A    │  │   Page B    │  │   Page C    │
│ ████████░░  │  │ ██████████  │  │ ████████░░  │
│  80% full   │  │  100% full  │  │  75% full   │
└─────────────┘  └─────────────┘  └─────────────┘

After deleting entries from Page A:

┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│   Page A    │  │   Page B    │  │   Page C    │
│ ███░░░░░░░  │  │ ██████████  │  │ ████████░░  │
│  30% full   │  │  100% full  │  │  75% full   │
└─────────────┘  └─────────────┘  └─────────────┘
       ↓
MERGE triggered (below 50% threshold)

After merge:

┌──────────────────────────┐  ┌─────────────┐
│   Page A + C (merged)    │  │   Page B    │
│ ███████████████░░░░░░░░  │  │ ██████████  │
│ ~105% → may split again! │  │  100% full  │
└──────────────────────────┘  └─────────────┘
       ↑
Expensive: copy + relink + parent update

This happens on EACH of the 5 secondary indexes.
─────────────────────────────────────────────────────────────

With bulk deletes, these merges cascade across the entire index tree. This is why your I/O spikes during mass deletions.

Why findOne + UPDATE (Soft Delete) Is Dramatically Cheaper

Now let's look at what happens with a soft delete:

-- Instead of DELETE:
UPDATE orders
SET deleted_at = NOW()
WHERE id = 42;

Here's what the engine does:

WHAT A SOFT DELETE (UPDATE) DOES INTERNALLY
─────────────────────────────────────────────────────────────
Step 1: Find the row in clustered index
        → Same as DELETE — traverse B-tree

Step 2: Acquire EXCLUSIVE row lock (X lock)
        → Same as DELETE

Step 3: Write old value to UNDO LOG
        → Only the deleted_at column's old value (NULL, 8 bytes)

Step 4: Update deleted_at IN-PLACE on the same page
        → No page reorganization
        → No row movement
        → No page merge

Step 5: Secondary indexes?
        → deleted_at is NOT in any secondary index
        → ZERO secondary index operations
        → ZERO page latches on index pages

Step 6: Write to REDO LOG
        → Only the single column change

Step 7: Release lock
        → Almost immediately

Total I/O operations for 1 row with 5 indexes:
  1 clustered index update
+ 0 secondary index changes
+ 1 undo log write
+ 1 redo log write
= ~3 I/O ops

Compare: DELETE = ~18 I/O ops  vs  UPDATE = ~3 I/O ops
─────────────────────────────────────────────────────────────

The key insight: UPDATE on a non-indexed column doesn't touch any secondary index. The deleted_at column exists only in the clustered index row. InnoDB modifies it in-place — same page, same slot. No B-tree traversals on secondary indexes. No page merges. No cascading rebalancing.
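The visibility semantics are easy to demonstrate end to end. The sketch below uses SQLite purely for portability (SQLite has no InnoDB-style cost model, so this shows only that soft-deleted rows stay physically present but disappear from filtered reads); the table and column names mirror the article's examples:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER,
    status      TEXT,
    deleted_at  TEXT DEFAULT NULL)""")
conn.executemany(
    "INSERT INTO orders (id, customer_id, status) VALUES (?, ?, ?)",
    [(42, 1087, "cancelled"), (43, 1087, "paid")])

# Soft delete: a single-column UPDATE, no row removal
conn.execute("UPDATE orders SET deleted_at = datetime('now') WHERE id = 42")

# Reads must filter on deleted_at IS NULL to hide soft-deleted rows
visible = conn.execute(
    "SELECT COUNT(*) FROM orders "
    "WHERE customer_id = 1087 AND deleted_at IS NULL").fetchone()[0]
total = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer_id = 1087").fetchone()[0]

print(visible, total)  # 1 2 — one row visible, two physically present
```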

Side-by-Side: DELETE vs Soft Delete Cost

┌─────────────────────────────┬───────────────────────────────┬─────────────────────────────┐
│ Metric                      │ Hard DELETE                   │ Soft DELETE (UPDATE)        │
├─────────────────────────────┼───────────────────────────────┼─────────────────────────────┤
│ Clustered index operation   │ Remove row, reorganize page   │ Modify column in-place      │
│ Secondary index operations  │ 1 removal per index (5 total) │ 0 (column not indexed)      │
│ B-tree page merges          │ Likely on every index         │ None                        │
│ Undo log size               │ Full row × all indexes        │ Single column old value     │
│ Lock hold time              │ Long (all index ops)          │ Short (single page update)  │
│ I/O ops per row (5 indexes) │ ~18                           │ ~3                          │
│ Replication bytes           │ Row image + all index changes │ Single column change        │
│ Foreign key cascade check   │ Yes (for every FK)            │ No                          │
└─────────────────────────────┴───────────────────────────────┴─────────────────────────────┘

The Concurrency Impact: Locks Under Load

This is where the difference becomes critical in production.

LOCK CONTENTION: DELETE vs UPDATE
─────────────────────────────────────────────────────────────
Timeline with 3 concurrent transactions:

Hard DELETE (holds locks across all index operations):
─────────────────────────────────────────────────────────
Tx1: [===LOCK=== clustered ── idx1 ── idx2 ── idx3 ── idx4 ── idx5 ===UNLOCK===]
Tx2:      [WAITING............][===LOCK===]
Tx3:        [WAITING..............][===]
              ↑ Lock wait = 12ms

Soft DELETE UPDATE (short lock, no index work):
─────────────────────────────────────────────────────────
Tx1: [=LOCK= update =UNLOCK=]
Tx2:   [=LOCK= update =UNLOCK=]
Tx3:     [=LOCK= update =UNLOCK=]
              ↑ Lock wait = 0.2ms

At 400 concurrent connections:
  DELETE: lock waits cascade → connection pool exhaustion
  UPDATE: transactions fly through → no visible contention
─────────────────────────────────────────────────────────────

DELETE holds the exclusive row lock for the entire duration of all index operations. With 5 secondary indexes, that lock is held 5-6x longer than needed for a simple UPDATE. Under high concurrency, this means:

  • Lock wait chains — transactions pile up waiting for the row lock
  • Connection pool saturation — waiting transactions hold connections idle
  • Replica lag — the replication thread must replay all index operations sequentially
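The lock-wait pile-up can be approximated with a toy FIFO queueing model. This is deliberately simplistic (transactions waiting in a single chain for one hot row; hold times are the illustrative figures from the timeline, not benchmarks), but it shows how total wait grows with lock hold time:

```python
# Toy model: n_txns transactions queue FIFO for the same row lock.
# Hold times are the illustrative figures from the timeline above.
def total_lock_wait(n_txns: int, hold_ms: float) -> float:
    """Sum of wait times when each txn waits for all earlier lock holders."""
    return sum(i * hold_ms for i in range(n_txns))

delete_hold_ms = 12.0   # lock held across all 5 secondary-index removals
update_hold_ms = 0.2    # lock held for a single in-place column update

print(total_lock_wait(400, delete_hold_ms))  # cumulative wait, hard DELETE
print(total_lock_wait(400, update_hold_ms))  # cumulative wait, soft delete
```

The cumulative wait scales linearly with hold time, so a lock held 60x longer produces 60x the total queueing delay on a hot row.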

Benchmark: Real Numbers

Test setup: MySQL 8.0, orders table with 10M rows, 5 secondary indexes, 100 concurrent connections, deleting/soft-deleting 50,000 rows.

┌──────────────────────────┬─────────────┬──────────────────────┬─────────────┐
│ Metric                   │ Hard DELETE │ Soft DELETE (UPDATE) │ Ratio       │
├──────────────────────────┼─────────────┼──────────────────────┼─────────────┤
│ Total execution time     │ 14.2s       │ 2.1s                 │ 6.8x faster │
│ Avg lock wait per row    │ 8.4ms       │ 0.9ms                │ 9.3x less   │
│ Rows scanned in indexes  │ 300K        │ 0                    │ -           │
│ Redo log bytes written   │ 248MB       │ 38MB                 │ 6.5x less   │
│ Replica lag (peak)       │ 22s         │ 3s                   │ 7.3x less   │
│ InnoDB page merges       │ 4,200       │ 0                    │ -           │
│ Deadlocks during test    │ 12          │ 0                    │ -           │
└──────────────────────────┴─────────────┴──────────────────────┴─────────────┘

The numbers speak for themselves. DELETE is not just slower — it creates cascading effects (replica lag, deadlocks, page merges) that impact every other query running on the system.

The Implementation Pattern

Here's how to implement soft delete correctly:

-- Step 1: Add the soft delete column
ALTER TABLE orders ADD COLUMN deleted_at DATETIME NULL DEFAULT NULL;

-- Step 2: Add a partial index (PostgreSQL) or generated column index (MySQL)
-- PostgreSQL:
CREATE INDEX idx_orders_active ON orders (id) WHERE deleted_at IS NULL;

-- MySQL (no partial indexes — use a generated column):
ALTER TABLE orders
  ADD COLUMN is_active TINYINT GENERATED ALWAYS AS (IF(deleted_at IS NULL, 1, NULL)) STORED;
CREATE INDEX idx_orders_active ON orders (is_active);

-- Step 3: Your queries now filter on deleted_at
-- Before (hard delete world — no filter needed):
SELECT * FROM orders WHERE customer_id = 1087;

-- After (soft delete — add the filter):
SELECT * FROM orders WHERE customer_id = 1087 AND deleted_at IS NULL;

-- Step 4: Application-level helper (Node.js / Prisma example)
-- In your ORM or query builder, add a default scope:

-- Prisma middleware example:
-- prisma.$use(async (params, next) => {
--   if (params.action === 'findMany' || params.action === 'findFirst') {
--     params.args.where = { ...params.args.where, deleted_at: null };
--   }
--   return next(params);
-- });

-- Step 5: Periodic hard-delete purge (off-peak hours)
-- Run during low-traffic windows (e.g., 3 AM Sunday)
DELETE FROM orders
WHERE deleted_at IS NOT NULL
  AND deleted_at < NOW() - INTERVAL 90 DAY
LIMIT 1000;
-- Use LIMIT to batch — never bulk-delete in one shot
-- Run in a loop with 500ms sleep between batches
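Step 5's batching loop might look like the sketch below. It uses SQLite so the example is self-contained; note that stock SQLite builds typically lack DELETE ... LIMIT, so the batch is selected via a rowid subquery, whereas on MySQL you would write DELETE ... LIMIT 1000 directly. The sleep between batches is what gives replicas room to catch up:

```python
import sqlite3
import time

def purge_soft_deleted(conn, batch_size=1000, sleep_s=0.5):
    """Hard-delete soft-deleted rows older than 90 days, in small batches."""
    total = 0
    while True:
        # Stock SQLite has no DELETE ... LIMIT; pick the batch by rowid.
        cur = conn.execute(
            "DELETE FROM orders WHERE rowid IN ("
            "  SELECT rowid FROM orders"
            "  WHERE deleted_at IS NOT NULL"
            "    AND deleted_at < datetime('now', '-90 days')"
            "  LIMIT ?)",
            (batch_size,))
        conn.commit()
        if cur.rowcount == 0:
            return total          # nothing left to purge
        total += cur.rowcount
        time.sleep(sleep_s)       # breathe between batches; eases replica lag

# Demo on an in-memory table with 2,500 long-soft-deleted rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, deleted_at TEXT)")
conn.executemany("INSERT INTO orders (id, deleted_at) VALUES (?, ?)",
                 [(i, "2020-01-01 00:00:00") for i in range(2500)])
deleted = purge_soft_deleted(conn, batch_size=1000, sleep_s=0)
print(deleted)  # 2500, purged in batches of 1000, 1000, 500
```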
SOFT DELETE LIFECYCLE
─────────────────────────────────────────────────────────────
User clicks "Delete order":
┌─────────────────────────────────────────────────────────┐
│ UPDATE orders SET deleted_at = NOW() WHERE id = 42;     │
│ → 1 column update, ~3 I/O ops, lock held < 1ms          │
│ → Row still exists, still in all indexes                │
│ → But filtered out by WHERE deleted_at IS NULL          │
└─────────────────────────────────────────────────────────┘
        ↓
Application sees it as "deleted" immediately
        ↓
90 days later (Sunday 3 AM batch job):
┌─────────────────────────────────────────────────────────┐
│ DELETE FROM orders WHERE deleted_at < 90_days_ago       │
│ LIMIT 1000;  -- batched, with sleep between batches     │
│ → Real DELETE happens during low traffic                │
│ → No user-facing impact                                 │
└─────────────────────────────────────────────────────────┘

Best of both worlds:
  ✅ Fast "delete" during peak hours (UPDATE)
  ✅ Actual data cleanup during off-peak (batched DELETE)
  ✅ Recovery window — undo accidental deletes within 90 days
─────────────────────────────────────────────────────────────

When Hard DELETE IS the Right Choice

Soft delete isn't always the answer. Use hard DELETE when:

┌───────────────────────────────────┬───────────────────────────────────────────────────┐
│ Scenario                          │ Why Hard DELETE                                   │
├───────────────────────────────────┼───────────────────────────────────────────────────┤
│ GDPR / Right to Erasure           │ Legal requirement — data must be truly gone,      │
│                                   │ not just hidden                                   │
│ Storage-critical systems          │ Soft delete means rows never shrink. If disk is   │
│                                   │ the bottleneck, you need real deletion            │
│ Tables with 0-1 indexes           │ The index tax is the main cost. No indexes =      │
│                                   │ DELETE is cheap                                   │
│ Staging / dev environments        │ No concurrency pressure — just delete it          │
│ Log / event tables (append-only)  │ Usually partitioned — drop the partition instead  │
│                                   │ of row-by-row DELETE                              │
└───────────────────────────────────┴───────────────────────────────────────────────────┘

For GDPR, the pattern is: soft delete immediately (fast), then run a compliance purge job that does the hard DELETE in batches during off-peak hours. You get fast user-facing response AND true erasure within your SLA.

Common Mistakes Engineers Make

-- MISTAKE 1: Indexing the deleted_at column unnecessarily
CREATE INDEX idx_deleted ON orders (deleted_at);
-- ❌ This defeats the purpose — now UPDATE touches this index too

-- CORRECT: Use partial index or filter in composite indexes
-- PostgreSQL:
CREATE INDEX idx_orders_customer ON orders (customer_id) WHERE deleted_at IS NULL;

-- MISTAKE 2: Bulk DELETE without LIMIT
DELETE FROM orders WHERE status = 'cancelled';
-- ❌ Locks millions of rows in one transaction

-- CORRECT: Batch with LIMIT
DELETE FROM orders WHERE status = 'cancelled' LIMIT 1000;
-- Run in a loop with delay between batches

-- MISTAKE 3: Forgetting to filter soft-deleted rows
SELECT COUNT(*) FROM orders WHERE customer_id = 1087;
-- ❌ Includes soft-deleted rows — wrong count!

-- CORRECT: Always include the filter
SELECT COUNT(*) FROM orders
WHERE customer_id = 1087 AND deleted_at IS NULL;

The Decision Framework

SHOULD YOU USE SOFT DELETE?
─────────────────────────────────────────────────────────────
Is your table under high concurrent write load?
├── YES → Soft delete. Hard DELETE will cause lock contention.
│
└── NO → Does the table have 3+ secondary indexes?
         ├── YES → Soft delete. Index tax makes DELETE expensive.
         │
         └── NO → Is the table > 1M rows?
                  ├── YES → Soft delete. Safer under growth.
                  │
                  └── NO → Hard DELETE is fine.
                           (But still batch with LIMIT!)
─────────────────────────────────────────────────────────────
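The decision tree above can be encoded as a small helper for code review checklists or lint rules. The thresholds are the article's rules of thumb, not universal constants:

```python
def delete_strategy(high_write_concurrency: bool,
                    secondary_indexes: int,
                    row_count: int) -> str:
    """Encode the decision tree above: returns 'soft' or 'hard'."""
    if high_write_concurrency:
        return "soft"    # hard DELETE would cause lock contention
    if secondary_indexes >= 3:
        return "soft"    # index tax makes DELETE expensive
    if row_count > 1_000_000:
        return "soft"    # safer under growth
    return "hard"        # fine — but still batch with LIMIT

print(delete_strategy(True, 5, 22_000_000))   # soft (the incident's orders table)
print(delete_strategy(False, 1, 50_000))      # hard (small, lightly indexed)
```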

The Bottom Line

DELETE is the most expensive DML operation in a relational database — not because removing one row is hard, but because the index maintenance cost scales with the number of secondary indexes. Every additional index adds another round of B-tree traversal, entry removal, and potential page merges. In a high-concurrency system, that extra work translates directly into longer lock hold times, more contention, and cascading failures. Soft delete with UPDATE ... SET deleted_at = NOW() sidesteps the entire index tax. Use it as your default. Batch your real deletes for off-peak hours. Your production database will thank you.