The Scene: A Simple DELETE That Broke Everything
Friday evening. A backend engineer deploys a cleanup job to production:
```sql
-- "Just cleaning up cancelled orders, what could go wrong?"
DELETE FROM orders
WHERE status = 'cancelled'
  AND created_at < '2024-01-01';
```
Within minutes:
- API response times jump from 45ms to 3,200ms
- Connection pool saturates — new requests start queuing
- Replica lag spikes to 38 seconds
- Alerts fire across the dashboard
The query deleted 120,000 rows. On a small table, this would take milliseconds. But the orders table had 22M rows, 5 secondary indexes, and 400 concurrent transactions. That simple DELETE turned into a system-wide incident.
Let's understand exactly why.
What DELETE Actually Does Under the Hood
When you run DELETE FROM orders WHERE id = 42, here's what the database engine (InnoDB) actually does:
1. Traverse the clustered index to locate the row.
2. Acquire an exclusive lock on it.
3. Write the full old row image to the undo log (for rollback and MVCC).
4. Delete-mark the row in the clustered index.
5. Remove the row's entry from each of the 5 secondary indexes.
6. Write all of these changes to the redo log.

That's one row. Now multiply by 120,000 rows. And each secondary index removal involves a full B-tree traversal — seek to the leaf page, remove the entry, and potentially merge underfilled pages.
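To put the multiplier in concrete terms, here is a back-of-envelope sketch of the incident above. The row and index counts come from the scenario; the one-operation-per-index model is a deliberate simplification:

```python
# Rough cost model for the incident DELETE: one clustered-index removal
# per row, plus one secondary-index removal per index per row.
rows = 120_000
secondary_indexes = 5

clustered_ops = rows                      # delete-mark in the clustered index
secondary_ops = rows * secondary_indexes  # one B-tree traversal per index
total_btree_ops = clustered_ops + secondary_ops

print(f"{total_btree_ops:,} B-tree operations")  # 720,000 B-tree operations
```

Over 700,000 B-tree operations in a single statement, before counting any page merges — which is the next piece of the story.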
The Index Tax: Why DELETE Is So Expensive
This is the part most engineers miss. The row itself is cheap to remove. The indexes are where the real cost lives.
B-tree Page Merges: The Hidden Killer
When you delete entries from a B-tree index, leaf pages become underfilled. When a page drops below ~50% capacity, InnoDB merges it with a neighbor. This merge operation:
- Locks both pages
- Copies entries from one page to another
- Updates parent pointers
- May cascade upward if the parent becomes underfilled
With bulk deletes, these merges cascade across the entire index tree. This is why your I/O spikes during mass deletions.
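The cascade is easier to feel with a toy model. The sketch below simulates leaf pages that fold into a neighbor once they drop below 50% fill. The page capacity, the ~70% starting fill factor, and the uniform delete pattern are all invented for illustration, and parent-level cascades are ignored, so it understates the real work:

```python
import random

PAGE_CAPACITY = 100                   # entries per leaf page (toy number)
INITIAL_FILL = 70                     # assumed typical B-tree fill factor
MERGE_THRESHOLD = PAGE_CAPACITY // 2  # merge when a page drops below ~50% fill

def simulate_bulk_delete(num_pages: int, deletes: int, seed: int = 42) -> int:
    """Delete entries from random leaf pages; count the merges triggered."""
    random.seed(seed)
    pages = [INITIAL_FILL] * num_pages
    merges = 0
    for _ in range(deletes):
        i = random.randrange(len(pages))
        pages[i] -= 1
        if pages[i] < MERGE_THRESHOLD and len(pages) > 1:
            # Fold the underfilled page into a neighbor and discard it.
            neighbor = (i + 1) % len(pages)
            pages[neighbor] = min(PAGE_CAPACITY, pages[neighbor] + pages[i])
            pages.pop(i)
            merges += 1
    return merges

merges = simulate_bulk_delete(num_pages=200, deletes=5_000)
print(f"{merges} page merges triggered")
```

Even this crude model triggers merges on a meaningful fraction of the pages — and every one of them is extra locking and extra I/O in the real engine.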
Why findOne + UPDATE (Soft Delete) Is Dramatically Cheaper
Now let's look at what happens with a soft delete:
```sql
-- Instead of DELETE:
UPDATE orders
SET deleted_at = NOW()
WHERE id = 42;
```
Here's what the engine does:
1. Traverse the clustered index to locate the row.
2. Acquire an exclusive lock on it.
3. Write the old value of the single changed column to the undo log.
4. Overwrite deleted_at in place within the clustered index row.
5. Write the small change to the redo log.
The key insight: UPDATE on a non-indexed column doesn't touch any secondary index. The deleted_at column exists only in the clustered index row. InnoDB modifies it in-place — same page, same slot. No B-tree traversals on secondary indexes. No page merges. No cascading rebalancing.
Side-by-Side: DELETE vs Soft Delete Cost
| Metric | Hard DELETE | Soft DELETE (UPDATE) |
|---|---|---|
| Clustered index operation | Remove row, reorganize page | Modify column in-place |
| Secondary index operations | 1 removal per index (5 total) | 0 (column not indexed) |
| B-tree page merges | Likely on every index | None |
| Undo log size | Full row × all indexes | Single column old value |
| Lock hold time | Long (all index ops) | Short (single page update) |
| I/O ops per row (5 indexes) | ~18 | ~3 |
| Replication bytes | Row image + all index changes | Single column change |
| Foreign key cascade check | Yes (for every FK) | No |
The Concurrency Impact: Locks Under Load
This is where the difference becomes critical in production.
DELETE holds the exclusive row lock for the entire duration of all index operations. With 5 secondary indexes, that lock is held 5-6x longer than needed for a simple UPDATE. Under high concurrency, this means:
- Lock wait chains — transactions pile up waiting for the row lock
- Connection pool saturation — waiting transactions hold connections idle
- Replica lag — the replication thread must replay all index operations sequentially
Benchmark: Real Numbers
Test setup: MySQL 8.0, orders table with 10M rows, 5 secondary indexes, 100 concurrent connections, deleting/soft-deleting 50,000 rows.
| Metric | Hard DELETE | Soft DELETE (UPDATE) | Ratio |
|---|---|---|---|
| Total execution time | 14.2s | 2.1s | 6.8x faster |
| Avg lock wait per row | 8.4ms | 0.9ms | 9.3x less |
| Rows scanned in indexes | 300K | 0 | - |
| Redo log bytes written | 248MB | 38MB | 6.5x less |
| Replica lag (peak) | 22s | 3s | 7.3x less |
| InnoDB page merges | 4,200 | 0 | - |
| Deadlocks during test | 12 | 0 | - |
The numbers speak for themselves. DELETE is not just slower — it creates cascading effects (replica lag, deadlocks, page merges) that impact every other query running on the system.
The Implementation Pattern
Here's how to implement soft delete correctly:
```sql
-- Step 1: Add the soft delete column
ALTER TABLE orders ADD COLUMN deleted_at DATETIME NULL DEFAULT NULL;

-- Step 2: Add a partial index (PostgreSQL) or generated column index (MySQL)
-- PostgreSQL:
CREATE INDEX idx_orders_active ON orders (id) WHERE deleted_at IS NULL;

-- MySQL (no partial indexes — use a generated column):
ALTER TABLE orders
  ADD COLUMN is_active TINYINT GENERATED ALWAYS AS (IF(deleted_at IS NULL, 1, NULL)) STORED;
CREATE INDEX idx_orders_active ON orders (is_active);

-- Step 3: Your queries now filter on deleted_at
-- Before (hard delete world — no filter needed):
SELECT * FROM orders WHERE customer_id = 1087;
-- After (soft delete — add the filter):
SELECT * FROM orders WHERE customer_id = 1087 AND deleted_at IS NULL;

-- Step 4: Application-level helper (Node.js / Prisma example)
-- In your ORM or query builder, add a default scope:
-- Prisma middleware example:
--   prisma.$use(async (params, next) => {
--     if (params.action === 'findMany' || params.action === 'findFirst') {
--       params.args.where = { ...params.args.where, deleted_at: null };
--     }
--     return next(params);
--   });

-- Step 5: Periodic hard-delete purge (off-peak hours)
-- Run during low-traffic windows (e.g., 3 AM Sunday)
DELETE FROM orders
WHERE deleted_at IS NOT NULL
  AND deleted_at < NOW() - INTERVAL 90 DAY
LIMIT 1000;
-- Use LIMIT to batch — never bulk-delete in one shot
-- Run in a loop with 500ms sleep between batches
```
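The Step 5 batching loop can be sketched end-to-end. This demo uses an in-memory SQLite database purely so the sketch is self-contained and runnable; SQLite has no DELETE ... LIMIT by default, so it selects rowids first, whereas on MySQL you would run the DELETE ... LIMIT statement above directly. The schema and data here are a minimal stand-in, not the real orders table:

```python
import sqlite3
import time

def purge_soft_deleted(conn: sqlite3.Connection, batch_size: int = 1000,
                       pause_s: float = 0.5) -> int:
    """Hard-delete soft-deleted rows older than 90 days, in small batches."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM orders WHERE rowid IN ("
            "  SELECT rowid FROM orders"
            "  WHERE deleted_at IS NOT NULL"
            "    AND deleted_at < datetime('now', '-90 days')"
            "  LIMIT ?)",
            (batch_size,),
        )
        conn.commit()                # short transactions: release locks per batch
        total += cur.rowcount
        if cur.rowcount < batch_size:
            return total
        time.sleep(pause_s)          # let replicas and other writers catch up

# Demo on an in-memory database (invented schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, deleted_at TEXT)")
conn.executemany(
    "INSERT INTO orders (deleted_at) VALUES (?)",
    [("2020-01-01 00:00:00",)] * 2500 + [(None,)] * 500,
)
print(purge_soft_deleted(conn, batch_size=1000, pause_s=0.0))  # 2500
```

The important properties carry over regardless of engine: each batch is its own short transaction, and the sleep between batches gives replication and concurrent writers room to breathe.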
When Hard DELETE IS the Right Choice
Soft delete isn't always the answer. Use hard DELETE when:
| Scenario | Why Hard DELETE |
|---|---|
| GDPR / Right to Erasure | Legal requirement — data must be truly gone, not just hidden |
| Storage-critical systems | Soft-deleted rows still occupy disk — the table never shrinks. If disk is the bottleneck, you need real deletion |
| Tables with 0-1 indexes | The index tax is the main cost. No indexes = DELETE is cheap |
| Staging / dev environments | No concurrency pressure — just delete it |
| Log / event tables (append-only) | Usually partitioned — drop the partition instead of row-by-row DELETE |
For GDPR, the pattern is: soft delete immediately (fast), then run a compliance purge job that does the hard DELETE in batches during off-peak hours. You get fast user-facing response AND true erasure within your SLA.
Common Mistakes Engineers Make
```sql
-- MISTAKE 1: Indexing the deleted_at column unnecessarily
CREATE INDEX idx_deleted ON orders (deleted_at);
-- ❌ This defeats the purpose — now UPDATE touches this index too

-- CORRECT: Use partial index or filter in composite indexes
-- PostgreSQL:
CREATE INDEX idx_orders_customer ON orders (customer_id) WHERE deleted_at IS NULL;

-- MISTAKE 2: Bulk DELETE without LIMIT
DELETE FROM orders WHERE status = 'cancelled';
-- ❌ Locks millions of rows in one transaction

-- CORRECT: Batch with LIMIT
DELETE FROM orders WHERE status = 'cancelled' LIMIT 1000;
-- Run in a loop with delay between batches

-- MISTAKE 3: Forgetting to filter soft-deleted rows
SELECT COUNT(*) FROM orders WHERE customer_id = 1087;
-- ❌ Includes soft-deleted rows — wrong count!

-- CORRECT: Always include the filter
SELECT COUNT(*) FROM orders
WHERE customer_id = 1087 AND deleted_at IS NULL;
```
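Mistake 3 is easy to reproduce. The snippet below uses an in-memory SQLite table with invented data (same column names as the examples above) to show how the unfiltered count silently drifts once soft deletes accumulate:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INT, deleted_at TEXT)")
conn.executemany(
    "INSERT INTO orders (customer_id, deleted_at) VALUES (?, ?)",
    [(1087, None)] * 7 + [(1087, "2024-06-01")] * 3,  # 7 live, 3 soft-deleted
)

# The naive count includes soft-deleted rows; the filtered one does not.
naive = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer_id = 1087").fetchone()[0]
correct = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE customer_id = 1087 AND deleted_at IS NULL"
).fetchone()[0]

print(naive, correct)  # 10 7
```

A default scope in the ORM (as in Step 4 above) is the reliable fix — relying on every hand-written query to remember the filter is how this bug ships.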
The Decision Framework
Putting it all together:
- Default to soft delete (UPDATE ... SET deleted_at = NOW()). It skips the index tax and keeps lock hold times short.
- Legal erasure required (GDPR)? Soft delete immediately, then hard-delete in a batched off-peak purge within your SLA.
- Disk space is the real constraint? Hard delete in batches, or partition the table and drop partitions.
- Table has 0-1 indexes, or no concurrency pressure? Plain DELETE is fine.
- Append-only log or event tables? Partition by time and drop whole partitions.
The bottom line
DELETE is the most expensive DML operation in a relational database — not because removing one row is hard, but because the index maintenance cost scales with the number of secondary indexes. Every additional index adds another full round of B-tree maintenance per deleted row. In a high-concurrency system, that extra work translates directly into longer lock hold times, more contention, and cascading failures. Soft delete with UPDATE ... SET deleted_at = NOW() sidesteps the entire index tax. Use it as your default. Batch your real deletes for off-peak hours. Your production database will thank you.