36GB Index Disaster When Your Index Outgrows Your Data 8 GB Data Table 36 GB Index 4.5x larger! 8GB data 36GB index Fixed with zero downtime

The Incident: Index Bigger Than the Data

The alert came in at 2am: dashboard queries timing out across the platform. P99 had climbed from 40ms to 11 seconds. The DB server's IO wait was pinned at 95%.

The first thing I checked was table sizes:

-- Check table and index sizes in MySQL
SELECT
  table_name,
  ROUND(data_length / 1024 / 1024 / 1024, 2) AS data_gb,
  ROUND(index_length / 1024 / 1024 / 1024, 2) AS index_gb,
  ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS total_gb
FROM information_schema.tables
WHERE table_schema = 'production_db'
ORDER BY total_gb DESC
LIMIT 10;
-- Output:
-- table_name        data_gb  index_gb  total_gb
-- device_events       8.20    36.40     44.60   ← 36GB index on 8GB data!
-- user_sessions       2.10     1.80      3.90
-- audit_logs          1.50     0.90      2.40
The Anomaly: device_events Table Actual (bloated) Data: 8.2 GB Index: 36.4 GB (4.4x data!) Healthy ratio Data: 8.2 GB Index: 1.8 GB (22%) Normal ratio: index = 10-50% of data size A 4.4x ratio is a clear signal something is wrong.

Diagnosing: What Was in That Index

-- Show all indexes on the table
SHOW INDEX FROM device_events;

-- More detailed view
SELECT
  INDEX_NAME,
  SEQ_IN_INDEX,
  COLUMN_NAME,
  SUB_PART,       -- prefix length (NULL = full column)
  CARDINALITY,
  INDEX_TYPE
FROM information_schema.statistics
WHERE table_schema = 'production_db'
  AND table_name = 'device_events'
ORDER BY INDEX_NAME, SEQ_IN_INDEX;
-- Output:
-- INDEX_NAME              SEQ  COLUMN_NAME    SUB_PART  CARDINALITY
-- PRIMARY                  1   event_id       NULL      45,000,000
-- idx_device_payload       1   device_id      NULL      12,000
-- idx_device_payload       2   event_payload  NULL      44,987,432  ← PROBLEM
-- idx_device_payload       3   created_at     NULL       8,760
-- idx_created_at           1   created_at     NULL       8,760

Found it. The composite index idx_device_payload included event_payload — a TEXT column containing JSON blobs of up to 64KB per row, with no prefix length limit. MySQL was trying to index the entire text content of every event payload.

Why This Exploded The bad index: CREATE INDEX idx_device_payload ON device_events(device_id, event_payload, created_at); What MySQL stored in the index for each row: device_id 4 bytes event_payload (full TEXT, up to 64KB!) up to 65,536 bytes created_at 8 bytes event_payload = TEXT column with JSON blobs up to 64KB per row, no prefix length 45M rows x avg 800 bytes payload = 36GB index

Three Problems in One Index

Beyond the TEXT column, the composite index had two more issues:

-- Actual queries being run against this table:
SELECT * FROM device_events
WHERE device_id = 42
  AND created_at > '2024-01-01'
LIMIT 100;

-- The index order was: (device_id, event_payload, created_at)
-- But queries filter on: device_id (equality) + created_at (range)
-- The event_payload in the middle means:
-- → Index can seek on device_id ✅
-- → But then CANNOT range-scan on created_at ❌
--   because event_payload breaks the ordering

-- EXPLAIN showed:
-- type: ref, key: idx_device_payload, rows: 4,200,000
-- MySQL used the index for device_id but then scanned
-- 4.2M rows to find the created_at range.
Column Order Matters in Composite Indexes Bad: (device_id, payload, created_at) id=1, payload="aaa", date=2024-01-01 id=1, payload="aab", date=2024-03-15 id=1, payload="zzz", date=2023-12-01 id=2, payload="aaa", date=2024-02-01 Seek on device_id=1 OK created_at NOT in order BAD Scans 3.5M rows Good: (device_id, created_at) id=1, date=2023-12-01 id=1, date=2024-01-01 id=1, date=2024-03-15 id=2, date=2024-02-01 Seek on device_id=1 OK Range scan on created_at OK Reads only ~84 rows WHERE device_id = 1 AND created_at > '2024-01-01' 3,500,000 rows scanned 84 rows scanned vs

The Fix: Rebuild Without Downtime

With 45M rows and production traffic, a standard ALTER TABLE would lock the table for hours. We used pt-online-schema-change (Percona Toolkit) which rebuilds the index on a shadow table and swaps atomically.

-- What we needed to do:
-- 1. Drop the bloated index
-- 2. Create a correct index (device_id, created_at) only
-- 3. If event_payload needs to be searchable, add a prefix index separately

-- The correct replacement indexes:
CREATE INDEX idx_device_date
  ON device_events(device_id, created_at);
--  ↑ equality first, range second — correct order

-- If you need to search event_payload (optional, separate):
CREATE INDEX idx_payload_prefix
  ON device_events(event_payload(100));
--  ↑ 100-byte prefix only — not the full TEXT
# pt-online-schema-change: rebuild index without table lock
# Works by creating a shadow table, copying data, swapping

pt-online-schema-change \
  --host=prod-db-01 \
  --user=dba \
  --ask-pass \
  --alter="
    DROP INDEX idx_device_payload,
    ADD INDEX idx_device_date (device_id, created_at)
  " \
  --execute \
  --progress=time,30 \
  --max-load="Threads_running=50" \
  --critical-load="Threads_running=100" \
  D=production_db,t=device_events

# --max-load: pause if DB is too busy (safe for prod)
# --critical-load: abort if overloaded
# Takes ~40 minutes for 45M rows. Zero downtime.

Results

Before vs After Before After Index size 36.4 GB 0.9 GB Query time 8,200ms 18ms 455x faster Rows scanned 4.2M 84 DB IO wait 95% 12% P99 latency 11,400ms 34ms 35.5 GB storage saved. Zero downtime.

How to Prevent This

-- Run this monthly to catch index bloat early
SELECT
  table_name,
  index_name,
  ROUND(stat_value * @@innodb_page_size / 1024 / 1024, 0) AS size_mb
FROM mysql.innodb_index_stats
WHERE stat_name = 'size'
  AND database_name = 'production_db'
ORDER BY size_mb DESC
LIMIT 20;

-- Check for TEXT/BLOB columns in indexes (should always have prefix)
SELECT
  s.table_name,
  s.index_name,
  s.column_name,
  s.sub_part AS prefix_length,  -- NULL = full column indexed = danger
  c.data_type
FROM information_schema.statistics s
JOIN information_schema.columns c
  ON s.table_schema = c.table_schema
 AND s.table_name = c.table_name
 AND s.column_name = c.column_name
WHERE s.table_schema = 'production_db'
  AND c.data_type IN ('text', 'blob', 'mediumtext', 'longtext')
  AND s.sub_part IS NULL;  -- full TEXT in index = red flag
RuleWhy
Never index full TEXT/BLOB columnsUse prefix length: col(100)
Equality columns before range columnsRange columns must be last for seeks to work
Include only columns used in WHERE/ORDEREvery extra column multiplies index size
Monitor index/data size ratio monthlyCatch bloat before it causes incidents
Use pt-osc for large table changesStandard ALTER TABLE locks = hours of downtime

The lesson

Index size is a first-class metric, not an afterthought. An index bigger than your data is a symptom of a design mistake — usually a TEXT/BLOB column with no prefix, or composite columns in the wrong order. Both are invisible in staging where tables are small, and both cause production incidents the moment data volume crosses a threshold.