The Incident: Index Bigger Than the Data
The alert came in at 2am: dashboard queries timing out across the platform. P99 had climbed from 40ms to 11 seconds. The DB server's IO wait was pinned at 95%.
The first thing I checked was table sizes:
-- Check table and index sizes in MySQL
SELECT
table_name,
ROUND(data_length / 1024 / 1024 / 1024, 2) AS data_gb,
ROUND(index_length / 1024 / 1024 / 1024, 2) AS index_gb,
ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS total_gb
FROM information_schema.tables
WHERE table_schema = 'production_db'
ORDER BY total_gb DESC
LIMIT 10;
-- Output:
-- table_name data_gb index_gb total_gb
-- device_events 8.20 36.40 44.60 ← 36GB index on 8GB data!
-- user_sessions 2.10 1.80 3.90
-- audit_logs 1.50 0.90 2.40
Diagnosing: What Was in That Index
-- Show all indexes on the table
SHOW INDEX FROM device_events;
-- More detailed view
SELECT
INDEX_NAME,
SEQ_IN_INDEX,
COLUMN_NAME,
SUB_PART, -- prefix length (NULL = full column)
CARDINALITY,
INDEX_TYPE
FROM information_schema.statistics
WHERE table_schema = 'production_db'
AND table_name = 'device_events'
ORDER BY INDEX_NAME, SEQ_IN_INDEX;
-- Output:
-- INDEX_NAME SEQ COLUMN_NAME SUB_PART CARDINALITY
-- PRIMARY 1 event_id NULL 45,000,000
-- idx_device_payload 1 device_id NULL 12,000
-- idx_device_payload 2 event_payload NULL 44,987,432 ← PROBLEM
-- idx_device_payload 3 created_at NULL 8,760
-- idx_created_at 1 created_at NULL 8,760
Found it. The composite index idx_device_payload included event_payload — a TEXT column containing JSON blobs of up to 64KB per row, with no prefix length limit. MySQL was trying to index the entire text content of every event payload.
Three Problems in One Index
Beyond the TEXT column, the composite index had two more issues:
-- Actual queries being run against this table:
SELECT * FROM device_events
WHERE device_id = 42
AND created_at > '2024-01-01'
LIMIT 100;
-- The index order was: (device_id, event_payload, created_at)
-- But queries filter on: device_id (equality) + created_at (range)
-- The event_payload in the middle means:
-- → Index can seek on device_id ✅
-- → But then CANNOT range-scan on created_at ❌
-- because event_payload breaks the ordering
-- EXPLAIN showed:
-- type: ref, key: idx_device_payload, rows: 4,200,000
-- MySQL used the index for device_id but then scanned
-- 4.2M rows to find the created_at range.
The Fix: Rebuild Without Downtime
With 45M rows and production traffic, a standard ALTER TABLE would lock the table for hours. We used pt-online-schema-change (Percona Toolkit) which rebuilds the index on a shadow table and swaps atomically.
-- What we needed to do:
-- 1. Drop the bloated index
-- 2. Create a correct index (device_id, created_at) only
-- 3. If event_payload needs to be searchable, add a prefix index separately
-- The correct replacement indexes:
CREATE INDEX idx_device_date
ON device_events(device_id, created_at);
-- ↑ equality first, range second — correct order
-- If you need to search event_payload (optional, separate):
CREATE INDEX idx_payload_prefix
ON device_events(event_payload(100));
-- ↑ 100-byte prefix only — not the full TEXT
# pt-online-schema-change: rebuild index without table lock
# Works by creating a shadow table, copying data, swapping
pt-online-schema-change \
--host=prod-db-01 \
--user=dba \
--ask-pass \
--alter="
DROP INDEX idx_device_payload,
ADD INDEX idx_device_date (device_id, created_at)
" \
--execute \
--progress=time,30 \
--max-load="Threads_running=50" \
--critical-load="Threads_running=100" \
D=production_db,t=device_events
# --max-load: pause if DB is too busy (safe for prod)
# --critical-load: abort if overloaded
# Takes ~40 minutes for 45M rows. Zero downtime.
Results
How to Prevent This
-- Run this monthly to catch index bloat early
SELECT
table_name,
index_name,
ROUND(stat_value * @@innodb_page_size / 1024 / 1024, 0) AS size_mb
FROM mysql.innodb_index_stats
WHERE stat_name = 'size'
AND database_name = 'production_db'
ORDER BY size_mb DESC
LIMIT 20;
-- Check for TEXT/BLOB columns in indexes (should always have prefix)
SELECT
s.table_name,
s.index_name,
s.column_name,
s.sub_part AS prefix_length, -- NULL = full column indexed = danger
c.data_type
FROM information_schema.statistics s
JOIN information_schema.columns c
ON s.table_schema = c.table_schema
AND s.table_name = c.table_name
AND s.column_name = c.column_name
WHERE s.table_schema = 'production_db'
AND c.data_type IN ('text', 'blob', 'mediumtext', 'longtext')
AND s.sub_part IS NULL; -- full TEXT in index = red flag
| Rule | Why |
|---|---|
| Never index full TEXT/BLOB columns | Use prefix length: col(100) |
| Equality columns before range columns | Range columns must be last for seeks to work |
| Include only columns used in WHERE/ORDER | Every extra column multiplies index size |
| Monitor index/data size ratio monthly | Catch bloat before it causes incidents |
Use pt-osc for large table changes | Standard ALTER TABLE locks = hours of downtime |
The lesson
Index size is a first-class metric, not an afterthought. An index bigger than your data is a symptom of a design mistake — usually a TEXT/BLOB column with no prefix, or composite columns in the wrong order. Both are invisible in staging where tables are small, and both cause production incidents the moment data volume crosses a threshold.