MySQL Query Optimization: Real Lessons from a 36GB Index Disaster

The Incident: Index Bigger Than the Data

The alert came in at 2am: dashboard queries timing out across the platform. P99 had climbed from 40ms to 11 seconds. The DB server's IO wait was pinned at 95%.

The first thing I checked was table sizes:

-- Check table and index sizes in MySQL
SELECT
  table_name,
  ROUND(data_length / 1024 / 1024 / 1024, 2) AS data_gb,
  ROUND(index_length / 1024 / 1024 / 1024, 2) AS index_gb,
  ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS total_gb
FROM information_schema.tables
WHERE table_schema = 'production_db'
ORDER BY total_gb DESC
LIMIT 10;

-- Output:
-- table_name        data_gb  index_gb  total_gb
-- device_events       8.20    36.40     44.60   ← 36GB index on 8GB data!
-- user_sessions       2.10     1.80      3.90
-- audit_logs          1.50     0.90      2.40

Diagnosing: What Was in That Index

-- Show all indexes on the table
SHOW INDEX FROM device_events;

-- More detailed view
SELECT
  INDEX_NAME,
  SEQ_IN_INDEX,
  COLUMN_NAME,
  SUB_PART,       -- prefix length (NULL = full column)
  CARDINALITY,
  INDEX_TYPE
FROM information_schema.statistics
WHERE table_schema = 'production_db'
  AND table_name = 'device_events'
ORDER BY INDEX_NAME, SEQ_IN_INDEX;

-- Output:
-- INDEX_NAME              SEQ  COLUMN_NAME    SUB_PART  CARDINALITY
-- PRIMARY                  1   event_id       NULL      45,000,000
-- idx_device_payload       1   device_id      NULL      12,000
-- idx_device_payload       2   event_payload  NULL      44,987,432  ← PROBLEM
-- idx_device_payload       3   created_at     NULL       8,760
-- idx_created_at           1   created_at     NULL       8,760

Found it. The composite index idx_device_payload included event_payload — a TEXT column containing JSON blobs of up to 64KB per row, with no prefix length limit. MySQL was trying to index the entire text content of every event payload.

Three Problems in One Index

Beyond the TEXT column, the composite index had two more issues:

-- Actual queries being run against this table:
SELECT * FROM device_events
WHERE device_id = 42
  AND created_at > '2024-01-01'
LIMIT 100;

-- The index order was: (device_id, event_payload, created_at)
-- But queries filter on: device_id (equality) + created_at (range)
-- The event_payload in the middle means:
-- → Index can seek on device_id ✅
-- → But then CANNOT range-scan on created_at ❌
--   because event_payload breaks the ordering

-- EXPLAIN showed:
-- type: ref, key: idx_device_payload, rows: 4,200,000
-- MySQL used the index for device_id but then scanned
-- 4.2M rows to find the created_at range.

The Fix: Rebuild Without Downtime

With 45M rows and production traffic, a standard ALTER TABLE would lock the table for hours. We used pt-online-schema-change (Percona Toolkit) which rebuilds the index on a shadow table and swaps atomically.

-- What we needed to do:
-- 1. Drop the bloated index
-- 2. Create a correct index (device_id, created_at) only
-- 3. If event_payload needs to be searchable, add a prefix index separately

-- The correct replacement indexes:
CREATE INDEX idx_device_date
  ON device_events(device_id, created_at);
--  ↑ equality first, range second — correct order

-- If you need to search event_payload (optional, separate):
CREATE INDEX idx_payload_prefix
  ON device_events(event_payload(100));
--  ↑ 100-byte prefix only — not the full TEXT

# pt-online-schema-change: rebuild index without table lock
# Works by creating a shadow table, copying data, swapping

pt-online-schema-change \
  --host=prod-db-01 \
  --user=dba \
  --ask-pass \
  --alter="
    DROP INDEX idx_device_payload,
    ADD INDEX idx_device_date (device_id, created_at)
  " \
  --execute \
  --progress=time,30 \
  --max-load="Threads_running=50" \
  --critical-load="Threads_running=100" \
  D=production_db,t=device_events

# --max-load: pause if DB is too busy (safe for prod)
# --critical-load: abort if overloaded
# Takes ~40 minutes for 45M rows. Zero downtime.

Results

How to Prevent This

-- Run this monthly to catch index bloat early
SELECT
  table_name,
  index_name,
  ROUND(stat_value * @@innodb_page_size / 1024 / 1024, 0) AS size_mb
FROM mysql.innodb_index_stats
WHERE stat_name = 'size'
  AND database_name = 'production_db'
ORDER BY size_mb DESC
LIMIT 20;

-- Check for TEXT/BLOB columns in indexes (should always have prefix)
SELECT
  s.table_name,
  s.index_name,
  s.column_name,
  s.sub_part AS prefix_length,  -- NULL = full column indexed = danger
  c.data_type
FROM information_schema.statistics s
JOIN information_schema.columns c
  ON s.table_schema = c.table_schema
 AND s.table_name = c.table_name
 AND s.column_name = c.column_name
WHERE s.table_schema = 'production_db'
  AND c.data_type IN ('text', 'blob', 'mediumtext', 'longtext')
  AND s.sub_part IS NULL;  -- full TEXT in index = red flag

Rule	Why
Never index full TEXT/BLOB columns	Use prefix length: `col(100)`
Equality columns before range columns	Range columns must be last for seeks to work
Include only columns used in WHERE/ORDER	Every extra column multiplies index size
Monitor index/data size ratio monthly	Catch bloat before it causes incidents
Use `pt-osc` for large table changes	Standard ALTER TABLE locks = hours of downtime

The lesson

Index size is a first-class metric, not an afterthought. An index bigger than your data is a symptom of a design mistake — usually a TEXT/BLOB column with no prefix, or composite columns in the wrong order. Both are invisible in staging where tables are small, and both cause production incidents the moment data volume crosses a threshold.