The Problem: A Gaming Dashboard That Died at Scale
Imagine you're building the backend for a competitive gaming platform — ranked matches, leaderboards, player profiles, match history. Everything works perfectly in staging. You ship to production. Three weeks later, with 8M active players, the dashboard starts timing out.
The DB server CPU hits 90%. Connection pool exhausts. P99 latency jumps from 40ms to 4,200ms. The queries haven't changed. The indexes exist. So what happened?
The root cause? Every query that touched match_date, player_id, or score was written in a way that prevented the index from being used. The team had indexes — they just couldn't reach them.
Mental Model: How a B-Tree Index Actually Works
Before we talk about what breaks indexes, you need to understand what an index is. Most SQL indexes (MySQL InnoDB, PostgreSQL, SQL Server) are B-trees: balanced tree structures where values are stored in sorted order.
The critical insight: the B-tree is sorted by the raw column value. If you apply a function to the column before comparing, the engine no longer knows where to seek in that sorted tree. So it reads every leaf — every row.
What Does "SARGable" Actually Mean?
SARGable = Search ARGument-able. A predicate is SARGable if the database engine can use it to seek or narrow a range in an index structure.
All 8 Patterns: Quick Reference
Below is every common pattern. Use this as a lookup table during code review.
| # | Pattern | Broken example | Fix direction | Production pain |
|---|---|---|---|---|
| 1 | Function on column | YEAR(created_at) = 2024 | Range on raw column | Timeouts, high CPU, slow dashboards |
| 2 | Type mismatch | player_id = '42' (string vs INT) | Cast at API boundary | Random slow lookups, p99 spikes |
| 3 | Math on column | score * 1.1 > 100 | Move math to parameter side | Pricing/scoring queries degrade silently |
| 4 | Leading wildcard | LIKE '%@gmail.com' | Prefix LIKE or full-text search | Search timeouts, DB CPU spikes |
| 5 | OR mixing indexes | status='x' OR updated_at > ? | UNION ALL of selective queries | Unpredictable plans, index not used |
| 6 | NOT IN with subquery | id NOT IN (SELECT …) | NOT EXISTS or LEFT JOIN IS NULL | Nested loops, wrong results with NULLs |
| 7 | NULL comparison | col = NULL | IS NULL | Silent wrong results, then bad "fixes" |
| 8 | Expression without index | col_a * col_b > 100 | Generated column + index | Analytics queries scale linearly with rows |
Now let's go deep on each one.
Pattern 1 — Function on the Column
This is the most common mistake. Someone needs "all matches from 2024" and writes:
-- ❌ NON-SARGABLE: Function wraps the indexed column
SELECT * FROM matches
WHERE YEAR(match_date) = 2024
AND player_id = 42;
Even though match_date has an index, the engine cannot use it efficiently. Here's what EXPLAIN looks like:
-- MySQL EXPLAIN output (broken query)
+----+-------------+---------+------------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | matches | NULL | ALL | idx_match_date| NULL | NULL | NULL | 9876543 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+----------+-------------+
-- type=ALL means FULL TABLE SCAN. 9.8M rows read for one player's yearly stats.
The fix is to move the calendar logic to the parameter side so the column stays untouched:
-- ✅ SARGABLE: Column is bare, date range is on the parameter side
SELECT * FROM matches
WHERE match_date >= '2024-01-01'
AND match_date < '2025-01-01'
AND player_id = 42;
-- MySQL EXPLAIN output (fixed query)
+----+-------------+---------+------------+-------+-----------------------------+------------------+---------+-------+------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------------+-------+-----------------------------+------------------+---------+-------+------+-------------+
| 1 | SIMPLE | matches | NULL | range | idx_player_match_date | idx_player_match | 12 | NULL | 84 | Using where |
+----+-------------+---------+------------+-------+-----------------------------+------------------+---------+-------+------+-------------+
-- type=range means index range scan. 84 rows read instead of 9.8M.
Same 116,000x fewer row reads. No schema change — just a rewrite.
| Broken | Fixed |
|---|---|
YEAR(created_at) = 2024 | created_at >= '2024-01-01' AND created_at < '2025-01-01' |
DATE(ts) = '2024-07-04' | ts >= '2024-07-04' AND ts < '2024-07-05' |
MONTH(order_date) = 7 | Range on order_date for the target month |
LOWER(email) = ? | Store normalized email + index it, or use a case-insensitive collation |
ROUND(amount, 2) = ? | Compute a range: amount >= ? - epsilon AND amount <= ? + epsilon |
Pattern 2 — Type Mismatch (Very Common in Node.js APIs)
Express routes give you strings. Your DB column is INT. If you forget to parse, MySQL silently casts every row and loses the index.
// ❌ BROKEN: req.params.id is a string "42", column is INT
const rows = await db.query(
'SELECT * FROM players WHERE player_id = ?',
[req.params.id] // passing string "42", not number 42
);
-- What MySQL actually executes:
-- WHERE player_id = '42'
-- The engine casts every INT row to string for comparison
-- → Full table scan on 10M rows
The fix is a single line at the API boundary:
// ✅ FIXED: Parse and validate before touching the DB
const playerId = Number.parseInt(req.params.id, 10);
if (!Number.isFinite(playerId) || playerId <= 0) {
return res.status(400).json({ error: 'Invalid player ID' });
}
const rows = await db.query(
'SELECT * FROM players WHERE player_id = ?',
[playerId] // number 42, exact type match → index seek
);
This bug is especially painful because it works fine in tests (small datasets hide the cost), and there's no error in your application logs — just silent degradation at scale.
Pattern 3 — Arithmetic on the Column
A leaderboard query that applies a score multiplier in the SQL:
-- ❌ BROKEN: Math is applied to the column
SELECT player_id, score
FROM leaderboard
WHERE score * 1.25 > 1000 -- seasonal bonus multiplier
ORDER BY score DESC
LIMIT 100;
The fix is algebra: move the math to the other side of the operator.
-- ✅ FIXED: Divide the threshold instead of multiplying the column
-- score * 1.25 > 1000 ←→ score > 1000 / 1.25 ←→ score > 800
SELECT player_id, score
FROM leaderboard
WHERE score > 800 -- column stays bare, index seek works
ORDER BY score DESC
LIMIT 100;
Same logic. Same results. The column is untouched, so the B-tree on score can do a range seek instead of scanning every row.
Pattern 4 — Leading Wildcard in LIKE
Searching for players by username suffix or finding emails by domain:
-- ❌ Leading wildcard: no fixed prefix, btree useless
SELECT * FROM players WHERE username LIKE '%shadow%';
SELECT * FROM players WHERE email LIKE '%@company.com';
Prefix LIKE ('shadow%') usually works well with a B-tree index. Suffix or contains searches need a different approach:
| Use case | Solution |
|---|---|
Prefix: LIKE 'abc%' | Standard B-tree index — works fine |
| Full-text search in MySQL | MATCH(col) AGAINST('term' IN BOOLEAN MODE) + FULLTEXT index |
| Trigram search in PostgreSQL | pg_trgm extension + GIN index — supports LIKE '%term%' |
| Production search at scale | Elasticsearch, Typesense, or Meilisearch — purpose-built for this |
Pattern 5 — OR Mixing Different Index Paths
Filtering a match feed by status or recency:
-- ❌ OR forces the planner to choose one index (or none)
SELECT * FROM matches
WHERE status = 'LIVE'
OR updated_at > NOW() - INTERVAL 5 MINUTE;
The optimizer must decide: use the index on status, use the index on updated_at, or scan the whole table. When both predicates are moderately selective, it often picks a full scan because using two separate indexes and merging results is expensive.
-- ✅ UNION ALL: each branch uses its own index efficiently
SELECT * FROM matches WHERE status = 'LIVE'
UNION ALL
SELECT * FROM matches WHERE updated_at > NOW() - INTERVAL 5 MINUTE
AND status != 'LIVE'; -- avoid duplicates explicitly
Always EXPLAIN both versions on production-sized data. The right choice depends on selectivity.
Pattern 6 — NOT IN with a Subquery
Finding players who haven't been banned:
-- ❌ NOT IN with subquery — two problems
SELECT player_id, username
FROM players
WHERE player_id NOT IN (
SELECT player_id FROM banned_players
);
Problem 1 — Performance: The DB evaluates the subquery for every outer row, often resulting in a nested loop. With 10M players and 50k bans, this is catastrophic.
Problem 2 — Correctness: If any row in banned_players has player_id = NULL, the entire NOT IN returns zero rows. This is SQL NULL semantics — x NOT IN (..., NULL, ...) is always UNKNOWN, never TRUE.
-- ✅ NOT EXISTS — correct semantics + better plans on most engines
SELECT p.player_id, p.username
FROM players p
WHERE NOT EXISTS (
SELECT 1
FROM banned_players b
WHERE b.player_id = p.player_id
);
-- ✅ LEFT JOIN / IS NULL — often the fastest option
SELECT p.player_id, p.username
FROM players p
LEFT JOIN banned_players b ON b.player_id = p.player_id
WHERE b.player_id IS NULL;
Pattern 7 — NULL Comparison
This is a SQL semantics issue as much as a performance one:
-- ❌ WRONG: = NULL always returns NULL (never TRUE or FALSE)
SELECT * FROM matches WHERE forfeit_reason = NULL; -- returns 0 rows always
SELECT * FROM matches WHERE forfeit_reason != NULL; -- also returns 0 rows always
-- ✅ CORRECT: IS NULL / IS NOT NULL
SELECT * FROM matches WHERE forfeit_reason IS NULL;
SELECT * FROM matches WHERE forfeit_reason IS NOT NULL;
For index usage: most engines do index IS NULL checks (the null bitmap is part of the index page). However, high-cardinality columns with many NULLs can still produce poor estimates — check EXPLAIN.
Pattern 8 — Expression Without an Index
A ranked-mode query that filters on a composite score formula:
-- ❌ Expression on multiple columns — no standard index covers this
SELECT * FROM players
WHERE (kills * 3 + assists * 1.5 - deaths * 2) > 500;
The fix depends on your engine. The cleanest approach is a generated (computed) column with its own index:
-- PostgreSQL: generated column + index
ALTER TABLE players
ADD COLUMN performance_score NUMERIC
GENERATED ALWAYS AS (kills * 3 + assists * 1.5 - deaths * 2) STORED;
CREATE INDEX idx_perf_score ON players(performance_score);
-- Now this query is a simple index range scan:
SELECT * FROM players WHERE performance_score > 500;
-- MySQL 5.7+: virtual generated column
ALTER TABLE players
ADD COLUMN performance_score DECIMAL(10,2)
GENERATED ALWAYS AS (kills * 3.0 + assists * 1.5 - deaths * 2.0) VIRTUAL,
ADD INDEX idx_perf_score (performance_score);
Reading EXPLAIN: A Practical Field Guide
You can't fix what you can't see. EXPLAIN (or EXPLAIN ANALYZE in Postgres) tells you exactly what the engine did. Here's how to read it for non-SARGable problems:
-- PostgreSQL EXPLAIN ANALYZE output — non-SARGable query
EXPLAIN ANALYZE
SELECT * FROM matches
WHERE EXTRACT(YEAR FROM match_date) = 2024
AND player_id = 42;
-- Output:
Seq Scan on matches (cost=0.00..312847.00 rows=9823 width=156)
(actual time=0.043..4218.291 rows=891 loops=1)
Filter: ((date_part('year'::text, match_date) = 2024) AND (player_id = 42))
Rows Removed by Filter: 9872109
Planning Time: 0.8 ms
Execution Time: 4219.3 ms
-- PostgreSQL EXPLAIN ANALYZE output — SARGable rewrite
EXPLAIN ANALYZE
SELECT * FROM matches
WHERE match_date >= '2024-01-01' AND match_date < '2025-01-01'
AND player_id = 42;
-- Output:
Index Scan using idx_player_match_date on matches
(cost=0.56..124.38 rows=89 width=156)
(actual time=0.091..1.847 rows=891 loops=1)
Index Cond: ((player_id = 42)
AND (match_date >= '2024-01-01')
AND (match_date < '2025-01-01'))
Planning Time: 0.4 ms
Execution Time: 2.1 ms
| What you see in EXPLAIN | What it means | What to check |
|---|---|---|
Seq Scan (PG) / type=ALL (MySQL) | Full table scan — every row read | Is the column bare in WHERE? Types matched? |
Index Scan (PG) / type=ref or range | Index is being used ✅ | Check rows estimate — is it accurate? |
Rows Removed by Filter: 9,872,109 | Index couldn't filter — post-scan filter | Predicate on column is non-SARGable |
key: NULL (MySQL EXPLAIN) | No index was chosen | Wrap in function? Wrong type? OR clause? |
| Actual rows >> estimated rows | Stale statistics | Run ANALYZE (PG) or ANALYZE TABLE (MySQL) |
Nested Loop on large table | Subquery running per outer row | Replace NOT IN with NOT EXISTS or LEFT JOIN |
What Production Actually Looks Like
Non-SARGable queries have a characteristic production fingerprint. If you see all of these together, check your WHERE clauses first:
Index Strategy for the Gaming Dashboard
If you're building something like this schema, here's how to think about composite index design:
-- Gaming dashboard schema (simplified)
CREATE TABLE matches (
match_id BIGINT PRIMARY KEY,
player_id INT NOT NULL,
match_date DATETIME NOT NULL,
status VARCHAR(20) NOT NULL,
score INT NOT NULL,
result ENUM('WIN', 'LOSS', 'DRAW') NOT NULL
);
-- Index for "player's recent matches" (most common query)
-- Composite: player_id first (equality), match_date second (range)
CREATE INDEX idx_player_date ON matches(player_id, match_date);
-- Index for leaderboard by score (range + sort)
CREATE INDEX idx_score_desc ON matches(score DESC);
-- Index for live match polling
CREATE INDEX idx_status_updated ON matches(status, match_date DESC)
WHERE status = 'LIVE'; -- partial index (PostgreSQL)
The rule for composite index column order: equality columns first, range columns last. If your query is WHERE player_id = 42 AND match_date > X, put player_id first — the equality check narrows the search space, then the range scan is within a small subset.
Before Merge: Checklist
- Column is bare in WHERE: No function, no math, no cast wrapping the indexed column. Date ranges as
>=/<on the raw column. - Types match at the API boundary: Parse strings to numbers in Node.js/app code before binding to queries. INT columns get numbers, VARCHAR gets strings.
- EXPLAIN on realistic row counts: Run against a staging DB with production-scale data, or against prod in a read replica. 5k rows hide everything.
- Slow query log is on: In dev/staging with threshold 100ms. You want to catch new regressions before they reach production scale.
- No NOT IN with nullable subquery: Replace with NOT EXISTS or LEFT JOIN IS NULL.
- OR clauses reviewed: If OR spans two different indexes, try UNION ALL and EXPLAIN both.
Something I Got Wrong in Production
On a busy player stats endpoint, I had this in the Node.js service:
// The route handler
app.get('/stats/:playerId', async (req, res) => {
const stats = await db.query(`
SELECT
player_id,
ROUND(AVG(score), 2) as avg_score,
COUNT(*) as total_matches
FROM matches
WHERE player_id = ?
AND YEAR(match_date) = ?
GROUP BY player_id
`, [req.params.playerId, req.query.year]); // ← two bugs here
// ...
});
Two non-SARGable problems in one query: req.params.playerId is a string (type mismatch on INT column), and YEAR(match_date) wraps the indexed date column.
The indexes existed. MySQL just couldn't use them. A single endpoint was responsible for 40% of our DB CPU. The fix took 15 minutes:
app.get('/stats/:playerId', async (req, res) => {
// Parse at the boundary
const playerId = Number.parseInt(req.params.playerId, 10);
const year = Number.parseInt(req.query.year, 10);
if (!Number.isFinite(playerId) || !Number.isFinite(year)) {
return res.status(400).json({ error: 'Invalid parameters' });
}
const yearStart = `${year}-01-01`;
const yearEnd = `${year + 1}-01-01`;
const stats = await db.query(`
SELECT
player_id,
ROUND(AVG(score), 2) as avg_score,
COUNT(*) as total_matches
FROM matches
WHERE player_id = ? -- number, matches INT column
AND match_date >= ? -- column bare, range on parameter
AND match_date < ?
GROUP BY player_id
`, [playerId, yearStart, yearEnd]);
// ...
});
DB CPU dropped from 88% to 12%. P99 went from 3,400ms to 18ms. No schema change, no migration, no new index — just writing the WHERE clause in a way the existing indexes could actually use.
The lesson
Having an index is not the same as the query being able to seek. The query shape determines whether the engine gets to use your B-tree or has to read every row. Every production table eventually grows large enough that non-SARGable predicates become incidents — and the fix is usually a 5-minute rewrite, not a 5-hour migration.