Why Your Indexes Aren't Working Non-SARGable Queries Explained Index Seek root L R FOUND 84 rows Full Table Scan row 1 ... row 2 ... row 3 ... ... 9.8M rows 9.8M rows vs 4,200ms → 2ms Latency reduction 9.8M → 84 rows Rows scanned 0 schema changes Just a WHERE rewrite

The Problem: A Gaming Dashboard That Died at Scale

Imagine you're building the backend for a competitive gaming platform — ranked matches, leaderboards, player profiles, match history. Everything works perfectly in staging. You ship to production. Three weeks later, with 8M active players, the dashboard starts timing out.

The DB server CPU hits 90%. Connection pool exhausts. P99 latency jumps from 40ms to 4,200ms. The queries haven't changed. The indexes exist. So what happened?

Production Incident Timeline Week 1 ~3ms 200k rows Week 3 ~180ms 1.2M rows Week 6 ~4,200ms 8M rows Week 7 DEAD 10M rows The index existed the whole time. The bug was in the WHERE clause — not the schema.

The root cause? Every query that touched match_date, player_id, or score was written in a way that prevented the index from being used. The team had indexes — they just couldn't reach them.

Mental Model: How a B-Tree Index Actually Works

Before we talk about what breaks indexes, you need to understand what an index is. Most SQL indexes (MySQL InnoDB, PostgreSQL, SQL Server) are B-trees: balanced tree structures where values are stored in sorted order.

B-Tree Index on column: created_at 2024-06-15 2024-03-01 2024-09-01 2024-01 2024-02 2024-07 2024-12 row ptr WHERE created_at = '2024-07-04' 1. Start at root → go RIGHT (> 2024-06-15) 2. Go LEFT (< 2024-09-01) → find leaf [2024-07] → row ptr → done ~3 comparisons regardless of table size — O(log n)

The critical insight: the B-tree is sorted by the raw column value. If you apply a function to the column before comparing, the engine no longer knows where to seek in that sorted tree. So it reads every leaf — every row.

Full Table Scan (no index seek possible) WHERE YEAR(created_at) = 2024 Row 1: YEAR(2024-07-04) = 2024 MATCH Row 2: YEAR(2023-12-01) = 2023 SKIP Row 3: YEAR(2024-01-15) = 2024 MATCH ... repeats for ALL 10,000,000 rows ... 10,000,000 reads Every single row O(n) Scales with your data SARGable vs Non-SARGable SARGable Column is bare in WHERE Engine can seek the B-tree O(log n) performance root L R SEEK WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01' -- Index range scan: 84 rows Non-SARGable Function wraps the column Engine must scan every row O(n) performance scan row 1... scan row 2... scan row 3... scan row 4... ... every row ... scan row 9.8M WHERE YEAR(created_at) = 2024 -- Full table scan: 9.8M rows

What Does "SARGable" Actually Mean?

SARGable = Search ARGument-able. A predicate is SARGable if the database engine can use it to seek or narrow a range in an index structure.

SARGable vs Non-SARGable — The Rule SARGable (column is BARE) WHERE created_at >= '2024-01-01' WHERE player_id = 42 WHERE email = 'test@example.com' WHERE score BETWEEN 1000 AND 2000 exact match matching collation range scan Non-SARGable (column is wrapped) WHERE YEAR(created_at) = 2024 WHERE player_id = '42' WHERE score * 1.1 > 100 WHERE LOWER(email) = '...' WHERE email LIKE '%@gmail.com' function type mismatch math on col function leading wildcard The Rule column [operator] expression → SARGable f(column) [operator] anything → Non-SARGable

All 8 Patterns: Quick Reference

Below is every common pattern. Use this as a lookup table during code review.

#PatternBroken exampleFix directionProduction pain
1Function on columnYEAR(created_at) = 2024Range on raw columnTimeouts, high CPU, slow dashboards
2Type mismatchplayer_id = '42' (string vs INT)Cast at API boundaryRandom slow lookups, p99 spikes
3Math on columnscore * 1.1 > 100Move math to parameter sidePricing/scoring queries degrade silently
4Leading wildcardLIKE '%@gmail.com'Prefix LIKE or full-text searchSearch timeouts, DB CPU spikes
5OR mixing indexesstatus='x' OR updated_at > ?UNION ALL of selective queriesUnpredictable plans, index not used
6NOT IN with subqueryid NOT IN (SELECT …)NOT EXISTS or LEFT JOIN IS NULLNested loops, wrong results with NULLs
7NULL comparisoncol = NULLIS NULLSilent wrong results, then bad "fixes"
8Expression without indexcol_a * col_b > 100Generated column + indexAnalytics queries scale linearly with rows

Now let's go deep on each one.

Pattern 1 — Function on the Column

This is the most common mistake. Someone needs "all matches from 2024" and writes:

-- ❌ NON-SARGABLE: Function wraps the indexed column
SELECT * FROM matches
WHERE YEAR(match_date) = 2024
  AND player_id = 42;

Even though match_date has an index, the engine cannot use it efficiently. Here's what EXPLAIN looks like:

-- MySQL EXPLAIN output (broken query)
+----+-------------+---------+------------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table   | partitions | type | possible_keys | key  | key_len | ref  | rows     | Extra       |
+----+-------------+---------+------------+------+---------------+------+---------+------+----------+-------------+
|  1 | SIMPLE      | matches | NULL       | ALL  | idx_match_date| NULL | NULL    | NULL | 9876543  | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+----------+-------------+
-- type=ALL means FULL TABLE SCAN. 9.8M rows read for one player's yearly stats.

The fix is to move the calendar logic to the parameter side so the column stays untouched:

-- ✅ SARGABLE: Column is bare, date range is on the parameter side
SELECT * FROM matches
WHERE match_date >= '2024-01-01'
  AND match_date <  '2025-01-01'
  AND player_id = 42;
-- MySQL EXPLAIN output (fixed query)
+----+-------------+---------+------------+-------+-----------------------------+------------------+---------+-------+------+-------------+
| id | select_type | table   | partitions | type  | possible_keys               | key              | key_len | ref   | rows | Extra       |
+----+-------------+---------+------------+-------+-----------------------------+------------------+---------+-------+------+-------------+
|  1 | SIMPLE      | matches | NULL       | range | idx_player_match_date       | idx_player_match | 12      | NULL  | 84   | Using where |
+----+-------------+---------+------------+-------+-----------------------------+------------------+---------+-------+------+-------------+
-- type=range means index range scan. 84 rows read instead of 9.8M.

Same 116,000x fewer row reads. No schema change — just a rewrite.

BrokenFixed
YEAR(created_at) = 2024created_at >= '2024-01-01' AND created_at < '2025-01-01'
DATE(ts) = '2024-07-04'ts >= '2024-07-04' AND ts < '2024-07-05'
MONTH(order_date) = 7Range on order_date for the target month
LOWER(email) = ?Store normalized email + index it, or use a case-insensitive collation
ROUND(amount, 2) = ?Compute a range: amount >= ? - epsilon AND amount <= ? + epsilon

Pattern 2 — Type Mismatch (Very Common in Node.js APIs)

Express routes give you strings. Your DB column is INT. If you forget to parse, MySQL silently casts every row and loses the index.

// ❌ BROKEN: req.params.id is a string "42", column is INT
const rows = await db.query(
  'SELECT * FROM players WHERE player_id = ?',
  [req.params.id]   // passing string "42", not number 42
);
-- What MySQL actually executes:
-- WHERE player_id = '42'
-- The engine casts every INT row to string for comparison
-- → Full table scan on 10M rows

The fix is a single line at the API boundary:

// ✅ FIXED: Parse and validate before touching the DB
const playerId = Number.parseInt(req.params.id, 10);
if (!Number.isFinite(playerId) || playerId <= 0) {
  return res.status(400).json({ error: 'Invalid player ID' });
}

const rows = await db.query(
  'SELECT * FROM players WHERE player_id = ?',
  [playerId]  // number 42, exact type match → index seek
);
Type Mismatch: What the Engine Does Column type: INT — Index on: player_id (INT) WHERE player_id = '42' Row 1: CAST(1 AS CHAR) = '42'? NO Row 2: CAST(2 AS CHAR) = '42'? NO Row 3: CAST(42 AS CHAR) = '42'? YES ... must check ALL rows to be safe 9,999,999 wasted comparisons Full table scan WHERE player_id = 42 B-tree root SEEK → 42 ~4 comparisons. Done. Index seek

This bug is especially painful because it works fine in tests (small datasets hide the cost), and there's no error in your application logs — just silent degradation at scale.

Pattern 3 — Arithmetic on the Column

A leaderboard query that applies a score multiplier in the SQL:

-- ❌ BROKEN: Math is applied to the column
SELECT player_id, score
FROM leaderboard
WHERE score * 1.25 > 1000  -- seasonal bonus multiplier
ORDER BY score DESC
LIMIT 100;

The fix is algebra: move the math to the other side of the operator.

-- ✅ FIXED: Divide the threshold instead of multiplying the column
-- score * 1.25 > 1000  ←→  score > 1000 / 1.25  ←→  score > 800
SELECT player_id, score
FROM leaderboard
WHERE score > 800        -- column stays bare, index seek works
ORDER BY score DESC
LIMIT 100;

Same logic. Same results. The column is untouched, so the B-tree on score can do a range seek instead of scanning every row.

Pattern 4 — Leading Wildcard in LIKE

Searching for players by username suffix or finding emails by domain:

-- ❌ Leading wildcard: no fixed prefix, btree useless
SELECT * FROM players WHERE username LIKE '%shadow%';
SELECT * FROM players WHERE email    LIKE '%@company.com';
Why Leading Wildcard Kills the Index Index sorted order (username) "aardvark_99" "blade_runner" "cobra_strike" "dark_shadow"pos 5 "electric_eel" "fire_shadow_x"pos 5 "ghost_99" "shadow_blade"pos 0 ... LIKE '%shadow%' "shadow" can appear ANYWHERE in the string No way to narrow the B-tree start position Must check every single row LIKE 'shadow%' "shadow" must be at position 0 (prefix) B-tree seeks to 's' → 'sh' → 'sha' ... Only reads the matching range Prefix LIKE uses the index. Contains/suffix LIKE requires full-text or trigram indexes.

Prefix LIKE ('shadow%') usually works well with a B-tree index. Suffix or contains searches need a different approach:

Use caseSolution
Prefix: LIKE 'abc%'Standard B-tree index — works fine
Full-text search in MySQLMATCH(col) AGAINST('term' IN BOOLEAN MODE) + FULLTEXT index
Trigram search in PostgreSQLpg_trgm extension + GIN index — supports LIKE '%term%'
Production search at scaleElasticsearch, Typesense, or Meilisearch — purpose-built for this

Pattern 5 — OR Mixing Different Index Paths

Filtering a match feed by status or recency:

-- ❌ OR forces the planner to choose one index (or none)
SELECT * FROM matches
WHERE status = 'LIVE'
   OR updated_at > NOW() - INTERVAL 5 MINUTE;

The optimizer must decide: use the index on status, use the index on updated_at, or scan the whole table. When both predicates are moderately selective, it often picks a full scan because using two separate indexes and merging results is expensive.

-- ✅ UNION ALL: each branch uses its own index efficiently
SELECT * FROM matches WHERE status = 'LIVE'
UNION ALL
SELECT * FROM matches WHERE updated_at > NOW() - INTERVAL 5 MINUTE
  AND status != 'LIVE';   -- avoid duplicates explicitly
OR vs UNION ALL — Execution Plan Comparison OR Query Plan Full Table Scan 9.8M rows Filter: status='LIVE' OR updated_at > ? ~48,000ms UNION ALL Plan Index seek: status 2,100 rows Index range: date 340 rows Merge results ~4ms

Always EXPLAIN both versions on production-sized data. The right choice depends on selectivity.

Pattern 6 — NOT IN with a Subquery

Finding players who haven't been banned:

-- ❌ NOT IN with subquery — two problems
SELECT player_id, username
FROM players
WHERE player_id NOT IN (
  SELECT player_id FROM banned_players
);

Problem 1 — Performance: The DB evaluates the subquery for every outer row, often resulting in a nested loop. With 10M players and 50k bans, this is catastrophic.

Problem 2 — Correctness: If any row in banned_players has player_id = NULL, the entire NOT IN returns zero rows. This is SQL NULL semantics — x NOT IN (..., NULL, ...) is always UNKNOWN, never TRUE.

-- ✅ NOT EXISTS — correct semantics + better plans on most engines
SELECT p.player_id, p.username
FROM players p
WHERE NOT EXISTS (
  SELECT 1
  FROM banned_players b
  WHERE b.player_id = p.player_id
);

-- ✅ LEFT JOIN / IS NULL — often the fastest option
SELECT p.player_id, p.username
FROM players p
LEFT JOIN banned_players b ON b.player_id = p.player_id
WHERE b.player_id IS NULL;
NOT IN vs NOT EXISTS — NULL Trap Illustrated banned_players player_id 101 202 NULL NOT IN (101, 202, NULL) Is 999 NOT IN this set? 999 != 101 →TRUE 999 != 202 →TRUE 999 != NULL →UNKNOWN Overall: UNKNOWN (treated as FALSE) 999 excluded — WRONG result! NOT EXISTS Any row where id = 999? No Include 999 NULL poisons NOT IN expressions If any value in the subquery is NULL, NOT IN returns zero rows. Use NOT EXISTS or LEFT JOIN ... IS NULL instead.

Pattern 7 — NULL Comparison

This is a SQL semantics issue as much as a performance one:

-- ❌ WRONG: = NULL always returns NULL (never TRUE or FALSE)
SELECT * FROM matches WHERE forfeit_reason = NULL;   -- returns 0 rows always
SELECT * FROM matches WHERE forfeit_reason != NULL;  -- also returns 0 rows always

-- ✅ CORRECT: IS NULL / IS NOT NULL
SELECT * FROM matches WHERE forfeit_reason IS NULL;
SELECT * FROM matches WHERE forfeit_reason IS NOT NULL;

For index usage: most engines do index IS NULL checks (the null bitmap is part of the index page). However, high-cardinality columns with many NULLs can still produce poor estimates — check EXPLAIN.

Pattern 8 — Expression Without an Index

A ranked-mode query that filters on a composite score formula:

-- ❌ Expression on multiple columns — no standard index covers this
SELECT * FROM players
WHERE (kills * 3 + assists * 1.5 - deaths * 2) > 500;

The fix depends on your engine. The cleanest approach is a generated (computed) column with its own index:

-- PostgreSQL: generated column + index
ALTER TABLE players
ADD COLUMN performance_score NUMERIC
GENERATED ALWAYS AS (kills * 3 + assists * 1.5 - deaths * 2) STORED;

CREATE INDEX idx_perf_score ON players(performance_score);

-- Now this query is a simple index range scan:
SELECT * FROM players WHERE performance_score > 500;
-- MySQL 5.7+: virtual generated column
ALTER TABLE players
ADD COLUMN performance_score DECIMAL(10,2)
  GENERATED ALWAYS AS (kills * 3.0 + assists * 1.5 - deaths * 2.0) VIRTUAL,
ADD INDEX idx_perf_score (performance_score);

Reading EXPLAIN: A Practical Field Guide

You can't fix what you can't see. EXPLAIN (or EXPLAIN ANALYZE in Postgres) tells you exactly what the engine did. Here's how to read it for non-SARGable problems:

-- PostgreSQL EXPLAIN ANALYZE output — non-SARGable query
EXPLAIN ANALYZE
SELECT * FROM matches
WHERE EXTRACT(YEAR FROM match_date) = 2024
  AND player_id = 42;

-- Output:
Seq Scan on matches  (cost=0.00..312847.00 rows=9823 width=156)
                     (actual time=0.043..4218.291 rows=891 loops=1)
  Filter: ((date_part('year'::text, match_date) = 2024) AND (player_id = 42))
  Rows Removed by Filter: 9872109
Planning Time: 0.8 ms
Execution Time: 4219.3 ms
-- PostgreSQL EXPLAIN ANALYZE output — SARGable rewrite
EXPLAIN ANALYZE
SELECT * FROM matches
WHERE match_date >= '2024-01-01' AND match_date < '2025-01-01'
  AND player_id = 42;

-- Output:
Index Scan using idx_player_match_date on matches
  (cost=0.56..124.38 rows=89 width=156)
  (actual time=0.091..1.847 rows=891 loops=1)
  Index Cond: ((player_id = 42)
               AND (match_date >= '2024-01-01')
               AND (match_date < '2025-01-01'))
Planning Time: 0.4 ms
Execution Time: 2.1 ms
What you see in EXPLAINWhat it meansWhat to check
Seq Scan (PG) / type=ALL (MySQL)Full table scan — every row readIs the column bare in WHERE? Types matched?
Index Scan (PG) / type=ref or rangeIndex is being used ✅Check rows estimate — is it accurate?
Rows Removed by Filter: 9,872,109Index couldn't filter — post-scan filterPredicate on column is non-SARGable
key: NULL (MySQL EXPLAIN)No index was chosenWrap in function? Wrong type? OR clause?
Actual rows >> estimated rowsStale statisticsRun ANALYZE (PG) or ANALYZE TABLE (MySQL)
Nested Loop on large tableSubquery running per outer rowReplace NOT IN with NOT EXISTS or LEFT JOIN

What Production Actually Looks Like

Non-SARGable queries have a characteristic production fingerprint. If you see all of these together, check your WHERE clauses first:

Production Symptom Fingerprint Latency: p50 vs p99 4k 3k 2k 1k 0 p50 p99 p50 stays fine, p99 blows up = occasional full scans DB CPU Pattern Business hours 88% Night 30% Fewer concurrent full scans at night App-Side Symptoms "Cannot acquire connection within timeout" Intermittent 502/504 on dashboard endpoints "Query was fine yesterday" — table grew Key Diagnostic Signals p99 diverges from p50 (occasional full scans) CPU high during business hours, drops at night Connection pool exhaustion + intermittent 5xx errors Staging tests pass — small data hides the cost

Index Strategy for the Gaming Dashboard

If you're building something like this schema, here's how to think about composite index design:

-- Gaming dashboard schema (simplified)
CREATE TABLE matches (
  match_id    BIGINT PRIMARY KEY,
  player_id   INT NOT NULL,
  match_date  DATETIME NOT NULL,
  status      VARCHAR(20) NOT NULL,
  score       INT NOT NULL,
  result      ENUM('WIN', 'LOSS', 'DRAW') NOT NULL
);

-- Index for "player's recent matches" (most common query)
-- Composite: player_id first (equality), match_date second (range)
CREATE INDEX idx_player_date ON matches(player_id, match_date);

-- Index for leaderboard by score (range + sort)
CREATE INDEX idx_score_desc ON matches(score DESC);

-- Index for live match polling
CREATE INDEX idx_status_updated ON matches(status, match_date DESC)
  WHERE status = 'LIVE';   -- partial index (PostgreSQL)

The rule for composite index column order: equality columns first, range columns last. If your query is WHERE player_id = 42 AND match_date > X, put player_id first — the equality check narrows the search space, then the range scan is within a small subset.

Composite Index: Column Order Matters Index: (player_id, match_date) Sorted by player_id first, then match_date within each player player_id=1, match_date=2024-01-01 skip player_id=1, match_date=2024-06-15 skip ... player_id=42, match_date=2024-01-10 start player_id=42, match_date=2024-03-22 in range player_id=42, match_date=2024-07-04 in range player_id=42, match_date=2025-01-01 past range player_id=43, match_date=... stop ... 84 rows ~84 rows read, regardless of total table size

Before Merge: Checklist

  1. Column is bare in WHERE: No function, no math, no cast wrapping the indexed column. Date ranges as >= / < on the raw column.
  2. Types match at the API boundary: Parse strings to numbers in Node.js/app code before binding to queries. INT columns get numbers, VARCHAR gets strings.
  3. EXPLAIN on realistic row counts: Run against a staging DB with production-scale data, or against prod in a read replica. 5k rows hide everything.
  4. Slow query log is on: In dev/staging with threshold 100ms. You want to catch new regressions before they reach production scale.
  5. No NOT IN with nullable subquery: Replace with NOT EXISTS or LEFT JOIN IS NULL.
  6. OR clauses reviewed: If OR spans two different indexes, try UNION ALL and EXPLAIN both.
Before & After: Same Indexes, Different WHERE BEFORE DB CPU 88% p99 3,400ms Rows 9.8M scanned YEAR(match_date) = 2024 AFTER DB CPU 12% p99 18ms Rows 84 scanned match_date >= ? AND < ? Same indexes. Same schema. Different WHERE clause.

Something I Got Wrong in Production

On a busy player stats endpoint, I had this in the Node.js service:

// The route handler
app.get('/stats/:playerId', async (req, res) => {
  const stats = await db.query(`
    SELECT
      player_id,
      ROUND(AVG(score), 2) as avg_score,
      COUNT(*) as total_matches
    FROM matches
    WHERE player_id = ?
      AND YEAR(match_date) = ?
    GROUP BY player_id
  `, [req.params.playerId, req.query.year]);  // ← two bugs here
  // ...
});

Two non-SARGable problems in one query: req.params.playerId is a string (type mismatch on INT column), and YEAR(match_date) wraps the indexed date column.

The indexes existed. MySQL just couldn't use them. A single endpoint was responsible for 40% of our DB CPU. The fix took 15 minutes:

app.get('/stats/:playerId', async (req, res) => {
  // Parse at the boundary
  const playerId = Number.parseInt(req.params.playerId, 10);
  const year = Number.parseInt(req.query.year, 10);

  if (!Number.isFinite(playerId) || !Number.isFinite(year)) {
    return res.status(400).json({ error: 'Invalid parameters' });
  }

  const yearStart = `${year}-01-01`;
  const yearEnd   = `${year + 1}-01-01`;

  const stats = await db.query(`
    SELECT
      player_id,
      ROUND(AVG(score), 2) as avg_score,
      COUNT(*) as total_matches
    FROM matches
    WHERE player_id = ?           -- number, matches INT column
      AND match_date >= ?         -- column bare, range on parameter
      AND match_date <  ?
    GROUP BY player_id
  `, [playerId, yearStart, yearEnd]);
  // ...
});

DB CPU dropped from 88% to 12%. P99 went from 3,400ms to 18ms. No schema change, no migration, no new index — just writing the WHERE clause in a way the existing indexes could actually use.

The lesson

Having an index is not the same as the query being able to seek. The query shape determines whether the engine gets to use your B-tree or has to read every row. Every production table eventually grows large enough that non-SARGable predicates become incidents — and the fix is usually a 5-minute rewrite, not a 5-hour migration.