When AUTO_INCREMENT Stops Incrementing — The MySQL INT Overflow Anatomy

Tags: MySQL · Production Incidents · Schema Design · Database · Reading time: ~18 min · Category: Real Problems → Production Incidents

It's 3 AM. Your phone buzzes. The on-call dashboard is red. Inserts to your events table are failing — every single one — and the error in the logs is the most confusing message MySQL can give you:

ERROR 1062 (23000): Duplicate entry '2147483647' for key 'PRIMARY'

You read it twice. You weren't inserting any ID. You weren't trying to write a duplicate. Your app has been calling the same plain INSERT for years. Why is the database suddenly insisting you're a duplicate?

Welcome to one of the most quietly famous failures in any MySQL system. The number 2,147,483,647 is not a coincidence — it is a hard wall built into every table whose primary key uses the default INT type. And when you hit it, MySQL doesn't say "you're out of range." It says "duplicate." That single misleading line is why this outage tends to last a couple of hours instead of fifteen minutes.

Let's walk through the whole thing — what the wall actually is, why the error looks like a lie, how to reproduce it on your laptop in two minutes, and the two mitigation moves that get you back online. One works only if your table is small. The other is the move senior SREs reach for when the table has two billion rows and the clock is ticking.

Where 2,147,483,647 Comes From (The Short Version)

MySQL's INT is 4 bytes. Four bytes is 32 bits. By default it's a signed integer, which means one bit is set aside to remember the sign (positive or negative), and the other 31 bits carry the actual value.

That leaves a maximum value of 2³¹ − 1 = 2,147,483,647. That's the ceiling. There's nothing magical about it — it's pure arithmetic.

  sign │←─────────── 31 value bits ───────────→│
  ┌────┬─────────────────────────────────────────┐
  │ 0  │ 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 │
  └────┴─────────────────────────────────────────┘
       = 2,147,483,647    ← the wall

If you flip that sign bit off (use INT UNSIGNED), you reclaim it for the value and get 4,294,967,295 — roughly double. Move to BIGINT (8 bytes instead of 4) and the ceiling becomes 9.2 × 10¹⁸. That's nine quintillion. Functionally, infinity.

The trade-off is small. Going from INT to BIGINT costs you 4 extra bytes per row. On a table with 2 billion rows, that's 8 GB more on the primary key — real, but cheap. Compared to the cost of an outage at 3 AM, it's a rounding error.

Reproduce It on Your Laptop in Two Minutes

You can make MySQL refuse a write yourself. Start a local MySQL instance and paste this in:

CREATE TABLE users (
  id   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(8) NOT NULL
);

INSERT INTO users (name) VALUES ('a');             -- id = 1
INSERT INTO users (name) VALUES ('b');             -- id = 2

-- Now sneak in a row with a much bigger id
INSERT INTO users (id, name) VALUES (100, 'c');    -- id = 100
INSERT INTO users (name) VALUES ('d');             -- id = 101, not 3 !

-- Jump straight to the ceiling
INSERT INTO users (id, name)
VALUES (2147483647, 'cap');                        -- id = 2,147,483,647 ✓

-- One more auto-increment write…
INSERT INTO users (name) VALUES ('boom');
-- ERROR 1062 (23000): Duplicate entry '2147483647' for key 'PRIMARY'

That last line is the production outage in five seconds.

Two small surprises along the way:

One careless insert can fast-forward you by years. When we wrote id = 100, the counter jumped to 101. The next auto-incremented ID was 101, not 3. A single copy-paste backfill from another system can burn through millions of IDs without inserting many rows. This is how plenty of teams accidentally accelerate themselves toward the wall.
The maximum value works exactly once. Writing id = 2147483647 explicitly is fine — that row didn't exist yet. The next auto-increment write tries to use the same number and trips over the row you just inserted.

Why the Error Lies to You

Here's the part that confuses everyone the first time. You'd expect MySQL to tell you "your number is too big." There's a perfectly good error for that: ER_WARN_DATA_OUT_OF_RANGE. It exists. MySQL just doesn't use it here.

To understand why, picture what AUTO_INCREMENT actually does. There's a counter living inside InnoDB for that table. When a new row arrives without an ID, MySQL roughly does this:

candidate = counter
if candidate > column_max:
    candidate = column_max          ← it clamps. it does NOT wrap to negative.
counter = candidate + 1             ← also clamped at the ceiling
INSERT row with id = candidate
if id already exists:
    raise ER_DUP_ENTRY              ← the misleading error you actually see

That "clamp" line is the whole story. When the counter hits the maximum, MySQL doesn't generate a fresh number — it just hands back the maximum value over and over. The first write at id = 2147483647 succeeded. Every later write asks for the same ID. The primary key naturally rejects the second one. From the database's point of view, you're literally trying to insert a duplicate. From your point of view, it's nonsense.

This is why the first hour of the outage is wasted. On-call reads "Duplicate entry '2147483647'" and starts grepping the codebase for places that supply that exact ID. Nothing turns up. They check for race conditions. Nothing. They check cron jobs. Nothing. They check the ORM. Still nothing. Because nothing in the application supplied that ID. The database did, twice, and didn't tell you why.

If you suspect this is what's happening, two queries confirm the diagnosis instantly:

-- What type is the column actually declared as?
SELECT column_name, column_type
  FROM information_schema.columns
 WHERE table_schema = DATABASE()
   AND table_name = 'events'
   AND extra LIKE '%auto_increment%';

-- Where is the counter sitting right now?
SELECT auto_increment
  FROM information_schema.tables
 WHERE table_schema = DATABASE()
   AND table_name   = 'events';

If you see int and the counter at 2147483647 — that's it. You're at the wall. Stop investigating and start mitigating.

Why MySQL Refuses to Wrap Around

In C, when a signed integer overflows it usually wraps from positive to negative. INT_MAX + 1 becomes INT_MIN. Some early databases inherited that behaviour — and the result was a class of bug so quietly destructive that whole forensic projects were spent unwinding it. Imagine your primary key column suddenly producing negative IDs. Every foreign key still points at "valid" rows, but the rows are wrong. Indexes are sorted as expected, but "smaller ID = older row" stops being true. Nothing fails loudly. Everything is just subtly wrong.

MySQL chose the opposite. It clamps and refuses. The error is misleading — but it is also loud, recoverable, and never silently corrupts your data. Of the two failure modes, this is the one you want.

The database isn't your bug. The error code is just bad. Once you know what it means, you're one query away from a clean diagnosis.

A Few AUTO_INCREMENT Facts Worth Knowing

These come up in every outage of this shape.

1. Rollbacks don't return IDs. If your insert is inside a transaction that gets rolled back, the ID it allocated stays burned. The counter has already moved on. A workload with lots of failed inserts — like INSERT … ON DUPLICATE KEY UPDATE with high conflict rates — eats through the ID space faster than your row count suggests.

2. Explicit IDs fast-forward the counter. You saw this in the laptop demo. Insert id = N where N is bigger than the current counter and the counter jumps to N + 1. This is also the trick we'll use in the mitigation — fast-forwarding past the wall on a fresh table.

3. The counter is durable in MySQL 8. In 5.7 and earlier, the counter was rebuilt on restart with SELECT MAX(id) + 1, which meant deleting the highest rows and restarting could quietly reuse old IDs. Modern InnoDB persists the counter in the redo log. If you're still on 5.7, this is one more reason to upgrade.

4. Multi-row inserts can grab a range at once. With innodb_autoinc_lock_mode = 2 (default in 8.0), batch inserts pre-allocate a chunk of IDs for throughput. The trade-off: your ID sequence will have gaps. If you assumed IDs were dense, they aren't.

The Alert That Means You Never See This Article Again

The single most useful sentence in any post-mortem about this kind of outage is also the most boring: set an alert at 70% of the type's maximum, for every AUTO_INCREMENT column you run.

One query gives you the raw data:

SELECT
  t.table_schema,
  t.table_name,
  c.column_name,
  c.column_type,
  t.auto_increment AS next_value,
  CASE
    WHEN c.column_type LIKE 'int%'    AND c.column_type NOT LIKE '%unsigned' THEN 2147483647
    WHEN c.column_type LIKE 'int%'                                           THEN 4294967295
    WHEN c.column_type LIKE 'bigint%' AND c.column_type NOT LIKE '%unsigned' THEN 9223372036854775807
    WHEN c.column_type LIKE 'bigint%'                                        THEN 18446744073709551615
  END AS type_max
FROM information_schema.tables  t
JOIN information_schema.columns c
  ON c.table_schema = t.table_schema
 AND c.table_name   = t.table_name
WHERE t.auto_increment IS NOT NULL
  AND c.extra LIKE '%auto_increment%';

Compute used_ratio = next_value / type_max and ship it to whatever you already use for alerts. Page at 70%. Wake people up at 85%. By the time you're approaching the wall, the migration is a calm Tuesday afternoon — not a 3 AM scramble.

Not every table needs this. A countries table with 200 rows is fine forever. The tables that matter are the noisy ones: event logs, audit trails, IoT telemetry, click streams, queues backed by tables. Those are also the most likely to be on INT because they were created early and never revisited.

Mitigation 1 — Just Widen the Column (If You Can)

If your table is small, the simplest move is to grow the column from INT to BIGINT right where it is:

SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SET SESSION foreign_key_checks = 0;

ALTER TABLE events
  CHANGE id id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  ALGORITHM = COPY, LOCK = SHARED;

SET SESSION foreign_key_checks = 1;

Each line earns its keep:

Loose isolation — don't let this session get tangled up in other transactions' locks. We're rebuilding a column, not preserving read consistency.
Foreign key checks off — on a big table, FK validation is one of the slowest steps. We'll turn it back on right after.
The actual widening — move id to BIGINT UNSIGNED. If you really can't pay the 4 extra bytes per row, INT UNSIGNED doubles your space. Honestly, just go to BIGINT.
ALGORITHM=COPY, LOCK=SHARED — be explicit. MySQL is going to build a fresh copy of the table with the new column type and swap it in. Reads keep working during the copy; writes are blocked (which doesn't make anything worse — writes were already failing).

The catch: ALGORITHM=COPY means rewriting the entire table on disk. On 100,000 rows that's a few seconds. On 2 billion rows it can be hours to days. If your table is large and you're already in an outage, this is the wrong move. You need something faster. Read on.

Mitigation 2 — The Table Swap

This is the move senior database operators reach for when the table is huge and writes are already failing. Instead of slowly rebuilding the broken table, you stand up a fresh, wider one next to it and atomically swap them in a single statement. The old data stays in the old table and gets migrated on a relaxed schedule.

The whole thing is five steps. Steps 1 through 4 take seconds. Step 5 takes days, but it doesn't matter — by step 4, the outage is already over.

Step 1 — clone the schema (empty table, sub-second)
  CREATE TABLE events_new LIKE events;

Step 2 — widen the id column on the new (empty) table
  ALTER TABLE events_new
    CHANGE id id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;

Step 3 — fast-forward the counter past the old wall
  ALTER TABLE events_new AUTO_INCREMENT = 2147483648;

Step 4 — atomic swap: one statement, one metadata lock
  RENAME TABLE
    events     TO events_old,
    events_new TO events;

Step 5 — backfill old rows on a relaxed schedule (days, in chunks)
  INSERT INTO events (id, …) SELECT id, … FROM events_old WHERE id BETWEEN ? AND ?

Why every step is cheap until the rename:

Step 1 — instant. Cloning an empty schema is metadata only. No rows.
Step 2 — instant. Changing a column type on an empty table is also metadata only. Nothing to rewrite.
Step 3 — instant. We move the counter to 2,147,483,648 — exactly one past the old wall. Setting the counter on an empty table is metadata too.
Step 4 — the magic. A multi-table RENAME in MySQL is atomic. Both renames happen under a single metadata lock and commit together. From the application's point of view, the table called "events" went from broken to empty-and-wide in one indivisible blink. There is no moment where the table doesn't exist.

The instant Step 4 commits, new inserts start succeeding. They land at id = 2147483648 and climb from there, safely inside the much larger BIGINT column. The outage is over for live traffic in seconds.

The Backfill, Patiently

Step 5 happens after the outage is over and your phone has stopped buzzing. You move the old rows over in small batches that don't blow up replication lag or the buffer pool:

-- repeat with rolling id windows; tune chunk size to your replica lag
INSERT INTO events (id, ts, user_id, payload)
SELECT      id, ts, user_id, payload
FROM        events_old
WHERE       id BETWEEN 1         AND 1000000
ORDER BY    id;

The old IDs (1 through 2,147,483,647) move into the new table in their original positions. New IDs (2,147,483,648 onwards) keep growing. Nothing collides, because the two ranges never overlap — that was the whole point of Step 3.

When the backfill finishes, you drop events_old and the migration is done.

The Sharp Edges Nobody Warns You About

On paper the swap is beautiful. In production it has teeth. Here's what catches teams.

Foreign keys pointing at the old table. If anywhere in the schema you have FOREIGN KEY (event_id) REFERENCES events(id), the rename can either fail or silently follow along depending on engine and version. Audit your schema before swapping. You may need to temporarily drop and re-add the constraints. (One reason high-injection tables in mature systems often don't have inbound FKs at all — they make operations like this dramatically less painful.)

Reads need both ranges during the migration. Between Step 4 and the end of Step 5, a query like SELECT * FROM events WHERE user_id = ? only sees the new rows. Old rows are still over in events_old. Two options: replace the table with a UNION ALL view that joins both, or teach the application to read from both tables temporarily. Either is fine — just don't be surprised.

Replication lag during the backfill. If you naively run one giant INSERT … SELECT over 2 billion rows, your replicas will fall behind by hours. Chunk it. Check seconds_behind_master between chunks. Sleep when lag rises. There's no shortcut — just patience.

Triggers that run on every insert. If you have AFTER INSERT triggers on the table, they will fire on every single backfilled row. Two billion times. Disable them during the backfill and replay anything they would have done in batch.

ORMs that cached the column type. Some ORMs read the schema once at connection time. After the swap your id is BIGINT, but the connection still believes it's INT. The symptom is silent truncation of large new IDs. Restart the connection pool after the swap — most pools pick up the new type on fresh connections.

What to Ship From Day One

Every story like this ends with the same lesson: choosing the right ID type up front costs essentially nothing, while choosing the wrong one costs an outage. So default to these:

If your table is…	Use this for the primary key	Why
Small, scoped (lookups, configs)	`INT UNSIGNED`	4.3 billion is more than enough; column stays narrow.
High-injection (events, logs, IoT)	`BIGINT UNSIGNED AUTO_INCREMENT`	You will never hit the ceiling. 4 extra bytes per row is the price of sleep.
Distributed, multi-region writes	Snowflake or UUID v7	No single counter to coordinate; time-ordered so B-trees stay happy.
Externally exposed IDs	UUID v7 or opaque hash	AUTO_INCREMENT IDs in URLs leak how many rows you have and how fast you're growing.

If you want the deeper comparison — page splits, B-tree behaviour, why random UUIDs blow up your buffer pool — there's a companion post: UUID v4 vs UUID v7 vs Snowflake — Pick the Right Primary Key.

The honest summary. The 32-bit signed AUTO_INCREMENT wall is the consequence of one default that was sensible in 1995 and is no longer sensible in 2026. The fix is one line of DDL on day one. If the wall is already in sight, the cheapest move is the swap. And the cheapest future move is to never see this wall again.

One Thing to Remember

Every AUTO_INCREMENT column in your production database has a ceiling. The ceiling is silent — right up until the moment it isn't. One query in a Prometheus exporter, one paging alert at 70% of the type's maximum, and you will never see this outage in your career.

By the time you cross 70%, the migration is a planned change with a runbook and a quiet Tuesday afternoon. Without the alert, the same migration is a Saturday night with a misleading error, an hour of confused on-call, and a slow realisation that the id column on your busiest table was declared a decade ago and never looked at again.

You decide which Tuesday you want.