What does "concurrent" actually mean?
Imagine a coffee shop with one barista.
- Sequential: serve customer A fully, then B, then C. If A is slow, B and C wait forever.
- Concurrent: start A's order, while the milk steams, take B's order, hand C their receipt. Many things are in progress at once, even with one person working.
A server is the same. Each user who is logged in and has an open connection (chat, game, live feed) is a customer "still in the shop." The server's job is to keep all of them happy without wasting energy.
The C10K problem
In the late 1990s, servers started hitting a weird wall. They were not out of CPU. They were not out of memory. But once around 10,000 users connected at once, everything slowed to a crawl.
Why it broke: the Apache story
The most popular web server of that era was Apache. Apache's design was simple and honest: fork a new process (or thread) for every incoming connection. A user connects → Apache spawns a worker → that worker handles only that user. Clean, easy to debug.
It worked great for 100 users. It worked okay for 1,000 users. At 10,000 users, it collapsed. Here is exactly why:
- Memory explosion. Each process takes 2–8 MB of RAM just to exist (stack + page tables + loaded code). 10,000 processes × 5 MB = 50 GB of RAM — and you have not handled a single request yet.
- Context switch storm. The CPU can only run one thing at a time per core. To give every process a turn, the OS has to context switch — save the current process's registers, CPU state, and memory mappings, then load the next one's. Each switch costs ~1–10 microseconds. With 10,000 processes competing, the CPU spends more time switching than doing work. This is the real killer.
- Cache thrashing. Every context switch wipes the CPU's L1/L2 cache. The next process starts "cold" and has to refetch everything from RAM (100x slower). More switches = colder caches = slower everything.
- Most workers sleep. At any moment, 9,900 of those 10,000 processes are just sitting there waiting on a slow user's network. You are paying the full memory and scheduling cost for processes that do nothing.
The fix: event loop + non-blocking I/O
Instead of one thread per user, use one (or a few) threads that handle everyone.
The trick is to never let a thread get "stuck" waiting. Normal code does this:
data = read(socket) // thread sleeps here until data arrives
With non-blocking I/O, the thread asks the operating system:
"Tell me which users have new data ready RIGHT NOW."
The OS returns a list. The thread handles them quickly, then asks again. Nobody sits idle. One thread can serve thousands of users because it only does work when work exists.
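Here is a minimal sketch of that loop in Python, using the standard-library selectors module (which picks epoll or kqueue for you). The echo handler and the port choice are illustrative, not part of any real server:

```python
import selectors
import socket

# One thread, many connections: a tiny echo-server sketch.
# selectors.DefaultSelector uses the best OS mechanism available
# (epoll on Linux, kqueue on BSD/macOS).
sel = selectors.DefaultSelector()

def accept(listener):
    conn, _addr = listener.accept()
    conn.setblocking(False)          # never let a read put this thread to sleep
    sel.register(conn, selectors.EVENT_READ, handle)

def handle(conn):
    data = conn.recv(4096)           # will not block: the OS said data is ready
    if data:
        conn.sendall(data)           # echo it back (small writes won't block here)
    else:
        sel.unregister(conn)         # empty read = client closed the connection
        conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
listener.listen()
listener.setblocking(False)
sel.register(listener, selectors.EVENT_READ, accept)

# The loop: ask "who is ready?", do only that work, ask again.
# while True:
#     for key, _events in sel.select():
#         key.data(key.fileobj)      # call accept() or handle() for ready fds
```

The loop at the bottom is commented out so the sketch runs to completion; a real server would spin on it forever, serving every connection from this single thread.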
epoll (Linux), kqueue (BSD/Mac), IOCP (Windows). These are OS features that let your program ask "who is ready?" in a way that scales to millions. You do not have to memorize the names — runtimes like Node.js, Nginx, and Go use them under the hood.
Before "hello": the handshakes that set up the connection
Before a user can send a single byte of chat data, two handshakes must happen. Both cost time and server resources — and both are multiplied by however many users are connecting at once.
1) The TCP 3-way handshake
TCP is a "reliable" protocol — it guarantees bytes arrive in order with no gaps. To do that, both sides must first agree they are connected. This is the famous 3-way handshake:
- SYN → client says "I want to start a connection, my starting number is X".
- ← SYN-ACK server says "okay, I got X, my starting number is Y".
- ACK → client says "got Y, we're connected".
Only after this can data flow. On your server, this is where accept() comes in. Your app calls accept() on the listening socket, and the kernel hands back a new file descriptor — that fd represents this one user's connection. Every user gets their own fd.
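The listen/accept split can be sketched in a few lines of Python; port 0 and the backlog of 128 are illustrative choices, not requirements:

```python
import socket

# The kernel completes the SYN / SYN-ACK / ACK exchange for connections
# queued on the listening socket; accept() then hands your app the result.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
server.listen(128)              # backlog: completed handshakes the kernel may queue

# A client connecting triggers the 3-way handshake:
client = socket.create_connection(server.getsockname())

# accept() returns a NEW socket whose fd represents this one user's
# connection; the listening socket keeps its own fd and waits for others.
conn, addr = server.accept()
```

Note that the handshake itself happened in the kernel before accept() returned — accept() just collects the finished connection.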
2) The TLS handshake (for HTTPS / WSS)
If the user is on HTTPS, there's another handshake on top of the TCP one. The client and server exchange certificates, verify identity, and negotiate encryption keys. This takes another 1–2 round trips and burns CPU for the crypto math (RSA or ECDSA verification).
The full journey: from user's keyboard to your app
Now that the connection is established, let's trace what happens when the user actually types "hello" in their chat app and your server reads it. Every step is real work. Here is the path, step by step:
- User types "hello" in their chat app on a phone or browser.
- The client's OS wraps it in TCP packets. Each packet gets the destination IP, port, and a sequence number so the receiver can reassemble them in order.
- The packets travel the internet — WiFi → home router → ISP → backbone → your datacenter. This may take 20–200 ms depending on distance.
- Your server's network card (NIC) receives the packets. The NIC copies them into RAM and tells the CPU "new data arrived" (this is called an interrupt).
- The kernel's TCP stack reassembles the packets into the original byte stream and checks for missing pieces. Good bytes go into a socket receive buffer — one buffer per open connection.
- The kernel marks the socket's file descriptor (fd) as "ready to read". An fd is just a number (like 42) that represents one connection. Your app does not know yet — the data is sitting in the kernel, waiting.
- Your app finds out via epoll_wait(). Earlier, your app told the kernel "wake me when any of these sockets have data." Now that call returns with a list: "fd 42 is ready."
- Your app calls read(fd, buffer). The kernel copies the bytes from the socket buffer into your app's memory. Now "hello" is a string in your code.
- Your handler runs. You parse the message, update the chat, maybe write to a database. Done.
The epoll_wait() call can watch 10,000 sockets at once and only wake your app for the few that actually have data. That is why one thread can serve thousands of users. In the old model, each thread called read(fd) directly and got stuck sleeping until data arrived on its one socket. 10,000 users → 10,000 sleeping threads → wasted RAM and CPU switching.
Sending the reply back: the write path
Your app does not just read — it also writes back. "Got your hello" needs to travel the same route in reverse:
- Your app builds a response in its own memory (e.g., a JSON string).
- Your app calls write(fd, data). The kernel copies those bytes from user memory into the socket's send buffer.
- The TCP stack chops the bytes into packets, adds headers, and hands them to the NIC driver.
- The NIC sends the packets out onto the wire toward the client.
- TCP tracks acknowledgements. If a packet is lost, the kernel retransmits it — your app never notices.
- The client's OS reassembles and delivers the bytes to the client's app.
The real technical breakthrough: select() → poll() → epoll()
If the event loop idea is so great, why did it take 20 years to get good? Because the tool the kernel gave apps to ask "who is ready?" was terrible at first. The journey from broken to fast is the actual story of C10K.
select() and poll() make the kernel scan every fd on every call; epoll tracks readiness as data arrives and returns only the active fds. Here's the concrete difference. Imagine you have 10,000 open sockets but only 5 have data right now:
- select() / poll(): your app sends all 10,000 fds to the kernel. The kernel loops through all 10,000 checking each one. It returns. Your app loops through all 10,000 again to find which 5 are ready. Work done: 20,000 checks. Useful work: 5.
- epoll(): you registered your 10,000 fds once (long ago). When the kernel gets data for a socket, it silently adds that fd to an internal "ready list". When you call epoll_wait(), the kernel hands you exactly those 5 ready fds. Work done: 5 items returned. Useful work: 5.
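You can see the "register once, get back only the ready fds" contract from Python's stdlib selectors module (DefaultSelector is epoll on Linux, kqueue on BSD/macOS); the 100 socket pairs and the choice of pair 42 are just for demonstration:

```python
import selectors
import socket

sel = selectors.DefaultSelector()

# One-time registration of 100 sockets — done once, long before any data.
pairs = [socket.socketpair() for _ in range(100)]
for reader, _writer in pairs:
    reader.setblocking(False)
    sel.register(reader, selectors.EVENT_READ)

# Data arrives on exactly one of the 100 sockets...
pairs[42][1].send(b"hello")

# ...and the wait hands back exactly that one. No scanning 100 fds in
# your app, no passing 100 fds to the kernel on every call.
ready = sel.select(timeout=1)
```

With select()-style polling, the same question would cost a pass over all 100 registrations on every call, in both the kernel and your app.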
What still goes wrong at C10K
- Async cannot fix slow code. If every request runs 50 database queries, no event loop will save you.
- File descriptor limits. Every open connection uses one file descriptor; the OS has a cap (often 1024 by default). Raise it or you will hit a wall.
- Memory per connection. Each connection still needs buffers. 10,000 connections × 64 KB = 640 MB just for buffers.
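On Unix you can inspect and raise the fd limit from inside the process itself; a sketch using the standard-library resource module (not available on Windows), with 65536 as an illustrative target:

```python
import resource

# Each connection costs one file descriptor; check this process's limits.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# A default soft limit of 1024 caps you at roughly 1,000 concurrent
# connections. Raise it, but never above the hard cap the admin set.
target = hard if hard != resource.RLIM_INFINITY else 65536
new_soft = min(65536, target)
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
```

An unprivileged process may raise its soft limit only up to the hard limit, which is why the sketch clamps against it.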
The C10M problem
Now imagine you are WhatsApp, Discord, or a big game. You do not have 10,000 users. You have 10 million. Can the same trick work?
No. And this is where people get confused. C10M is not just "C10K with more zeros." It is a different problem entirely.
Why one server cannot do it
Even if your server is a god-tier event loop, one machine has hard limits:
- RAM: 10 million connections × even 10 KB each = 100 GB just for connection state.
- Network card: one NIC can only push so many packets per second.
- CPU: TLS encryption, parsing, and business logic all eat cycles.
- One crash = everyone down. No redundancy.
The fix: split the work across many machines
Real systems handle millions of users by spreading them out across layers: a CDN at the edge, then a load balancer, then many app servers, backed by caches and a sharded database. Each layer solves a specific problem:
- CDN / edge: serves static files close to the user, and handles TLS (HTTPS) so the app servers do not have to decrypt every byte.
- Load balancer: spreads users evenly across app servers. If one app server dies, others take over.
- Many app servers: each one only handles a slice of users. Need more capacity? Add more servers.
- Cache (Redis, Memcached): stores hot data in memory so you do not hit the database for every request.
- Sharded database: split data across many DB servers so no single DB is a bottleneck.
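The usual sharding trick is to hash a stable key to pick a database; a minimal sketch, with a shard count of 8 chosen purely for illustration:

```python
import hashlib

NUM_SHARDS = 8  # illustrative; real systems pick this for their data size

def shard_for(user_id: str) -> int:
    # Stable hash: the same user always lands on the same shard, so
    # every server agrees on where that user's data lives.
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS
```

The downside is baked into the modulo: changing NUM_SHARDS remaps almost every key, which is why real systems reach for consistent hashing when shards must be added live.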
What still goes wrong at C10M
- Thundering herd. You deploy a new version, everyone disconnects, everyone reconnects at once — your own traffic becomes a DDoS on yourself. Fix: reconnect with random delays (backoff + jitter).
- Hot shards. One celebrity on your app has 10 million followers. That shard melts. Fix: special handling for hot users, or duplicate their data.
- Cache stampede. A popular cache entry expires, and 1,000 servers all try to rebuild it at the same time. Fix: locks, or "stale-while-revalidate" caching.
- Cost. Millions of idle connections still cost real money in bandwidth and RAM.
- Tail latency. Your average response time is fine, but 1% of users are waiting 30 seconds because one DB shard is slow. Those users leave.
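The reconnect fix from the first bullet — exponential backoff with "full jitter", so clients spread out instead of all retrying in the same second — can be sketched in a few lines; the base delay and cap are illustrative:

```python
import random

def reconnect_delay(attempt: int, base: float = 0.5, max_delay: float = 30.0) -> float:
    """Delay before reconnect attempt N: exponential backoff + full jitter."""
    cap = min(max_delay, base * (2 ** attempt))   # 0.5s, 1s, 2s, 4s... capped at 30s
    return random.uniform(0, cap)                 # random point in [0, cap]

# Usage sketch (try_reconnect is hypothetical):
# for attempt in range(10):
#     time.sleep(reconnect_delay(attempt))
#     if try_reconnect():
#         break
```

The jitter is the important part: without it, every client computes the same delay and the herd simply thunders again, 0.5 seconds later.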
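The cache-stampede fix can be sketched as a rebuild lock plus a re-check, so only one caller regenerates an expired entry. Real systems use one lock per key (or per cache shard); a single global lock keeps the sketch short:

```python
import threading
import time

cache = {}                       # key -> (value, expires_at)
rebuild_lock = threading.Lock()  # real systems: one lock per key

def get(key, ttl, rebuild):
    entry = cache.get(key)
    if entry and entry[1] > time.monotonic():
        return entry[0]                # fresh: serve straight from cache
    with rebuild_lock:
        # Re-check: another caller may have rebuilt while we waited.
        entry = cache.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]
        value = rebuild()              # only one caller reaches the database
        cache[key] = (value, time.monotonic() + ttl)
        return value
```

Without the re-check inside the lock, every waiting caller would still run rebuild() one after another — serialized, but just as wasteful.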
Side-by-side comparison
| Question | C10K | C10M |
|---|---|---|
| What are we scaling? | One server | The whole system |
| Main problem | Threads waste memory | One machine cannot do it all |
| Main fix | Event loop + non-blocking I/O | Load balancer + many servers + sharded DB |
| You debug | fd limits, event loop blocking | DB hotspots, reconnect storms, p99 latency |
| Beginner mistake | Thinking more threads = faster | Copying Netflix's architecture on day one |
| Year it got popular | ~2000s | ~2010s |
The five things to remember
- C10K is a single-server problem. Fix it with non-blocking I/O and an event loop. Stop making one thread per user.
- C10M is a system-design problem. Fix it with load balancers, many servers, caches, and sharded databases. No coding trick will do this alone.
- Async code does not fix slow databases. If your DB is the bottleneck, epoll will not help.
- Split the work early. Separate TLS, routing, compute, and data into different layers so each can scale on its own.
- Measure before you optimize. Look at RAM per connection, DB query time, and p99 latency. The label (C10K, C10M) matters less than your actual graphs.
Further reading
- Dan Kegel's original "The C10K problem" — the classic article that named the problem.
- Robert Graham's C10M talk (Shmoocon 2013), summarized by Todd Hoff at High Scalability: The Secret to 10 Million Concurrent Connections.
- Try it yourself: write a small chat server in Node.js or Go, then use a tool like wrk or ab to throw thousands of connections at it. Watch where it breaks first — that is your real lesson.