The naming catastrophe — three things called "keep-alive"
Before anything else: the word "keep-alive" gets used for three completely different things. Most arguments about which is "better" are actually people talking past each other.
The three are the HTTP Connection: keep-alive header, the TCP socket option SO_KEEPALIVE, and application-level heartbeats. For the rest of this post, "keep-alive" means the TCP socket option. The HTTP header is mentioned only to get it out of the way.
When a client sends Connection: keep-alive, it is asking the server not to close the TCP socket after this response so the next request can reuse it. That avoids a fresh 3-way handshake (and TLS handshake) for every request. In HTTP/1.1 this is the default; in HTTP/2 the connection is multiplexed and reused by design. None of this has anything to do with detecting whether the peer is alive.
How TCP actually tracks "connected"
To understand why long-lived sockets die silently, you need to know one uncomfortable truth about TCP: by default, it performs no liveness check at all.
The 3-way handshake establishes a connection — SYN, SYN-ACK, ACK. After that, both sides have a record of the connection, identified by the 4-tuple (src IP, src port, dst IP, dst port). They each track sequence numbers so data isn't reordered or duplicated. The OS marks the socket as ESTABLISHED.
And that's it. The socket can sit in ESTABLISHED forever, even if the other machine has been unplugged for a week. TCP only learns about a problem when something tries to send — then either the peer's TCP stack replies with RST (if it's still around but doesn't recognize the connection), or the local TCP retransmits and eventually times out.
This is the gap that both TCP keep-alive and application heartbeats are trying to close, in different ways and at different layers.
The half-open socket — how connections silently die
"Half-open" is the term for a TCP connection where one side believes the connection is alive and the other does not (or no longer exists). Four common ways this happens in production:
- Peer machine power-cut or kernel panic. No FIN is ever sent. Your side never finds out unless it tries to send.
- NAT or stateful firewall drops the flow mapping. Home routers typically expire idle TCP flows around 5 minutes. AWS NLB defaults to 350 seconds. AWS ALB defaults to 60 seconds. Once the mapping is dropped, packets between the two sides get blackholed (NLB) or RST (ALB).
- Network path change. Mobile client hands off Wi-Fi to cellular. The 4-tuple's source IP changes, the old socket is orphaned, and the server keeps a zombie.
- Middlebox idle-flow eviction. Stateful firewalls and load balancers cap the number of concurrent flows they track. Idle ones get evicted first — silently.
If you SSH into a server running a long-lived socket service and run ss -tan, you'll see something like:
$ ss -tan state established | wc -l
83214
$ ss -tan state established '( dport = :443 )' | head -5
ESTAB 0 0 10.0.1.4:43221 10.0.2.7:443
ESTAB 0 0 10.0.1.4:43227 10.0.2.7:443
ESTAB 0 0 10.0.1.4:43231 10.0.2.7:443
83,214 ESTABLISHED sockets. The kernel is happy. How many of those have a peer that will ever speak again? The kernel has no idea. It has not tried to send anything, so it has not noticed.
TCP keep-alive — what the kernel actually does
When you set SO_KEEPALIVE on a socket, the kernel periodically sends a probe packet on idle connections. The probe is a strange little thing: a TCP segment with no payload and a sequence number one less than the next byte the peer expects (current_seq - 1). The peer's stack sees a segment covering data it has already acknowledged and answers with an ACK carrying its current acknowledgment number. If the peer is gone, no answer comes; after enough unanswered probes, the kernel declares the connection dead and surfaces an error (ETIMEDOUT on Linux) to your app on the next read or write.
Three knobs control this on Linux:
# Defaults on most Linux distros
$ sysctl net.ipv4.tcp_keepalive_time # 7200 (idle seconds before first probe)
$ sysctl net.ipv4.tcp_keepalive_intvl # 75 (seconds between probes)
$ sysctl net.ipv4.tcp_keepalive_probes # 9 (failed probes before giving up)
Do the math: a freshly-broken connection takes 7200 + (9 × 75) = 7875 seconds, or about 2 hours and 11 minutes, to be detected. That is the default. For anything that matters, you must override per-socket:
// Node.js — first probe after 30s idle
socket.setKeepAlive(true, 30_000);
// Go — first probe after 30s idle
tcpConn.SetKeepAlive(true)
tcpConn.SetKeepAlivePeriod(30 * time.Second)
(Note: setKeepAlive in most high-level runtimes only exposes the idle time, not the probe interval or count. To tune those you use setsockopt with TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT directly.)
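To see how tuning the three knobs changes the detection time computed earlier, the same arithmetic can be wrapped in a small function. This is a minimal sketch; worstCaseDetectionSeconds is a hypothetical helper, and the tuned values are illustrative assumptions rather than recommendations.

```javascript
// Hypothetical helper: seconds until kernel keep-alive declares an idle,
// silently broken connection dead, using the same formula as the text.
// idle     -> tcp_keepalive_time   (seconds before the first probe)
// interval -> tcp_keepalive_intvl  (seconds between probes)
// probes   -> tcp_keepalive_probes (unanswered probes before giving up)
function worstCaseDetectionSeconds({ idle, interval, probes }) {
  return idle + probes * interval;
}

// Linux defaults quoted above.
console.log(worstCaseDetectionSeconds({ idle: 7200, interval: 75, probes: 9 })); // 7875

// Illustrative tuned values (an assumption, not a recommendation).
console.log(worstCaseDetectionSeconds({ idle: 30, interval: 10, probes: 3 })); // 60
```

With only the idle time lowered to 30 seconds and the other two knobs left at their defaults, the same formula still gives 30 + 9 × 75 = 705 seconds, which is why the interval and count matter too.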
Application heartbeat — what your code does
An application heartbeat is just a message your protocol defines — sent on a timer, expecting a reply on a timer. The crucial difference from TCP keep-alive is that the heartbeat traverses your application code. To answer it, the peer's event loop must spin, the message must be parsed, and a reply must be written. If the peer process is hung, deadlocked, or mid-GC for too long, the heartbeat goes unanswered — and that's exactly what you wanted to detect.
Three patterns cover almost every case:
- Ping/pong. Built into the WebSocket protocol (RFC 6455 §5.5.2). The server sends a ping frame; the client's WebSocket library auto-replies with a pong frame. If a pong doesn't arrive in time, the server closes the socket.
- Periodic empty message. MQTT's PINGREQ/PINGRESP; Kafka's consumer group heartbeat thread. The protocol defines a no-op message specifically for liveness.
- Read-deadline reset. Every successful read pushes a deadline forward. If the deadline expires with no data, kill the socket. Common in Go (conn.SetReadDeadline) and gRPC (which has its own keepalive subsystem layered on top of HTTP/2).
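The read-deadline pattern is the easiest of the three to get subtly wrong, so here is a minimal sketch of just the bookkeeping. IdleDeadline is a hypothetical class, and the clock is passed in explicitly so the logic can be checked without real timers.

```javascript
// Sketch of a read-deadline tracker. IdleDeadline is a hypothetical name;
// timestamps are milliseconds and are injected rather than read from a timer.
class IdleDeadline {
  constructor(timeoutMs, now = Date.now()) {
    this.timeoutMs = timeoutMs;
    this.deadline = now + timeoutMs;
  }

  // Call on every successful read: pushes the deadline forward.
  touch(now = Date.now()) {
    this.deadline = now + this.timeoutMs;
  }

  // True once the deadline has passed with no intervening read.
  expired(now = Date.now()) {
    return now >= this.deadline;
  }
}

const d = new IdleDeadline(45_000, 0); // 45 s budget, clock starts at t=0
d.touch(30_000);                       // data arrived at t=30 s
console.log(d.expired(60_000));        // false: deadline moved to t=75 s
console.log(d.expired(75_000));        // true: no read for 45 s
```

In Node.js, socket.setTimeout(ms) with a 'timeout' listener gives similar idle detection on a net.Socket, though as far as I know it counts any socket activity rather than reads alone, and the listener still has to destroy the socket itself.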
The canonical Node.js ws library pattern looks like this:
// Server-side heartbeat: the pattern from the ws README
import { WebSocketServer } from 'ws';

const wss = new WebSocketServer({ port: 8080 });

function heartbeat() {
  this.isAlive = true;
}

wss.on('connection', (ws) => {
  ws.isAlive = true;
  ws.on('pong', heartbeat); // client replied, mark alive
});

const interval = setInterval(() => {
  wss.clients.forEach((ws) => {
    if (ws.isAlive === false) return ws.terminate(); // no pong since the last ping
    ws.isAlive = false;
    ws.ping(); // send ping; pong handler resets the flag
  });
}, 30_000);

// Stop the timer when the server shuts down.
wss.on('close', () => clearInterval(interval));
Two things to notice. First, terminate(), not close(): the latter waits for a graceful close handshake the peer can no longer participate in. Second, the peer gets one full interval (30 seconds here) to answer each ping; the socket is killed only if no pong has arrived by the next tick. A single dropped packet is normally recovered by TCP retransmission well within that window, so it doesn't trigger a disconnect.
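If a peer can legitimately stall for longer than one interval, the boolean flag can be swapped for a counter of unanswered pings. The following is a sketch under that assumption; HeartbeatState and maxMisses are hypothetical names, not part of the ws API, and the wiring into ws is shown only as comments.

```javascript
// Hypothetical per-connection state: tolerate up to maxMisses consecutive
// unanswered pings before giving up.
class HeartbeatState {
  constructor(maxMisses) {
    this.maxMisses = maxMisses;
    this.outstanding = 0; // pings sent since the last pong
  }

  // Call from the 'pong' handler.
  onPong() {
    this.outstanding = 0;
  }

  // Call once per interval tick. Returns the action the caller should take:
  // 'ping' to send another ping, 'terminate' to drop the socket.
  onTick() {
    if (this.outstanding > this.maxMisses) return 'terminate';
    this.outstanding += 1;
    return 'ping';
  }
}

// Possible wiring with ws (sketch only):
//   ws.on('pong', () => state.onPong());
//   state.onTick() === 'terminate' ? ws.terminate() : ws.ping();

const hb = new HeartbeatState(2);
console.log([hb.onTick(), hb.onTick(), hb.onTick(), hb.onTick()]);
// [ 'ping', 'ping', 'ping', 'terminate' ]
```

With maxMisses set to 0 this behaves like the boolean version: a ping that is still unanswered at the next tick ends the connection.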
The decision framework
This is the load-bearing section. The right answer almost always depends on what's between you and the other side — and what kind of failure you actually need to catch.
| Scenario | TCP keep-alive | App heartbeat | Why |
|---|---|---|---|
| Internal service-to-service, fast LAN | Sometimes | Rarely | Connections are short-lived; a failed write surfaces RST quickly |
| HTTP/1.1 keep-alive reuse over LB | No | No | LB idle-timeout governs; tune the connection pool's max-idle and reaping |
| Long-lived gRPC streams | Yes (~10s) | Yes (gRPC keepalive) | gRPC has its own keepalive layer over HTTP/2; tune both |
| WebSockets through CDN / NAT | Optional | Required | CDN/NAT silently drops idle flows; ping interval must be < their idle timeout |
| MQTT IoT fleet | No | Required | Spec mandates PINGREQ; keep-alive value is negotiated at CONNECT |
| DB connection pool | Yes (30–60s) | Sometimes (SELECT 1) | Cheap detection of stale pool entries before a real query hits one |
| Behind a strict corporate firewall | Required | Required | Firewalls drop both kinds; pick whichever the firewall allows |
Four rules of thumb:
- "Is the route alive?" → TCP keep-alive.
- "Is the peer process alive and processing?" → application heartbeat.
- "Is there a NAT, LB, or firewall in the middle with an idle timeout?" → application heartbeat at an interval comfortably below that timeout.
- "Could my app GC-pause for 30s under load?" → tune heartbeat tolerance (how many misses before close), not just frequency. Otherwise a stop-the-world pause kills every healthy connection at once.
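The last two rules reduce to a little arithmetic: pick an interval below the tightest idle timeout on the path, then size the miss tolerance to the longest pause you expect. A rough sketch follows; both helpers and the 0.5 safety factor are assumptions for illustration.

```javascript
// Hypothetical helper: choose a heartbeat interval from the idle timeouts
// (in seconds) of every middlebox on the path. safetyFactor is an assumed
// margin; 0.5 means "send at half the tightest timeout".
function pickHeartbeatIntervalSeconds(idleTimeouts, safetyFactor = 0.5) {
  if (idleTimeouts.length === 0) {
    throw new Error('need at least one idle timeout');
  }
  const tightest = Math.min(...idleTimeouts);
  return Math.floor(tightest * safetyFactor);
}

// Hypothetical helper: how many consecutive missed heartbeats to tolerate
// so that a pause of maxPauseSeconds does not close a healthy connection.
function missesToTolerate(maxPauseSeconds, intervalSeconds) {
  return Math.ceil(maxPauseSeconds / intervalSeconds);
}

// Path with an ALB (60 s) and an NLB (350 s), the defaults quoted earlier.
console.log(pickHeartbeatIntervalSeconds([60, 350])); // 30

// A 30 s stop-the-world pause with a 10 s heartbeat interval.
console.log(missesToTolerate(30, 10)); // 3
```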
Cost — why you can't just heartbeat every second
Heartbeats look cheap and they mostly are — until they aren't.
Bandwidth math. 100,000 connections × one ~60-byte heartbeat every 30 seconds = ~200 KB/s on the wire. Trivial. Drop the interval to 1 second: 6 MB/s. Still fine for a single host on a 10G NIC.
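The estimate is easy to rerun for your own fleet size. Here is a minimal sketch, assuming a hypothetical helper that counts one direction only and treats the ~60 bytes as the full on-wire size of each heartbeat.

```javascript
// Hypothetical helper reproducing the estimate above: bytes per second
// for one direction of heartbeat traffic.
function heartbeatBytesPerSecond(connections, bytesPerBeat, intervalSeconds) {
  return (connections * bytesPerBeat) / intervalSeconds;
}

console.log(heartbeatBytesPerSecond(100_000, 60, 30)); // 200000  (~200 KB/s)
console.log(heartbeatBytesPerSecond(100_000, 60, 1));  // 6000000 (6 MB/s)
```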
The real cost isn't bandwidth. It's wakeups. Every heartbeat is a timer firing, an event loop iteration, a syscall to write a few bytes, plus the syscall on the read side when the reply arrives. 100,000 connections at 1Hz heartbeat = 100,000 timer wakeups per second on each side, plus the inverse storm of replies. CPU goes up, latency-sensitive work suffers.
If you need sub-second heartbeats at scale, batch them into a timer wheel (Netty's HashedWheelTimer is the canonical implementation) so a single timer tick wakes up many connections at once. Otherwise, keep the interval as long as the slowest middlebox in your path will tolerate.
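To make the timer-wheel idea concrete, here is a minimal single-level sketch. HeartbeatWheel is a hypothetical class, far simpler than Netty's implementation: each slot covers one tick, and a connection is parked in the slot where its next heartbeat is due, so a single timer tick only touches one slot.

```javascript
// Minimal single-level timer wheel sketch (hypothetical, not Netty's code).
class HeartbeatWheel {
  constructor(slotCount) {
    this.slots = Array.from({ length: slotCount }, () => new Set());
    this.cursor = 0;
  }

  // Schedule conn to come due delayTicks ticks from now
  // (1 <= delayTicks <= slotCount).
  schedule(conn, delayTicks) {
    if (delayTicks < 1 || delayTicks > this.slots.length) {
      throw new RangeError('delayTicks out of range');
    }
    const slot = (this.cursor + delayTicks) % this.slots.length;
    this.slots[slot].add(conn);
  }

  // Advance one tick and return every connection due now.
  tick() {
    this.cursor = (this.cursor + 1) % this.slots.length;
    const due = [...this.slots[this.cursor]];
    this.slots[this.cursor].clear();
    return due;
  }
}

const wheel = new HeartbeatWheel(4);
wheel.schedule('a', 1);
wheel.schedule('b', 1);
wheel.schedule('c', 3);
console.log(wheel.tick()); // [ 'a', 'b' ]
console.log(wheel.tick()); // []
console.log(wheel.tick()); // [ 'c' ]
```

In use, one setInterval would drive tick(); each returned connection gets its ping sent and is then rescheduled, so the process pays for one timer regardless of how many connections it holds.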
Reference cheatsheet
| Where | Knob | Default | What it does |
|---|---|---|---|
| Linux kernel | net.ipv4.tcp_keepalive_time | 7200s | Idle seconds before first probe |
| Linux kernel | net.ipv4.tcp_keepalive_intvl | 75s | Seconds between probes |
| Linux kernel | net.ipv4.tcp_keepalive_probes | 9 | Failed probes before drop |
| Node.js | socket.setKeepAlive(true, ms) | off | Per-socket idle time |
| Go | conn.SetKeepAlivePeriod(d) | 15s on dialer | Per-socket idle time |
| Java/Netty | ChannelOption.SO_KEEPALIVE | off | Enables kernel keep-alive on channel |
| nginx upstream | keepalive_time, keepalive_timeout | 1h / 60s | Idle pool reuse window |
| AWS ALB | idle timeout | 60s | Drops idle TCP flows; need heartbeat < 60s |
| AWS NLB | idle timeout | 350s | Same, but at L4 — silent blackhole |
| WebSocket (RFC 6455) | ping/pong frames | off | Application-layer heartbeat at the protocol level |
| MQTT | Keep Alive in CONNECT | 0 (off) | Negotiated PINGREQ interval |
The lesson
"Connected" is a lie your kernel tells you by default.
TCP doesn't probe. NAT boxes evict idle flows. Load balancers drop idle sockets after as little as 60 seconds. Your peer process can hang while its kernel cheerfully answers probes. Pick the layer that answers the question you actually care about, route liveness or peer liveness, and for anything long-lived, configure both. The cheapest debugging session is the one you avoid by setting two timers correctly the first time.