TCP Keep-Alive vs Heartbeat Ping

Two mechanisms. Different layers. Different questions.

[Figure: TCP keep-alive (kernel) asks "Is the route still there?": probes to a dead peer go unanswered and, after 9 probes at 75 s intervals, the connection is declared dead. App heartbeat (your code) asks "Is the peer process willing to serve me?": PING and PONG flow between the two apps, and if the pong is missed N times the app chooses to close and reconnect.]

The naming catastrophe — three things called "keep-alive"

Before anything else: the word "keep-alive" gets used for three completely different things. Most arguments about which is "better" are actually people talking past each other.

Three things called "keep-alive": what they actually do

| | Connection: keep-alive | SO_KEEPALIVE | Heartbeat / Ping |
|---|---|---|---|
| Kind | HTTP/1.1 header | TCP socket option | Application message |
| Layer | Application (HTTP) | Transport (kernel) | Your code |
| Job | Reuse the TCP socket across multiple requests | Probe the path so a dead peer or route is eventually detected | Prove the peer process is reading and responding |
| Notes | Has nothing to do with detecting dead peers. Saves the cost of new handshakes. | Default delay: ~2 hours. Tunable per socket. Doesn't know if your app is alive. | Interval set by you. Traverses NAT, LBs, firewalls. The only one that detects app death. |

For the rest of this post, "keep-alive" means the TCP socket option (the middle column). The HTTP header is mentioned only to get it out of the way:

About the HTTP header. When a browser sends Connection: keep-alive, it is asking the server not to close the TCP socket after this response so the next request can reuse it. That avoids a fresh 3-way handshake (and TLS handshake) for every request. In HTTP/1.1 persistent connections are the default; in HTTP/2 the connection is multiplexed and reused by design. None of this has anything to do with detecting whether the peer is alive.

How TCP actually tracks "connected"

To understand why long-lived sockets die silently, you need to know one uncomfortable truth about TCP: it has no built-in liveness check by default.

The 3-way handshake establishes a connection — SYN, SYN-ACK, ACK. After that, both sides have a record of the connection, identified by the 4-tuple (src IP, src port, dst IP, dst port). They each track sequence numbers so data isn't reordered or duplicated. The OS marks the socket as ESTABLISHED.

And that's it. The socket can sit in ESTABLISHED forever, even if the other machine has been unplugged for a week. TCP only learns about a problem when something tries to send — then either the peer's TCP stack replies with RST (if it's still around but doesn't recognize the connection), or the local TCP retransmits and eventually times out.

[Figure: TCP state, the lie of "ESTABLISHED". LISTEN (server waits) → SYN_RCVD (handshake) → ESTABLISHED (data flows, or doesn't) → FIN_WAIT (closing) → TIME_WAIT (cooldown). A socket can sit in ESTABLISHED forever, even if the peer is gone: nothing tries to send, so nothing notices.]

This is the gap that both TCP keep-alive and application heartbeats are trying to close, in different ways and at different layers.

The half-open socket — how connections silently die

"Half-open" is the term for a TCP connection where one side believes the connection is alive and the other does not (or no longer exists). Four common ways this happens in production:

  1. Peer machine power-cut or kernel panic. No FIN is ever sent. Your side never finds out unless it tries to send.
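  2. NAT or stateful firewall drops the flow mapping. Home routers often expire idle TCP flows after a few minutes (around 5 is common). AWS NLB defaults to 350 seconds. AWS ALB defaults to 60 seconds. Once the mapping is gone, later packets are either silently dropped (typical of NAT devices and firewalls) or answered with a RST (AWS documents that an NLB resets a flow that sends after its idle timeout; an ALB, being a proxy, closes the idle connection itself).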
  3. Network path change. Mobile client hands off Wi-Fi to cellular. The 4-tuple's source IP changes, the old socket is orphaned, and the server keeps a zombie.
  4. Middlebox idle-flow eviction. Stateful firewalls and load balancers cap the number of concurrent flows they track. Idle ones get evicted first — silently.

If you SSH into a server running a long-lived socket service and run ss -tan, you'll see something like:

$ ss -tan state established | wc -l
83214

$ ss -tan state established '( dport = :443 )' | head -5
ESTAB  0  0  10.0.1.4:43221  10.0.2.7:443
ESTAB  0  0  10.0.1.4:43227  10.0.2.7:443
ESTAB  0  0  10.0.1.4:43231  10.0.2.7:443

83,214 ESTABLISHED sockets. The kernel is happy. How many of those have a peer that will ever speak again? The kernel has no idea. It has not tried to send anything, so it has not noticed.

[Figure: A half-open connection, the NAT timeout case. CLIENT ("connected", socket open) ↔ NAT / LB (flow expired 4m ago, packets blackholed) ↔ SERVER ("connected"). Both sides see ESTABLISHED. Neither side sends. Nothing detects the break. The first write attempt, minutes or hours later, will eventually time out or get an RST.]

TCP keep-alive — what the kernel actually does

When you set SO_KEEPALIVE on a socket, the kernel periodically sends a probe packet on idle connections. The probe is a strange little thing: a TCP segment with no payload and a sequence number set to current_seq - 1. The peer's stack sees a segment for data it has already acknowledged and answers with an ACK carrying its current acknowledgment number. If the peer is gone, no answer comes; after enough silence, the kernel declares the connection dead and surfaces an error to your app on the next read or write.

Three knobs control this on Linux:

# Defaults on most Linux distros
$ sysctl net.ipv4.tcp_keepalive_time      # 7200    (idle seconds before first probe)
$ sysctl net.ipv4.tcp_keepalive_intvl     # 75      (seconds between probes)
$ sysctl net.ipv4.tcp_keepalive_probes    # 9       (failed probes before giving up)

Do the math: with the defaults, a freshly broken idle connection takes 7200 + (9 × 75) = 7875 seconds, or about 2 hours and 11 minutes, to be detected. For anything that matters, you must override per-socket:
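
The same arithmetic works for any setting of the three knobs. A minimal sketch, assuming illustrative names (`worstCaseDetectionSeconds` and its option fields are made up for this example and are not part of any API):

```javascript
// Worst-case time for kernel keep-alive to declare an idle, silently broken
// connection dead: the idle wait plus every unanswered probe interval.
// Field names mirror the Linux sysctls but are hypothetical.
function worstCaseDetectionSeconds({ idleSec, intervalSec, probes }) {
  return idleSec + probes * intervalSec;
}

const linuxDefaults = { idleSec: 7200, intervalSec: 75, probes: 9 };
const tuned = { idleSec: 30, intervalSec: 10, probes: 3 }; // example values only

console.log(worstCaseDetectionSeconds(linuxDefaults)); // 7875
console.log(worstCaseDetectionSeconds(tuned));         // 60
```

The tuned values are only an example; the point is that all three knobs contribute, so shortening the idle time alone still leaves the probe phase at its default length.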

// Node.js: first probe after 30s idle
socket.setKeepAlive(true, 30_000);

// Go: first probe after 30s idle
tcpConn.SetKeepAlive(true)
tcpConn.SetKeepAlivePeriod(30 * time.Second)

(Note: setKeepAlive in most high-level runtimes only exposes the idle time, not the probe interval or count. To tune those you use setsockopt with TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT directly.)

What TCP keep-alive does NOT detect: application deadlock, GC pause, a worker thread stuck in a slow query, an event loop blocked on a CPU-bound task, or any application-layer protocol that's wedged. The kernel's TCP stack is alive and answering probes — but your code might be hung. Kernel responding ≠ app responding.

[Figure: Keep-alive probe, happy case vs drop detected. Healthy connection: the kernel sends PROBE (seq-1), the peer's kernel replies with an ACK, and the idle timer resets. Peer gone: 9 probes get no reply, the socket is closed, and the app sees ETIMEDOUT.]

Application heartbeat — what your code does

An application heartbeat is just a message your protocol defines — sent on a timer, expecting a reply on a timer. The crucial difference from TCP keep-alive is that the heartbeat traverses your application code. To answer it, the peer's event loop must spin, the message must be parsed, and a reply must be written. If the peer process is hung, deadlocked, or mid-GC for too long, the heartbeat goes unanswered — and that's exactly what you wanted to detect.

Three patterns cover almost every case:

  • Ping/pong. Built into the WebSocket protocol (RFC 6455 §5.5.2). The server sends a ping frame; the client's WebSocket library auto-replies with a pong frame. If a pong doesn't arrive in time, the server closes the socket.
  • Periodic empty message. MQTT's PINGREQ/PINGRESP; Kafka's consumer group heartbeat thread. The protocol defines a no-op message specifically for liveness.
  • Read-deadline reset. Every successful read pushes a deadline forward. If the deadline expires with no data, kill the socket. Common in Go (conn.SetReadDeadline) and gRPC (which has its own keepalive subsystem layered on top of HTTP/2).
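
The third pattern, sketched for Node.js. Node's `socket.setTimeout()` measures inactivity on the socket as a whole, so this sketch tracks reads explicitly. `ReadDeadline` and `enforceReadDeadline` are hypothetical names for illustration, not library APIs:

```javascript
// Read-deadline tracker: every successful read pushes the deadline forward.
class ReadDeadline {
  constructor(timeoutMs, now = Date.now) {
    this.timeoutMs = timeoutMs;
    this.now = now;              // injectable clock, handy for testing
    this.lastReadAt = now();
  }

  // Call on every successful read (for example from a 'data' handler).
  touch() {
    this.lastReadAt = this.now();
  }

  // True once more than timeoutMs has passed since the last read.
  expired() {
    return this.now() - this.lastReadAt > this.timeoutMs;
  }
}

// Wire a tracker to a net.Socket-like object and check it periodically.
function enforceReadDeadline(socket, timeoutMs, checkEveryMs = 1000) {
  const deadline = new ReadDeadline(timeoutMs);
  socket.on('data', () => deadline.touch());
  const timer = setInterval(() => {
    if (deadline.expired()) socket.destroy(); // 'close' fires afterwards
  }, checkEveryMs);
  socket.once('close', () => clearInterval(timer));
  return deadline;
}
```

Only reads reset the deadline here, so the sender's own pings do not mask a silent peer. The check interval bounds how late the expiry is noticed.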

The canonical Node.js ws library pattern looks like this:

// Server-side heartbeat: the pattern from the ws README
import { WebSocketServer } from 'ws';

const wss = new WebSocketServer({ port: 8080 });

function heartbeat() {
  this.isAlive = true;
}

wss.on('connection', (ws) => {
  ws.isAlive = true;
  ws.on('error', console.error);
  ws.on('pong', heartbeat);   // client replied: mark alive
});

const interval = setInterval(() => {
  wss.clients.forEach((ws) => {
    if (ws.isAlive === false) return ws.terminate(); // no pong since last ping
    ws.isAlive = false;
    ws.ping();                  // send ping; pong handler resets the flag
  });
}, 30_000);

// Stop the timer when the server shuts down
wss.on('close', () => clearInterval(interval));

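Two things to notice. First, terminate(), not close(): the latter waits for a graceful close handshake the peer can no longer participate in. Second, the design gives each peer one full interval (30 seconds) to answer a ping; any socket whose pong has not arrived by the next tick is terminated. A single dropped packet is harmless because TCP retransmits it well within that window, but a peer that stalls for longer than one interval is disconnected. Tolerating more than that requires counting missed pongs instead of keeping a boolean.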
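
One way to add tolerance is to replace the boolean flag with a counter. This is a sketch under assumptions, not the ws library's own pattern: `heartbeatSweep`, `onPong`, and the `missed` property are names invented here, and each connection only needs the `ping()` and `terminate()` methods that ws sockets provide.

```javascript
// One heartbeat round over a set of connections, tolerating up to `maxMissed`
// consecutive unanswered pings before terminating.
function heartbeatSweep(clients, maxMissed) {
  for (const ws of clients) {
    if (ws.missed > maxMissed) {
      ws.terminate();            // more unanswered pings than we tolerate
      continue;
    }
    ws.missed += 1;              // count a miss until a pong proves otherwise
    ws.ping();
  }
}

// Pong handler: any reply clears the counter.
function onPong() {
  this.missed = 0;
}

// Possible wiring with ws (assumes a `wss` server as in the example above):
//   wss.on('connection', (ws) => { ws.missed = 0; ws.on('pong', onPong); });
//   setInterval(() => heartbeatSweep(wss.clients, 2), 30_000);
```

With `maxMissed = 0` this behaves like the boolean version. Each extra tolerated miss adds one interval to the time a dead peer lingers, so tolerance trades detection speed for resilience to pauses.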

[Figure: Heartbeat timeline, one missed pong = close. The server pings at t=0 and the client's pong sets isAlive = true. The server pings again at t=30s and sets isAlive = false; no pong arrives by t=60s, so the server calls terminate().]

The decision framework

This is the load-bearing section. The right answer almost always depends on what's between you and the other side — and what kind of failure you actually need to catch.

| Scenario | TCP keep-alive | App heartbeat | Why |
|---|---|---|---|
| Internal service-to-service, fast LAN | Sometimes | Rarely | Connections are short-lived; a failed write surfaces RST quickly |
| HTTP/1.1 keep-alive reuse over LB | No | No | LB idle-timeout governs; tune the connection pool's max-idle and reaping |
| Long-lived gRPC streams | Yes (~10s) | Yes (gRPC keepalive) | gRPC has its own keepalive layer over HTTP/2; tune both |
| WebSockets through CDN / NAT | Optional | Required | CDN/NAT silently drops idle flows; ping interval must be < their idle timeout |
| MQTT IoT fleet | No | Required | Spec mandates PINGREQ; keep-alive value is negotiated at CONNECT |
| DB connection pool | Yes (30–60s) | Sometimes (SELECT 1) | Cheap detection of stale pool entries before a real query hits one |
| Behind a strict corporate firewall | Required | Required | Firewalls drop both kinds; pick whichever the firewall allows |

Four rules of thumb:

  • "Is the route alive?" → TCP keep-alive.
  • "Is the peer process alive and processing?" → application heartbeat.
  • "Is there a NAT, LB, or firewall in the middle with an idle timeout?" → application heartbeat at an interval comfortably below that timeout.
  • "Could my app GC-pause for 30s under load?" → tune heartbeat tolerance (how many misses before close), not just frequency. Otherwise a stop-the-world pause kills every healthy connection at once.

The belt-and-suspenders move. For long-lived sockets through middleboxes, configure both: an application heartbeat at, say, 25 seconds (well under typical NAT/LB idle timeouts), and TCP keep-alive at 30–60 seconds as a safety net. The heartbeat catches app-level failures and keeps the flow mapping warm; the kernel's keep-alive catches things that crashed the heartbeat thread itself.
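
The last two rules of thumb can be turned into a quick sanity check on a heartbeat configuration. This is a rough sketch: `checkHeartbeatPlan` and every field name are hypothetical, and the grace calculation assumes a sweep that terminates a peer once it has more than `maxMissed` unanswered pings.

```javascript
// Sanity-check a heartbeat configuration. All times are in seconds.
function checkHeartbeatPlan({ intervalSec, maxMissed, middleboxIdleSec, worstPauseSec }) {
  const warnings = [];

  // The heartbeat must fire before the middlebox forgets the flow.
  if (intervalSec >= middleboxIdleSec) {
    warnings.push('interval is not below the middlebox idle timeout');
  }

  // A stalled peer gets (maxMissed + 1) intervals to answer before it is
  // terminated; a pause at least that long kills a healthy connection.
  const graceSec = (maxMissed + 1) * intervalSec;
  if (worstPauseSec >= graceSec) {
    warnings.push('a worst-case pause would exceed the tolerated silence');
  }

  return { graceSec, warnings };
}

// 25 s heartbeat behind a 60 s idle timeout, peers that may pause for 30 s.
console.log(checkHeartbeatPlan({ intervalSec: 25, maxMissed: 0, middleboxIdleSec: 60, worstPauseSec: 30 }));
console.log(checkHeartbeatPlan({ intervalSec: 25, maxMissed: 2, middleboxIdleSec: 60, worstPauseSec: 30 }));
```

The first plan gets one warning, because a 30 second pause outlasts its 25 second grace period. The second tolerates two misses, has a 75 second grace period, and passes.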

Cost — why you can't just heartbeat every second

Heartbeats look cheap and they mostly are — until they aren't.

Bandwidth math. 100,000 connections × one ~60-byte heartbeat every 30 seconds = ~200 KB/s on the wire. Trivial. Drop the interval to 1 second: 6 MB/s. Still fine for a single host on a 10G NIC.

The real cost isn't bandwidth. It's wakeups. Every heartbeat is a timer firing, an event loop iteration, a syscall to write a few bytes, plus the syscall on the read side when the reply arrives. 100,000 connections at a 1 Hz heartbeat = 100,000 timer wakeups per second on each side, plus the inverse storm of replies. CPU goes up, latency-sensitive work suffers.
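
The bandwidth and wakeup figures above come from the same division. A minimal sketch (`heartbeatCost` and its field names are illustrative, and it assumes one timer per connection and counts only the outgoing heartbeat bytes):

```javascript
// Rough per-host cost of heartbeating, one direction only.
function heartbeatCost({ connections, bytesPerHeartbeat, intervalSec }) {
  return {
    // Multiply before dividing to keep the result exact for round inputs.
    bytesPerSec: (connections * bytesPerHeartbeat) / intervalSec,
    // One timer per connection: each fires once per interval.
    wakeupsPerSec: connections / intervalSec,
  };
}

const every30s = heartbeatCost({ connections: 100_000, bytesPerHeartbeat: 60, intervalSec: 30 });
const every1s = heartbeatCost({ connections: 100_000, bytesPerHeartbeat: 60, intervalSec: 1 });

console.log(every30s.bytesPerSec);  // 200000   (~200 KB/s)
console.log(every1s.bytesPerSec);   // 6000000  (6 MB/s)
console.log(every1s.wakeupsPerSec); // 100000
```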

If you need sub-second heartbeats at scale, batch them into a timer wheel (Netty's HashedWheelTimer is the canonical implementation) so a single timer tick wakes up many connections at once. Otherwise, keep the interval as long as the slowest middlebox in your path will tolerate.
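
The batching idea can be sketched in a few lines. This is a simplified fixed-period wheel written for illustration, not Netty's HashedWheelTimer; `HeartbeatWheel` and its methods are hypothetical names, and connections only need a `ping()` method.

```javascript
// Connections are spread over `slots` buckets and one shared timer visits one
// bucket per tick, so each connection is pinged once per full rotation
// (slots × tickMs) while the process wakes only once per tick.
class HeartbeatWheel {
  constructor(slots) {
    this.buckets = Array.from({ length: slots }, () => new Set());
    this.cursor = 0;    // bucket the next tick will visit
    this.nextSlot = 0;  // bucket the next added connection goes into
  }

  // Assign round-robin so buckets stay roughly even; returns the slot used.
  add(conn) {
    const slot = this.nextSlot;
    this.buckets[slot].add(conn);
    this.nextSlot = (slot + 1) % this.buckets.length;
    return slot;
  }

  remove(conn, slot) {
    this.buckets[slot].delete(conn);
  }

  // Ping every connection in the current bucket, then advance the cursor.
  tick() {
    const bucket = this.buckets[this.cursor];
    for (const conn of bucket) conn.ping();
    this.cursor = (this.cursor + 1) % this.buckets.length;
    return bucket.size;
  }

  // One timer for the whole wheel instead of one per connection.
  start(tickMs) {
    return setInterval(() => this.tick(), tickMs);
  }
}
```

For example, a wheel with 10 slots started with `start(100)` would ping each connection about once per second using 10 timer wakeups per second, regardless of how many connections it holds.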

[Figure: Heartbeat cost at 100K connections, bandwidth vs wakeups. X-axis: heartbeat interval (60s, 30s, 10s, 5s, 1s; smaller = more frequent). Series: bandwidth and wakeups/sec. Caption: wakeup cost grows much faster than bandwidth.]

Reference cheatsheet

| Where | Knob | Default | What it does |
|---|---|---|---|
| Linux kernel | net.ipv4.tcp_keepalive_time | 7200s | Idle seconds before first probe |
| Linux kernel | net.ipv4.tcp_keepalive_intvl | 75s | Seconds between probes |
| Linux kernel | net.ipv4.tcp_keepalive_probes | 9 | Failed probes before drop |
| Node.js | socket.setKeepAlive(true, ms) | off | Per-socket idle time |
| Go | conn.SetKeepAlivePeriod(d) | 15s on dialer | Per-socket idle time |
| Java/Netty | ChannelOption.SO_KEEPALIVE | off | Enables kernel keep-alive on channel |
| nginx upstream | keepalive_time, keepalive_timeout | 1h / 60s | Max lifetime / idle timeout of pooled upstream connections |
| AWS ALB | idle timeout | 60s | Closes idle connections; need heartbeat < 60s |
| AWS NLB | idle timeout | 350s | Same idea at L4; AWS documents a RST for packets sent after the timeout |
| WebSocket (RFC 6455) | ping/pong frames | off | Application-layer heartbeat at the protocol level |
| MQTT | Keep Alive in CONNECT | 0 (off) | Negotiated PINGREQ interval |

The lesson

"Connected" is a lie your kernel tells you by default.

TCP doesn't probe by default. NAT boxes evict idle flows. Load balancers drop idle sockets after as little as 60 seconds. Your peer process can hang while its kernel cheerfully answers probes. Pick the layer that answers the question you actually care about (route liveness or peer liveness), and for anything long-lived, configure both. The cheapest debugging session is the one you avoid by setting two timers correctly the first time.