1. What a load balancer actually does
Picture a restaurant with one door and ten kitchens. Orders come in. Someone at the door decides which kitchen cooks each one. That someone is the load balancer.
Its job is simple:
- Spread traffic so no one server drowns.
- If a server dies, send traffic to the rest.
- Add or remove servers without users noticing.
The real question is not whether to load-balance. It's how much the doorman is allowed to read before deciding. That's the L4-vs-L7 split.
2. A tiny bit of networking
You only need three layers to understand this.
When a browser sends `GET /login`, the request is wrapped like Russian dolls:
- Inside: the HTTP text — `GET /login` with `Host: example.com` (Layer 7)
- Wrapped in: a TCP segment with ports (Layer 4)
- Wrapped in: an IP packet with addresses (Layer 3)
A load balancer can decide how deep to unwrap before routing. That choice is the whole topic.
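To make the unwrap concrete, here's a toy sketch of what each kind of balancer gets to see from the same connection — all addresses and the parsing helper are illustrative, not any real LB's API:

```python
RAW = b"GET /login HTTP/1.1\r\nHost: example.com\r\n\r\n"

# L4's view: just the connection tuple the kernel hands over.
# The payload bytes are opaque.
l4_view = ("198.51.100.4", 51823, "192.0.2.10", 443, "tcp")  # illustrative

# L7's view: unwrap all the way and parse the HTTP text itself.
def l7_view(raw: bytes):
    head = raw.split(b"\r\n\r\n", 1)[0].decode()
    request_line, *header_lines = head.split("\r\n")
    method, path, _version = request_line.split(" ")
    headers = dict(h.split(": ", 1) for h in header_lines)
    return method, path, headers.get("Host")
```

Everything L7 routing can do flows from that last function existing; everything L4 can't do flows from it never being called.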
3. L4 — the packet forwarder
L4 looks at only four things:
- Source IP + port
- Destination IP + port
- Protocol (TCP or UDP)
That's it. It doesn't know the request is HTTP. It doesn't know there's a URL. It sees bytes going from one IP:port to another and picks a backend.
How it picks a backend
When a new TCP connection arrives, L4 picks a healthy backend (round-robin, least connections, or source-IP hash) and "pins" every later packet of that connection to the same backend:
Microseconds per packet. Millions of packets per second per node.
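A minimal sketch of the pinning idea in Python — real L4 balancers (IPVS, Maglev) do this in the kernel with a connection table or consistent hashing; the backend list and tuple format here are illustrative:

```python
import hashlib

BACKENDS = ["10.0.1.10:8080", "10.0.1.11:8080", "10.0.1.12:8080"]  # illustrative pool

def pick_backend(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> str:
    # Hash the connection 4-tuple: every packet of one TCP connection
    # produces the same digest, so it always lands on the same backend --
    # no per-connection state needed.
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return BACKENDS[digest % len(BACKENDS)]
```

Note the NAT problem in miniature: if you hash on source IP alone, every client behind one office IP hashes identically and piles onto one backend.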
What L4 is great at
- Any TCP/UDP: databases, Redis, Kafka, gRPC, game servers, DNS, SMTP, WebRTC.
- Throughput: millions of RPS per LB.
- Long-lived connections: raw WebSocket, gRPC streams — stable for hours.
- TLS pass-through: private key stays on your backend; LB never sees plaintext.
What L4 can't do
- Route by URL — it doesn't know one exists.
- Route by host — `api.example.com` and `www.example.com` look the same if they share an IP.
- Inject headers, rewrite paths, redirect.
- Cookie stickiness — best it can do is hash the source IP, which breaks behind NAT (a whole office looks like one user).
4. L7 — the content-aware proxy
L7 does what L4 refuses to do: it terminates the client's TCP connection at the LB, reads the full HTTP request, and opens a new connection to the backend.
So it's no longer a dumb relay — it's a full HTTP server on one side and a full HTTP client on the other. It can read and route on:
- Method — `GET`, `POST`
- Path — `/api/v1/orders/123`
- Host — `api.example.com`
- Query string, cookies, auth headers, User-Agent, anything
What L7 unlocks
- Path routing — `/api/*` to one pool, `/images/*` to another.
- Host routing — many services behind one LB, split by the `Host` header.
- Cookie stickiness — LB sets its own cookie and pins a user even across IP changes.
- TLS termination — cert lives on the LB; easier rotation, central WAF.
- Smart features — retries, redirects, header injection, rate limits per URL, canary deploys by cookie.
- Edge auth — OIDC/Cognito login handled before the request hits your app.
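The first two features reduce to a rule table evaluated per request. A toy sketch — the pool names and rules are made up, and real L7 LBs (ALB, Nginx) add priorities, regexes, and weighted targets:

```python
ROUTES = [  # (host, path prefix, pool) -- first match wins
    ("api.example.com", "/api/",    "api-pool"),
    ("www.example.com", "/images/", "static-pool"),
    ("www.example.com", "/",        "web-pool"),
]

def route(host: str, path: str) -> str:
    for rule_host, prefix, pool in ROUTES:
        if host == rule_host and path.startswith(prefix):
            return pool
    return "default-pool"  # catch-all for unmatched traffic
```

Each request is matched independently, which is exactly why canary and weighted deploys are possible here and not at L4.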
What L7 costs
- Latency: ~50–300 µs of extra parsing per request.
- HTTP only: you can't put an L7 in front of PostgreSQL.
- More connections: client → LB, LB → backend. Two sockets, two handshakes.
- Price: usually billed by connections + rules + bandwidth, not just bytes.
5. Side-by-side
| Dimension | L4 | L7 |
|---|---|---|
| Sees | IP + port + protocol | Method, URL, headers, cookies, body |
| Protocols | TCP, UDP, TLS pass-through | HTTP/1.1, HTTP/2, HTTP/3, gRPC, WebSocket |
| Routing | By IP / port / hash | By path, host, header, cookie, query |
| Sticky sessions | Source-IP hash (breaks behind NAT) | Cookie-based (reliable) |
| TLS | Pass-through or terminate | Terminates at LB (normal case) |
| Added latency | ~tens of µs | ~hundreds of µs |
| Throughput | Very high (100+ Gbps with DSR) | Lower — CPU-bound on parsing |
| AWS | NLB | ALB, CloudFront |
| Open source | HAProxy (TCP), IPVS, Maglev | Nginx, Envoy, Traefik, HAProxy (HTTP) |
6. When to use which — the simple version
Use L7 when…
- It's an HTTP API or website.
- You want path or host routing (one domain, many services).
- You want canary, blue/green, or weighted deploys.
- You want WAF, redirects, or auth at the edge.
- You need stickiness that survives NAT and mobile IP changes.
Use L4 when…
- Your protocol isn't HTTP — database, Redis, Kafka, SMTP, DNS, game server.
- You need extreme throughput or sub-millisecond p99 latency.
- You need end-to-end TLS where only the backend holds the key.
- You want long-lived raw connections (WebSocket, gRPC streams) with a stable pin.
7. When to use which — advanced scenarios
Once the basics are clear, real systems mix and match. A few patterns worth knowing:
Stack L4 in front of L7
Use this when:
- Static IPs: a partner's firewall allowlist needs fixed IPs — ALB's change, NLB's don't.
- PrivateLink: AWS PrivateLink only accepts an NLB at the front. Put ALB behind it for HTTP routing.
- DDoS / IP-level filtering before smart routing kicks in.
Global → regional → service
Three layers of balancing: pick a region by geo latency, pick a service by URL, pick a pod by hash. Each layer is dumber the closer you get to the metal. That's by design — one smart layer is enough.
Non-HTTP at huge scale
Databases, Redis Cluster, Kafka brokers: L4 only. An HTTP balancer can't parse these protocols. If you need smart routing on top of them (e.g., read/write splitting for Postgres), put a protocol-aware proxy like PgBouncer or ProxySQL at the app — not the network LB.
gRPC
gRPC uses HTTP/2 with long-lived multiplexed streams. Two options:
- L7 with HTTP/2 support (ALB, Envoy) — balances individual RPCs across backends. Best for stateless services.
- L4 pass-through (NLB) — one stream pins to one backend. Simpler, but one hot client = one hot backend.
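The hot-client effect is easy to simulate. A sketch assuming three backends and round-robin at each level (names and numbers illustrative):

```python
from itertools import cycle

BACKENDS = ["b1", "b2", "b3"]

def l4_load(rpcs_per_client):
    # L4: each client's HTTP/2 connection pins to one backend,
    # so all of that client's RPCs land together.
    pool, load = cycle(BACKENDS), {b: 0 for b in BACKENDS}
    for n in rpcs_per_client:
        load[next(pool)] += n
    return load

def l7_load(rpcs_per_client):
    # L7 (HTTP/2-aware): each RPC is balanced individually.
    pool, load = cycle(BACKENDS), {b: 0 for b in BACKENDS}
    for n in rpcs_per_client:
        for _ in range(n):
            load[next(pool)] += 1
    return load
```

One client sending 1,000 RPCs while two send 10 each gives L4 a 1000/10/10 split; L7 spreads the same 1,020 RPCs to 340 per backend.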
WebSocket / real-time
Long-lived. Prefer L7 with cookie stickiness so a mobile user flipping between Wi-Fi and 4G lands on the same backend. L4 with IP hash fails here (the IP changes mid-session).
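A sketch of the cookie mechanism — the signing scheme and secret here are invented for illustration (ALB's real `AWSALB` cookie is an opaque encrypted blob), but the shape is the same:

```python
import hashlib
import hmac
from typing import Optional

SECRET = b"lb-cookie-secret"  # illustrative

def make_sticky_cookie(backend: str) -> str:
    # Sign the backend name so clients can't forge a pin to an arbitrary host.
    sig = hmac.new(SECRET, backend.encode(), hashlib.sha256).hexdigest()[:16]
    return f"{backend}|{sig}"

def backend_from_cookie(cookie: str, healthy: set) -> Optional[str]:
    backend, _, sig = cookie.partition("|")
    want = hmac.new(SECRET, backend.encode(), hashlib.sha256).hexdigest()[:16]
    if hmac.compare_digest(sig, want) and backend in healthy:
        return backend  # same user, same backend -- even after an IP change
    return None  # tampered, or backend died: fall back to normal balancing
```

The pin lives in the cookie, not in the client's IP — which is exactly what survives a Wi-Fi-to-4G flip.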
Zero-trust / mTLS everywhere
Service mesh patterns (Istio, Linkerd) run an L7 sidecar next to every pod. The outer LB is usually L4 — let the mesh do the smart stuff. Putting ALB and an L7 mesh in series just doubles the parsing.
8. The AWS map
| AWS product | Layer | Best for | Skip it when |
|---|---|---|---|
| ALB (Application LB) | L7 | HTTP/HTTPS, gRPC, WebSocket. Path/host routing, canary, Cognito auth, WAF integration. | Traffic isn't HTTP. You need static IPs. You need PrivateLink. |
| NLB (Network LB) | L4 | TCP/UDP. Ultra-high throughput. Static Elastic IPs. PrivateLink provider. TLS pass-through. | You want URL/host routing, cookie stickiness, or WAF. |
| GWLB (Gateway LB) | L3/L4 | Inline security appliances — firewalls, IDS/IPS — via GENEVE. | You aren't routing through a security appliance fleet. |
| CloudFront | L7 (edge) | Global HTTP cache + TLS + WAF close to users. Often sits in front of ALB. | Traffic is internal-only or non-HTTP. |
| API Gateway | L7 | REST/HTTP APIs with auth, throttling, usage plans, Lambda integration. | You don't need per-API features — ALB is cheaper for plain HTTP. |
| Route 53 | DNS | Geo / latency / weighted routing across regions. The "pick a region" layer. | You only run in one region. |
| Global Accelerator | L4 (anycast) | Static anycast IPs routing to the nearest healthy region. TCP/UDP. | Pure web traffic — CloudFront usually wins. |
| CLB (Classic LB) | L4 + basic L7 | Legacy only. | Always — use ALB or NLB instead. |
Common AWS shapes
- Plain web app: Route 53 → CloudFront → ALB → ECS/EKS/EC2.
- Public API: Route 53 → API Gateway → Lambda or ALB → service.
- Postgres / Redis cluster inside VPC: NLB → instances (no ALB — not HTTP).
- Multi-region with static IPs: Global Accelerator → regional ALBs.
- SaaS exposed via PrivateLink: customer VPC → NLB → ALB → service.
9. Production gotchas
Client IP preservation
When L7 terminates TCP, the backend sees the LB's IP, not the user's. Fix: trust `X-Forwarded-For`. In Express: `app.set('trust proxy', true)`.
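If you parse the header yourself, walk it from the right: clients can send a fake `X-Forwarded-For`, so only the hops appended by proxies you trust are reliable. A sketch (all addresses illustrative):

```python
def client_ip(xff: str, peer_ip: str, trusted_proxies: set) -> str:
    # Walk right to left, skipping hops we trust; the first untrusted
    # address is the real client. Trusting the leftmost value blindly
    # lets any client spoof its IP.
    hops = [h.strip() for h in xff.split(",")] + [peer_ip]
    for ip in reversed(hops):
        if ip not in trusted_proxies:
            return ip
    return hops[0]  # everything trusted: internal traffic
```

The second test below shows the payoff: a client-supplied fake entry on the left is ignored.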
For NLB, pick one: Proxy Protocol v2 (a tiny header at connection start — your app must parse it) or IP target mode (NLB preserves the source IP transparently). Mismatched config = crashes on the first byte.
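Proxy Protocol v2 is a small binary preamble: a 12-byte signature, a version/command byte, an address-family byte, a big-endian length, then the addresses. That layout is the real wire format, but the parser below is a minimal sketch handling only TCP-over-IPv4:

```python
import socket
import struct

PP2_SIG = b"\r\n\r\n\x00\r\nQUIT\n"  # fixed 12-byte signature

def parse_ppv2(data: bytes):
    """Return ((src_ip, src_port), remaining_payload) for a TCP/IPv4 header."""
    if data[:12] != PP2_SIG:
        raise ValueError("not a Proxy Protocol v2 header")
    ver_cmd, fam_proto, length = struct.unpack("!BBH", data[12:16])
    if ver_cmd >> 4 != 2:
        raise ValueError("unsupported version")
    addr = data[16:16 + length]
    if fam_proto == 0x11:  # AF_INET + STREAM: src(4) dst(4) src_port(2) dst_port(2)
        src_ip = socket.inet_ntoa(addr[0:4])
        src_port, = struct.unpack("!H", addr[8:10])
        return (src_ip, src_port), data[16 + length:]
    raise ValueError("unsupported address family")
```

This is why mismatched config fails instantly: an app expecting plain HTTP chokes on the binary signature, and an app expecting PPv2 rejects a connection that starts with `GET`.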
Idle timeouts
- ALB: 60s default (tunable up to 4000s)
- NLB: 350s for TCP (fixed for years; now configurable)
- Nginx: 60s default (`proxy_read_timeout`); 75s for `keepalive_timeout`
Your heartbeat must be shorter than the smallest timeout on the path, or the LB silently kills the connection.
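That rule is mechanical enough to encode. A sketch, assuming you know every hop's idle timeout (the values here are illustrative):

```python
def heartbeat_interval(idle_timeouts_s: dict) -> float:
    # Ping at half the tightest idle timeout on the path, so no hop
    # ever sees the connection go silent long enough to reap it.
    return min(idle_timeouts_s.values()) / 2

PATH = {"alb": 60, "nginx": 60, "app": 300}  # seconds, illustrative
```

Here the 60s hops win, so ping every 30 seconds; halving leaves headroom for one lost ping.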
Sticky sessions + WebSocket
Stickiness only matters at connect time: once the HTTP upgrade succeeds, the connection itself is pinned to one backend. The gotcha is reconnects — a client that drops and reopens without a sticky cookie can land on a different backend, and any in-memory session state vanishes. Keep session state external (e.g., Redis) or use cookie stickiness so reconnects return home.
Health checks
L4 health = "does TCP accept on port X?" Weak — the app can be returning 500 on every request while TCP still works. L7 health = "does GET /healthz return 200?" Much better. Even behind NLB, expose /healthz.
10. Rough performance numbers
| Metric | L4 (NLB, IPVS) | L7 (ALB, Nginx) |
|---|---|---|
| Added latency (p50) | ~50–200 µs | ~200–500 µs |
| Throughput | Millions of RPS, 100+ Gbps with DSR | ~50–300k RPS per node |
| CPU per request | Near-zero | Moderate (parse + rules + rewrite) |
For most apps these numbers don't matter — your app is the bottleneck, not the LB. They start to matter above ~500k RPS or when chasing single-digit-millisecond p99.
Closing thought
- L4 is a valet who parks your car without reading the registration. Fast. Works for any car.
- L7 is a concierge who reads the name tag and decides which floor you belong on. Slower. Only works if you speak the language they expect.
Pick by the protocol and the routing you actually need. When someone asks "should we put a load balancer there?" the real question is almost always: L4 or L7?