1. What a load balancer actually does
Picture a restaurant with one door and ten kitchens. Orders come in. Someone at the door decides which kitchen cooks each one. That someone is the load balancer.
Its job is simple:
- Spread traffic so no one server drowns.
- If a server dies, send traffic to the rest.
- Add or remove servers without users noticing.
The real question is not whether to load-balance. It's how much the doorman is allowed to read before deciding. That's the L4-vs-L7 split.
2. A tiny bit of networking
You only need three layers to understand this.
When a browser sends `GET /login`, the request is wrapped like Russian dolls:
- Inside: the HTTP text — `GET /login` with `Host: example.com` (Layer 7)
- Wrapped in: a TCP segment with ports (Layer 4)
- Wrapped in: an IP packet with addresses (Layer 3)
A load balancer can decide how deep to unwrap before routing. That choice is the whole topic.
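To make the unwrap concrete, here's a toy sketch of what each kind of balancer gets to see from the same connection — all addresses and the parsing helper are illustrative, not any real LB's API:

```python
RAW = b"GET /login HTTP/1.1\r\nHost: example.com\r\n\r\n"

# L4's view: just the connection tuple the kernel hands over.
# The payload bytes are opaque.
l4_view = ("198.51.100.4", 51823, "192.0.2.10", 443, "tcp")  # illustrative

# L7's view: unwrap all the way and parse the HTTP text itself.
def l7_view(raw: bytes):
    head = raw.split(b"\r\n\r\n", 1)[0].decode()
    request_line, *header_lines = head.split("\r\n")
    method, path, _version = request_line.split(" ")
    headers = dict(h.split(": ", 1) for h in header_lines)
    return method, path, headers.get("Host")
```

Everything L7 routing can do flows from that last function existing; everything L4 can't do flows from it never being called.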
3. L4 — the packet forwarder
L4 looks at only four things:
- Source IP + port
- Destination IP + port
- Protocol (TCP or UDP)
That's it. It doesn't know the request is HTTP. It doesn't know there's a URL. It sees bytes going from one IP:port to another and picks a backend.
How it picks a backend
When a new TCP connection arrives, L4 picks a healthy backend (round-robin, least connections, or source-IP hash) and "pins" every later packet of that connection to the same backend:
Microseconds per packet. Millions of packets per second per node.
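A minimal sketch of the pinning idea in Python — real L4 balancers (IPVS, Maglev) do this in the kernel with a connection table or consistent hashing; the backend list and tuple format here are illustrative:

```python
import hashlib

BACKENDS = ["10.0.1.10:8080", "10.0.1.11:8080", "10.0.1.12:8080"]  # illustrative pool

def pick_backend(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> str:
    # Hash the connection 4-tuple: every packet of one TCP connection
    # produces the same digest, so it always lands on the same backend --
    # no per-connection state needed.
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return BACKENDS[digest % len(BACKENDS)]
```

Note the NAT problem in miniature: if you hash on source IP alone, every client behind one office IP hashes identically and piles onto one backend.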
What L4 is great at
- Any TCP/UDP: databases, Redis, Kafka, gRPC, game servers, DNS, SMTP, WebRTC.
- Throughput: millions of RPS per LB.
- Long-lived connections: raw WebSocket, gRPC streams — stable for hours.
- TLS pass-through: private key stays on your backend; LB never sees plaintext.
What L4 can't do
- Route by URL — it doesn't know one exists.
- Route by host — `api.example.com` and `www.example.com` look the same if they share an IP.
- Inject headers, rewrite paths, redirect.
- Cookie stickiness — best it can do is hash the source IP, which breaks behind NAT (a whole office looks like one user).
4. L7 — the content-aware proxy
L7 does what L4 refuses to do: it terminates the client's TCP connection at the LB, reads the full HTTP request, and opens a new connection to the backend.
So it's no longer a dumb relay — it's a full HTTP server on one side and a full HTTP client on the other. It can read and route on:
- Method — `GET`, `POST`
- Path — `/api/v1/orders/123`
- Host — `api.example.com`
- Query string, cookies, auth headers, User-Agent, anything
What L7 unlocks
- Path routing — `/api/*` to one pool, `/images/*` to another.
- Host routing — many services behind one LB, split by the `Host` header.
- Cookie stickiness — LB sets its own cookie and pins a user even across IP changes.
- TLS termination — cert lives on the LB; easier rotation, central WAF.
- Smart features — retries, redirects, header injection, rate limits per URL, canary deploys by cookie.
- Edge auth — OIDC/Cognito login handled before the request hits your app.
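The first two features reduce to a rule table evaluated per request. A toy sketch — the pool names and rules are made up, and real L7 LBs (ALB, Nginx) add priorities, regexes, and weighted targets:

```python
ROUTES = [  # (host, path prefix, pool) -- first match wins
    ("api.example.com", "/api/",    "api-pool"),
    ("www.example.com", "/images/", "static-pool"),
    ("www.example.com", "/",        "web-pool"),
]

def route(host: str, path: str) -> str:
    for rule_host, prefix, pool in ROUTES:
        if host == rule_host and path.startswith(prefix):
            return pool
    return "default-pool"  # catch-all for unmatched traffic
```

Each request is matched independently, which is exactly why canary and weighted deploys are possible here and not at L4.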
What L7 costs
- Latency: ~50–300 µs of extra parsing per request.
- HTTP only: you can't put an L7 in front of PostgreSQL.
- More connections: client → LB, LB → backend. Two sockets, two handshakes.
- Price: usually billed by connections + rules + bandwidth, not just bytes.
5. Side-by-side
| Dimension | L4 | L7 |
|---|---|---|
| Sees | IP + port + protocol | Method, URL, headers, cookies, body |
| Protocols | TCP, UDP, TLS pass-through | HTTP/1.1, HTTP/2, HTTP/3, gRPC, WebSocket |
| Routing | By IP / port / hash | By path, host, header, cookie, query |
| Sticky sessions | Source-IP hash (breaks behind NAT) | Cookie-based (reliable) |
| TLS | Pass-through or terminate | Terminates at LB (normal case) |
| Added latency | ~tens of µs | ~hundreds of µs |
| Throughput | Very high (100+ Gbps with DSR) | Lower — CPU-bound on parsing |
| AWS | NLB | ALB, CloudFront |
| Open source | HAProxy (TCP), IPVS, Maglev | Nginx, Envoy, Traefik, HAProxy (HTTP) |
6. When to use which — the simple version
Use L7 when…
- It's an HTTP API or website.
- You want path or host routing (one domain, many services).
- You want canary, blue/green, or weighted deploys.
- You want WAF, redirects, or auth at the edge.
- You need stickiness that survives NAT and mobile IP changes.
Use L4 when…
- Your protocol isn't HTTP — database, Redis, Kafka, SMTP, DNS, game server.
- You need extreme throughput or sub-millisecond p99 latency.
- You need end-to-end TLS where only the backend holds the key.
- You want long-lived raw connections (WebSocket, gRPC streams) with a stable pin.
7. When to use which — advanced scenarios
Once the basics are clear, real systems mix and match. A few patterns worth knowing:
Stack L4 in front of L7
Use this when:
- Static IPs: a partner's firewall allowlist needs fixed IPs — ALB's change, NLB's don't.
- PrivateLink: AWS PrivateLink only accepts an NLB at the front. Put ALB behind it for HTTP routing.
- DDoS / IP-level filtering before smart routing kicks in.
Global → regional → service
Three layers of balancing: pick a region by geo latency, pick a service by URL, pick a pod by hash. Each layer is dumber the closer you get to the metal. That's by design — one smart layer is enough.
Non-HTTP at huge scale
Databases, Redis Cluster, Kafka brokers: L4 only. An HTTP balancer can't parse these protocols. If you need smart routing on top of them (e.g., read/write splitting for Postgres), put a protocol-aware proxy like PgBouncer or ProxySQL at the app — not the network LB.
gRPC
gRPC uses HTTP/2 with long-lived multiplexed streams. Two options:
- L7 with HTTP/2 support (ALB, Envoy) — balances individual RPCs across backends. Best for stateless services.
- L4 pass-through (NLB) — one stream pins to one backend. Simpler, but one hot client = one hot backend.
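The hot-client effect is easy to simulate. A sketch assuming three backends and round-robin at each level (names and numbers illustrative):

```python
from itertools import cycle

BACKENDS = ["b1", "b2", "b3"]

def l4_load(rpcs_per_client):
    # L4: each client's HTTP/2 connection pins to one backend,
    # so all of that client's RPCs land together.
    pool, load = cycle(BACKENDS), {b: 0 for b in BACKENDS}
    for n in rpcs_per_client:
        load[next(pool)] += n
    return load

def l7_load(rpcs_per_client):
    # L7 (HTTP/2-aware): each RPC is balanced individually.
    pool, load = cycle(BACKENDS), {b: 0 for b in BACKENDS}
    for n in rpcs_per_client:
        for _ in range(n):
            load[next(pool)] += 1
    return load
```

One client sending 1,000 RPCs while two send 10 each gives L4 a 1000/10/10 split; L7 spreads the same 1,020 RPCs to 340 per backend.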
WebSocket / real-time
Long-lived. Prefer L7 with cookie stickiness so a mobile user flipping between Wi-Fi and 4G lands on the same backend. L4 with IP hash fails here (the IP changes mid-session).
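A sketch of the cookie mechanism — the signing scheme and secret here are invented for illustration (ALB's real `AWSALB` cookie is an opaque encrypted blob), but the shape is the same:

```python
import hashlib
import hmac
from typing import Optional

SECRET = b"lb-cookie-secret"  # illustrative

def make_sticky_cookie(backend: str) -> str:
    # Sign the backend name so clients can't forge a pin to an arbitrary host.
    sig = hmac.new(SECRET, backend.encode(), hashlib.sha256).hexdigest()[:16]
    return f"{backend}|{sig}"

def backend_from_cookie(cookie: str, healthy: set) -> Optional[str]:
    backend, _, sig = cookie.partition("|")
    want = hmac.new(SECRET, backend.encode(), hashlib.sha256).hexdigest()[:16]
    if hmac.compare_digest(sig, want) and backend in healthy:
        return backend  # same user, same backend -- even after an IP change
    return None  # tampered, or backend died: fall back to normal balancing
```

The pin lives in the cookie, not in the client's IP — which is exactly what survives a Wi-Fi-to-4G flip.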
Zero-trust / mTLS everywhere
Service mesh patterns (Istio, Linkerd) run an L7 sidecar next to every pod. The outer LB is usually L4 — let the mesh do the smart stuff. Putting ALB and an L7 mesh in series just doubles the parsing.
8. The AWS map
| AWS product | Layer | Best for | Skip it when |
|---|---|---|---|
| ALB (Application LB) | L7 | HTTP/HTTPS, gRPC, WebSocket. Path/host routing, canary, Cognito auth, WAF integration. | Traffic isn't HTTP. You need static IPs. You need PrivateLink. |
| NLB (Network LB) | L4 | TCP/UDP. Ultra-high throughput. Static Elastic IPs. PrivateLink provider. TLS pass-through. | You want URL/host routing, cookie stickiness, or WAF. |
| GWLB (Gateway LB) | L3/L4 | Inline security appliances — firewalls, IDS/IPS — via GENEVE. | You aren't routing through a security appliance fleet. |
| CloudFront | L7 (edge) | Global HTTP cache + TLS + WAF close to users. Often sits in front of ALB. | Traffic is internal-only or non-HTTP. |
| API Gateway | L7 | REST/HTTP APIs with auth, throttling, usage plans, Lambda integration. | You don't need per-API features — ALB is cheaper for plain HTTP. |
| Route 53 | DNS | Geo / latency / weighted routing across regions. The "pick a region" layer. | You only run in one region. |
| Global Accelerator | L4 (anycast) | Static anycast IPs routing to the nearest healthy region. TCP/UDP. | Pure web traffic — CloudFront usually wins. |
| CLB (Classic LB) | L4 + basic L7 | Legacy only. | Always — use ALB or NLB instead. |
Common AWS shapes
- Plain web app: Route 53 → CloudFront → ALB → ECS/EKS/EC2.
- Public API: Route 53 → API Gateway → Lambda or ALB → service.
- Postgres / Redis cluster inside VPC: NLB → instances (no ALB — not HTTP).
- Multi-region with static IPs: Global Accelerator → regional ALBs.
- SaaS exposed via PrivateLink: customer VPC → NLB → ALB → service.
9. Production gotchas
Client IP preservation
When L7 terminates TCP, the backend sees the LB's IP, not the user's. Fix: trust `X-Forwarded-For`. In Express: `app.set('trust proxy', true)`.
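If you parse the header yourself, walk it from the right: clients can send a fake `X-Forwarded-For`, so only the hops appended by proxies you trust are reliable. A sketch (all addresses illustrative):

```python
def client_ip(xff: str, peer_ip: str, trusted_proxies: set) -> str:
    # Walk right to left, skipping hops we trust; the first untrusted
    # address is the real client. Trusting the leftmost value blindly
    # lets any client spoof its IP.
    hops = [h.strip() for h in xff.split(",")] + [peer_ip]
    for ip in reversed(hops):
        if ip not in trusted_proxies:
            return ip
    return hops[0]  # everything trusted: internal traffic
```

The second test below shows the payoff: a client-supplied fake entry on the left is ignored.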
For NLB, pick one: Proxy Protocol v2 (a tiny header at connection start — your app must parse it) or IP target mode (NLB preserves the source IP transparently). Mismatched config = crashes on the first byte.
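Proxy Protocol v2 is a small binary preamble: a 12-byte signature, a version/command byte, an address-family byte, a big-endian length, then the addresses. That layout is the real wire format, but the parser below is a minimal sketch handling only TCP-over-IPv4:

```python
import socket
import struct

PP2_SIG = b"\r\n\r\n\x00\r\nQUIT\n"  # fixed 12-byte signature

def parse_ppv2(data: bytes):
    """Return ((src_ip, src_port), remaining_payload) for a TCP/IPv4 header."""
    if data[:12] != PP2_SIG:
        raise ValueError("not a Proxy Protocol v2 header")
    ver_cmd, fam_proto, length = struct.unpack("!BBH", data[12:16])
    if ver_cmd >> 4 != 2:
        raise ValueError("unsupported version")
    addr = data[16:16 + length]
    if fam_proto == 0x11:  # AF_INET + STREAM: src(4) dst(4) src_port(2) dst_port(2)
        src_ip = socket.inet_ntoa(addr[0:4])
        src_port, = struct.unpack("!H", addr[8:10])
        return (src_ip, src_port), data[16 + length:]
    raise ValueError("unsupported address family")
```

This is why mismatched config fails instantly: an app expecting plain HTTP chokes on the binary signature, and an app expecting PPv2 rejects a connection that starts with `GET`.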
Idle timeouts
- ALB: 60s default (tunable up to 4000s)
- NLB: 350s for TCP (fixed for years; now configurable)
- Nginx: 60s default (`proxy_read_timeout`); 75s for `keepalive_timeout`
Your heartbeat must be shorter than the smallest timeout on the path, or the LB silently kills the connection.
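That rule is mechanical enough to encode. A sketch, assuming you know every hop's idle timeout (the values here are illustrative):

```python
def heartbeat_interval(idle_timeouts_s: dict) -> float:
    # Ping at half the tightest idle timeout on the path, so no hop
    # ever sees the connection go silent long enough to reap it.
    return min(idle_timeouts_s.values()) / 2

PATH = {"alb": 60, "nginx": 60, "app": 300}  # seconds, illustrative
```

Here the 60s hops win, so ping every 30 seconds; halving leaves headroom for one lost ping.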
Sticky sessions + WebSocket
Stickiness only matters at connect time: once the HTTP upgrade succeeds, the connection itself is pinned to one backend. The gotcha is reconnects — a client that drops and reopens without a sticky cookie can land on a different backend, and any in-memory session state vanishes. Keep session state external (e.g., Redis) or use cookie stickiness so reconnects return home.
Health checks
L4 health = "does TCP accept on port X?" Weak — the app can be returning 500 on every request while TCP still works. L7 health = "does GET /healthz return 200?" Much better. Even behind NLB, expose /healthz.
10. Rough performance numbers
| Metric | L4 (NLB, IPVS) | L7 (ALB, Nginx) |
|---|---|---|
| Added latency (p50) | ~50–200 µs | ~200–500 µs |
| Throughput | Millions of RPS, 100+ Gbps with DSR | ~50–300k RPS per node |
| CPU per request | Near-zero | Moderate (parse + rules + rewrite) |
For most apps these numbers don't matter — your app is the bottleneck, not the LB. They start to matter above ~500k RPS or when chasing single-digit-millisecond p99.
Closing thought
- L4 is a valet who parks your car without reading the registration. Fast. Works for any car.
- L7 is a concierge who reads the name tag and decides which floor you belong on. Slower. Only works if you speak the language they expect.
Pick by the protocol and the routing you actually need. When someone asks "should we put a load balancer there?" the real question is almost always: L4 or L7?