What Are Sticky Sessions and How Do They Work?
How Sticky Sessions Work
When a client first connects, the load balancer selects a backend server and creates an affinity marker. The marker can be a cookie inserted by the load balancer that contains an encoded server identifier, a hash of the client IP address, or an entry in the load balancer's internal mapping table. On subsequent requests, the load balancer reads this marker and routes traffic back to the same server. The binding persists until the session expires (TTL typically 10-30 minutes), the server becomes unhealthy, or the marker is explicitly cleared.
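The lifecycle described above can be sketched with the mapping-table variant. This is a minimal illustration, not any particular load balancer's implementation: the class name StickyRouter, its methods, the sliding TTL (refreshed on every request), and the hash-based first pick are all assumptions made for the example.

```python
import hashlib
import time


class StickyRouter:
    """Toy load balancer that pins each client to one backend (hypothetical)."""

    def __init__(self, servers, ttl_seconds=1200.0, clock=time.monotonic):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self.ttl = ttl_seconds          # assumed sliding TTL, 20 minutes by default
        self.clock = clock              # injectable so expiry can be tested
        self.table = {}                 # client_id -> (server, expires_at)

    def _pick(self, client_id):
        # First contact: hash the client identifier over the healthy servers.
        candidates = sorted(self.healthy)
        if not candidates:
            raise RuntimeError("no healthy backends")
        digest = hashlib.sha256(client_id.encode()).digest()
        return candidates[int.from_bytes(digest[:8], "big") % len(candidates)]

    def route(self, client_id):
        now = self.clock()
        entry = self.table.get(client_id)
        if entry is not None:
            server, expires_at = entry
            if now < expires_at and server in self.healthy:
                # Binding is still valid: refresh the TTL and reuse the server.
                self.table[client_id] = (server, now + self.ttl)
                return server
        # No binding, expired binding, or unhealthy server: bind again.
        server = self._pick(client_id)
        self.table[client_id] = (server, now + self.ttl)
        return server

    def mark_unhealthy(self, server):
        self.healthy.discard(server)

    def clear(self, client_id):
        # Explicitly drop the affinity marker for one client.
        self.table.pop(client_id, None)
```

A cookie-based variant would carry the server identifier in the request instead of in the table, which removes the per-client state from the load balancer but leaves the same three ways for a binding to end.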
Performance Benefits
The primary motivation is performance and simplicity. Applications can store user-specific state (shopping cart items, profile caches, partially completed workflows) directly in server memory or a local cache. This eliminates an external session store lookup on every request, saving approximately 0.5-2ms at the 50th percentile within the same availability zone, and 3-8ms at the 99th percentile across zones. For services where handler time is 2-5ms, this represents a roughly 10-50% reduction in median latency.
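One way to read the percentage is as the share of total request time that the lookup accounts for. The sketch below assumes the baseline is handler time plus a single session store lookup per request; the function name latency_reduction is hypothetical.

```python
def latency_reduction(handler_ms, lookup_ms):
    """Fraction of request latency removed by skipping the session lookup.

    Assumes baseline latency = handler time + one lookup.
    """
    return lookup_ms / (handler_ms + lookup_ms)


# Smallest saving against the slowest handler, largest against the fastest.
low = latency_reduction(handler_ms=5.0, lookup_ms=0.5)
high = latency_reduction(handler_ms=2.0, lookup_ms=2.0)
print(f"{low:.1%} to {high:.1%}")  # prints "9.1% to 50.0%"
```

Under that assumption the median figures span about 9% to 50%, in line with the range quoted above.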
The Fundamental Trade-off
Sticky sessions convert a stateless architecture into a stateful one. This has consequences: if the server holding a user session crashes, that session data is lost unless replicated. Load becomes unevenly distributed because active sessions accumulate on certain servers. Operational tasks like deployments and scaling become more complex since you must account for existing session bindings. The question is whether the 0.5-8ms latency savings justify these operational costs.
Load Imbalance Reality
In production, the most loaded servers commonly reach 1.5-2.5x the mean load. Some users make 100 requests per session, others make 2. Long-running sessions accumulate on the servers that happened to receive them. Power users or automated clients can create hotspots where individual instances run at 80-90% CPU while the cluster average shows 40%, giving false confidence in available headroom.
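A small simulation shows how the effect arises. It is a sketch under stated assumptions: sessions are pinned to a uniformly random server, a fixed fraction of sessions are heavy (100 requests) and the rest light (2 requests), and the function name and all parameters are hypothetical. The ratio it reports depends strongly on those parameters.

```python
import random


def simulate_imbalance(num_servers=10, num_sessions=2000,
                       heavy_fraction=0.05, seed=0):
    """Return per-server request counts and the max/mean load ratio."""
    rng = random.Random(seed)
    load = [0] * num_servers
    for _ in range(num_sessions):
        # Sticky routing: the server is chosen once per session,
        # so every request in the session lands on it.
        server = rng.randrange(num_servers)
        requests = 100 if rng.random() < heavy_fraction else 2
        load[server] += requests
    mean = sum(load) / num_servers
    return load, max(load) / mean


load, ratio = simulate_imbalance()
print(f"busiest server carries {ratio:.2f}x the mean load")
```

Lowering num_sessions or raising heavy_fraction tends to widen the gap, because a handful of heavy sessions then decides which servers run hot. Monitoring the maximum per-instance load rather than the cluster average is what exposes this.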