Load Balancing › Sticky Sessions · Easy · ⏱️ ~2 min

What Are Sticky Sessions and How Do They Work?

Definition
Sticky sessions (also called session affinity) are a load balancing pattern in which all requests from a given client are routed to the same backend server for the duration of the session. Instead of spreading requests across all available servers, the load balancer creates a binding that pins each user to one particular instance.

How Sticky Sessions Work

When a client first connects, the load balancer selects a backend server and creates an affinity marker. This marker can be a cookie inserted by the load balancer containing an encoded server identifier, a hash of the client IP address, or an entry in the load balancer's internal mapping table. On subsequent requests, the load balancer reads this marker and routes traffic back to the same server. The binding persists until the session expires (TTL typically 10-30 minutes), the server becomes unhealthy, or the marker is explicitly cleared.
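The cookie-based variant can be sketched in a few lines of Python. This is a hypothetical illustration under the assumptions above (names like `AFFINITY_COOKIE`, `route`, and `pick_backend` are invented for this sketch, not any real load balancer's API):

```python
import hashlib
import time

AFFINITY_COOKIE = "lb_affinity"   # hypothetical cookie name
SESSION_TTL = 20 * 60             # seconds; within the typical 10-30 min range

backends = ["app-1", "app-2", "app-3"]
bindings = {}  # marker -> (backend, expiry): the internal mapping table

def pick_backend(client_ip: str) -> str:
    # Fallback selection for a first request: hash the client IP onto a backend.
    h = int(hashlib.sha256(client_ip.encode()).hexdigest(), 16)
    return backends[h % len(backends)]

def route(client_ip: str, cookies: dict) -> tuple[str, dict]:
    """Return (chosen backend, cookies the LB should set) for one request."""
    marker = cookies.get(AFFINITY_COOKIE)
    entry = bindings.get(marker)
    if entry and entry[1] > time.time():
        # Existing, unexpired binding: route back to the same server.
        return entry[0], {}
    # First request (or expired binding): pin this client to a backend.
    backend = pick_backend(client_ip)
    marker = hashlib.sha256(f"{client_ip}{time.time()}".encode()).hexdigest()[:16]
    bindings[marker] = (backend, time.time() + SESSION_TTL)
    return backend, {AFFINITY_COOKIE: marker}
```

A client's first call returns a cookie to set; replaying that cookie on later calls yields the same backend until the TTL lapses or the binding is cleared.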

Performance Benefits

The primary motivation is performance and simplicity. Applications can store user-specific state (shopping cart items, profile caches, partially completed workflows) directly in server memory or local cache. This eliminates the need for an external session store lookup on every request, saving approximately 0.5-2ms at the 50th percentile within the same availability zone, and 3-8ms at the 99th percentile across zones. For services where handler time is 2-5ms, this represents a 10-50% reduction in median latency.

The Fundamental Trade-off

Sticky sessions convert a stateless architecture into a stateful one. This has consequences: if the server holding a user session crashes, that session data is lost unless replicated. Load becomes unevenly distributed because active sessions accumulate on certain servers. Operational tasks like deployments and scaling become more complex since you must account for existing session bindings. The question is whether the 0.5-8ms latency savings justify these operational costs.

Load Imbalance Reality

In production, load imbalance ratios commonly exceed 1.5-2.5x the mean. Some users make 100 requests per session, others make 2. Long-running sessions accumulate on servers that happened to receive them. Power users or automated clients can create hotspots where individual instances run at 80-90% CPU while the cluster average shows 40%, giving false confidence in available headroom.

Key Trade-off: Sticky sessions save 0.5-8ms per request by avoiding external session store lookups, but create stateful architecture with uneven load (1.5-2.5x imbalance), session loss on failure, and complex deployments. The right choice depends on whether latency savings justify operational complexity.
💡 Key Takeaways
Load balancer pins each client to specific backend using cookies, IP hashing, or internal mappings; TTL typically 10-30 minutes
Saves 0.5-8ms per request by avoiding external session store lookups; 10-50% latency reduction for 2-5ms handlers
Converts stateless to stateful architecture: session loss on server failure, uneven load distribution, complex deployments
Load imbalance ratios of 1.5-2.5x are common; instances can hit 80-90% CPU while cluster average shows 40%
📌 Interview Tips
1. Explain the latency benefit: eliminating the external session store lookup saves 0.5-2ms at p50 same-zone, 3-8ms at p99 cross-zone
2. Describe the imbalance reality: power users with 100 requests/session vs. casual users with 2 create hotspots
3. Present the trade-off question: do the 0.5-8ms latency savings justify stateful complexity and operational overhead?