
What Are Sticky Sessions and How Do They Work?

Sticky sessions, also called session affinity, are a load-balancing pattern in which all requests from a given client are routed to the same backend server for the duration of that client's session. Instead of spreading requests across all available servers, the load balancer binds each user to one particular instance.

The mechanics are straightforward. When a client first connects, the load balancer selects a backend server and establishes an affinity marker. The marker can be a cookie inserted by the load balancer (containing an encoded server identifier), a hash of the client IP address (recomputed on every request, so no per-client state is stored), or an entry in the load balancer's internal mapping table. On subsequent requests, the load balancer reads the cookie, recomputes the hash, or looks up the table entry, and routes the request back to the same server.

The primary motivation is performance and simplicity. Applications can keep user-specific state (shopping cart items, profile caches, partially completed workflows) directly in the server's memory or local cache. This removes the external session store lookup from every request, saving roughly 0.5 to 2 milliseconds at the 50th percentile within the same availability zone, and 3 to 8 milliseconds at the 99th percentile across zones. For services whose handler time is 2 to 5 milliseconds, that amounts to a 10 to 50 percent reduction in median latency.

The tradeoff is that you are now running a stateful architecture. If the server holding a user's session crashes, that session data is lost unless you have replicated it elsewhere. Load becomes unevenly distributed because active sessions accumulate on certain servers, and operational tasks such as deployments and scaling become more complex, since you must account for existing session bindings.
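A minimal Python sketch of two of the affinity mechanisms described above, assuming hypothetical backend names such as `app-1` and round-robin selection for a session's first request (neither detail comes from the original text). `IpHashRouter` derives the backend from a stable hash of the client IP on every request, so it stores nothing. `MappingTableRouter` keeps a session-to-backend table, and its hypothetical `remove_backend` method shows how a failed instance orphans every session pinned to it.

```python
import hashlib
import itertools


class IpHashRouter:
    """Stateless affinity: the backend is derived from the client IP on every request."""

    def __init__(self, backends):
        self.backends = list(backends)

    def route(self, client_ip: str) -> str:
        # Use a stable hash; Python's built-in hash() of str is randomized per process.
        digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.backends)
        return self.backends[index]


class MappingTableRouter:
    """Stateful affinity: the load balancer remembers which backend each session got."""

    def __init__(self, backends):
        self.backends = list(backends)
        self._next = itertools.cycle(self.backends)  # round robin for first requests
        self.table = {}  # session_id -> backend name

    def route(self, session_id: str) -> str:
        backend = self.table.get(session_id)
        if backend is None:
            # First request for this session: pick a backend and pin it.
            backend = next(self._next)
            self.table[session_id] = backend
        return backend

    def remove_backend(self, backend: str) -> list:
        """Drop a failed backend and return the session ids whose binding was lost."""
        self.backends.remove(backend)
        self._next = itertools.cycle(self.backends)
        orphaned = [s for s, b in self.table.items() if b == backend]
        for s in orphaned:
            # These sessions get re-pinned on their next request, but any
            # in-memory state on the failed server is gone.
            del self.table[s]
        return orphaned
```

One design consequence: because `IpHashRouter` uses plain modulo hashing, adding or removing a backend changes the divisor and remaps most clients, not only those on the affected server. Consistent hashing is the usual way to limit that churn.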
💡 Key Takeaways
Load balancer pins each client to a specific backend server using cookies, IP hashing, or internal mappings
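Saves roughly 0.5 to 2 milliseconds per request at p50 within an availability zone (3 to 8 milliseconds at p99 across zones) by avoiding external session store lookups, reducing median latency by 10 to 50 percent for handlers that take 2 to 5 milliseconds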
Enables simpler application logic since user state can live in server memory without distributed coordination
Creates uneven load distribution where busy users or long sessions overload specific instances, with the busiest instance commonly carrying more than 1.5 times the mean load
Instance failures drop all sessions pinned to that server, forcing users to restart flows or lose shopping cart data
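The latency and imbalance figures in these takeaways can be checked with simple arithmetic. This Python sketch assumes request latency is just handler time plus one session store lookup, and uses active-session count as a stand-in for load; the session counts are made-up illustrative values, not measurements.

```python
def median_latency_reduction(handler_ms: float, lookup_ms: float) -> float:
    """Fraction of latency removed when the session store lookup is skipped.

    Assumes latency = handler time + one lookup.
    """
    return lookup_ms / (handler_ms + lookup_ms)


def imbalance_ratio(load_per_server: dict) -> float:
    """Busiest server's load divided by the mean load (1.0 means perfectly even)."""
    loads = list(load_per_server.values())
    mean = sum(loads) / len(loads)
    return max(loads) / mean


# Bounds of the median-latency claim: a small saving on a slow handler
# versus a large saving on a fast handler (p50, same-zone lookup figures).
low = median_latency_reduction(handler_ms=5.0, lookup_ms=0.5)
high = median_latency_reduction(handler_ms=2.0, lookup_ms=2.0)
print(f"reduction range: {low:.0%} to {high:.0%}")  # reduction range: 9% to 50%

# Hypothetical active-session counts after long-lived sessions pile up.
sessions = {"app-1": 900, "app-2": 450, "app-3": 400, "app-4": 250}
print(f"imbalance ratio: {imbalance_ratio(sessions):.2f}")  # imbalance ratio: 1.80
```

The lower bound works out to about 9 percent, which matches the roughly 10 to 50 percent range once rounded.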
📌 Examples
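AWS Application Load Balancer uses a load-balancer-generated cookie named AWSALB that encodes the target backend and an expiry timestamp; the stickiness duration is configurable (1 second to 7 days, default 1 day), with 20 to 30 minutes a typical setting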
E-commerce checkout flow keeps cart items and shipping address in server memory, eliminating 2 to 5 milliseconds of cache lookup latency on each page navigation
WebSocket chat application maintains connection state and message buffers locally for 30- to 60-minute sessions, avoiding cross-node coordination for presence updates
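The AWSALB example hinges on a cookie that carries a target and an expiry. AWS treats that cookie as opaque (its contents are encrypted), so the Python sketch below uses a hypothetical format of its own: `target|expiry`, base64-encoded and HMAC-signed with a hypothetical `SECRET` key so that a client cannot edit the cookie to choose its own backend. The 30-minute TTL is an illustrative choice, and none of these names come from AWS.

```python
import base64
import hashlib
import hmac

SECRET = b"example-key"  # hypothetical signing key known only to the load balancer
COOKIE_TTL_SECONDS = 30 * 60  # illustrative 30-minute affinity window


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def make_affinity_cookie(target: str, now: int) -> str:
    """Encode the chosen target plus an absolute expiry, then sign the result."""
    payload = f"{target}|{now + COOKIE_TTL_SECONDS}".encode("utf-8")
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return _b64(payload) + "." + _b64(sig)


def read_affinity_cookie(cookie: str, now: int):
    """Return the pinned target, or None if the cookie is malformed, forged, or expired."""
    try:
        payload_b64, sig_b64 = cookie.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong number of parts or invalid base64 (binascii.Error)
        return None
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered cookie: ignore it and route as a new session
    target, expiry = payload.decode("utf-8").rsplit("|", 1)
    if now >= int(expiry):
        return None  # expired: the load balancer should pick a backend afresh
    return target


cookie = make_affinity_cookie("app-2", now=1_000)
print(read_affinity_cookie(cookie, now=1_500))  # app-2
print(read_affinity_cookie(cookie, now=2_800))  # None (expiry was 1_000 + 1_800)
```

Signing rather than encrypting keeps the sketch short; it stops clients from steering themselves to a backend but still lets them read the target name.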
What Are Sticky Sessions and How Do They Work? | Sticky Sessions - System Overflow