Scalability Failure Modes and Edge Cases
Hotspots and skew occur when a celebrity account or viral key concentrates a disproportionate share of traffic, for example more than 10 percent of the total, onto a single shard. Hash based partitioning spreads keys uniformly, but it cannot split the traffic of one hot key, and naive schemes like user_id modulo N or range partitions can still hotspot. Mitigations include key salting (appending a random suffix to distribute a hot key across shards), adaptive repartitioning at runtime when imbalance is detected, and load aware routing that sends requests to less busy replicas.
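Key salting can be sketched in a few lines. This is a minimal illustration, not a production scheme: the shard count, the number of salt buckets, and the helper names `shard_for`, `salted_write_key`, and `salted_read_keys` are all assumptions made for the example.

```python
import hashlib
import random

NUM_SHARDS = 16    # assumed shard count
SALT_BUCKETS = 8   # assumed number of sub-keys used for a hot key


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard with a stable hash (Python's hash() is randomized per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def salted_write_key(hot_key: str, buckets: int = SALT_BUCKETS) -> str:
    """Append a random suffix so writes to one hot key land on several sub-keys."""
    return f"{hot_key}#{random.randrange(buckets)}"


def salted_read_keys(hot_key: str, buckets: int = SALT_BUCKETS) -> list[str]:
    """Reads must fan in: fetch every sub-key and merge the results."""
    return [f"{hot_key}#{i}" for i in range(buckets)]


if __name__ == "__main__":
    key = salted_write_key("celebrity:42")
    print(key, "-> shard", shard_for(key))
    print(sorted({shard_for(k) for k in salted_read_keys("celebrity:42")}))
```

The trade-off is that salting moves cost from writes to reads: every read of the hot key has to query all sub-keys, so it is usually applied only to keys already identified as hot.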
Cache stampedes happen when a popular item expires and thousands of concurrent requests miss the cache simultaneously, overwhelming the origin with a request spike that can be 100x normal load. Use request coalescing (also called single flight) so only one request fetches from origin while others wait for the result, soft Time To Live (TTL) values where a background refresh happens before expiration, and jittered expiration times to avoid synchronized invalidation across cache nodes. Retry storms amplify this problem: if p99 latency rises from 100 milliseconds to 1 second at 10,000 RPS, one second of stalled responses leaves about 10,000 requests outstanding. If clients time out and retry at the 500 millisecond mark, each stalled request is sent again, so effective load can double to roughly 20,000 RPS and push an already slow service into complete collapse.
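The coalescing and jittered expiration described above can be sketched as a small in-process cache. This is an illustrative sketch under stated assumptions: `SingleFlightCache` and its parameters are hypothetical names, the loader stands in for the origin fetch, and soft TTL with background refresh is left out to keep the example short.

```python
import random
import threading
import time


class SingleFlightCache:
    """In-process cache where concurrent misses on a key share one origin fetch."""

    def __init__(self, loader, ttl_seconds=60.0, jitter_fraction=0.1):
        self._loader = loader            # function key -> value (the origin fetch)
        self._ttl = ttl_seconds
        self._jitter = jitter_fraction   # TTL varies by +/- this fraction
        self._lock = threading.Lock()
        self._values = {}                # key -> (value, expires_at)
        self._inflight = {}              # key -> Event set when the fetch finishes

    def get(self, key):
        while True:
            with self._lock:
                entry = self._values.get(key)
                if entry is not None and entry[1] > time.monotonic():
                    return entry[0]      # fresh hit
                event = self._inflight.get(key)
                leader = event is None
                if leader:
                    event = threading.Event()
                    self._inflight[key] = event
            if not leader:
                event.wait()             # another caller is already fetching
                continue                 # re-check the cache after it finishes
            try:
                value = self._loader(key)
                # Jitter the TTL so keys loaded together do not expire together.
                ttl = self._ttl * (1 + random.uniform(-self._jitter, self._jitter))
                with self._lock:
                    self._values[key] = (value, time.monotonic() + ttl)
                return value
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()              # wake waiters even if the load failed
```

If the loader raises, only the leading caller sees the exception; waiters wake up, find no cached value, and one of them becomes the next leader. A real deployment would also need coalescing across processes or cache nodes, which this sketch does not attempt.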
Replication lag under heavy write load can cause replicas to fall seconds behind the primary. Users see read after write anomalies where their own just posted content is missing because they read from a lagging replica. Provide read your writes consistency via session stickiness to the primary for a short window after writes, or use client side version tokens so that reads from a replica older than the client's last write are detected and rejected. Rebalancing shocks occur when adding or removing nodes: with consistent hashing, approximately 1/N of keys move per added node (where N is the node count after the addition), spiking cache misses and backend load during the migration window. Throttle data movement rates and pre warm caches to prevent origin overload.
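The 1/N figure for consistent hashing can be checked with a small simulation. The sketch below is an assumption-laden illustration: `HashRing`, the 100 virtual nodes per server, and the synthetic `key-i` names are choices made for the example, not details from any particular system.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with virtual nodes (assumes at least one node)."""

    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def lookup(self, key):
        # First ring entry at or after the key's position, wrapping around.
        idx = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


def moved_fraction(old_nodes, new_node, num_keys=10_000):
    """Fraction of keys whose owner changes when new_node joins."""
    before = HashRing(old_nodes)
    after = HashRing(list(old_nodes) + [new_node])
    keys = [f"key-{i}" for i in range(num_keys)]
    moved = sum(before.lookup(k) != after.lookup(k) for k in keys)
    return moved / num_keys


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d"]
    # Going from 4 to 5 nodes should move roughly 1/5 of the keys.
    print(moved_fraction(nodes, "node-e"))
```

Every key that moves lands on the new node, which is why the new node's cache starts cold and why pre warming it before it takes traffic helps.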
💡 Key Takeaways
•Hotspots can concentrate over 10% of traffic on one shard; a celebrity account with 100 million followers creates extreme write amplification on fanout. Mitigation requires key salting, adaptive repartitioning, or switching to pull based fanout for mega users
•Cache stampedes occur when popular keys expire simultaneously, causing request spikes 100x normal; single flight coalescing and soft TTL with background refresh before expiration prevent origin overload
•Retry storms can double effective load when clients retry on timeout; if p99 grows from 100 ms to 1 sec at 10k RPS, retries at 500 ms can push offered load toward 20k RPS. Use bounded retries with exponential backoff, jitter, and per client budgets to prevent collapse
•Replication lag causes read after write anomalies where users cannot see their own updates; maintain session stickiness to primary for a short window post write, or use version tokens to detect and reject stale reads from replicas
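The bounded retry advice in the takeaways above can be sketched as follows. This is a hedged illustration: `RetryBudget`, `backoff_delay`, and `call_with_retries` are hypothetical names, the 10 percent budget ratio and the delay constants are assumed values, and a real client would retry only on errors it knows to be retryable.

```python
import random
import time


class RetryBudget:
    """Per client budget: retries may not exceed `ratio` of requests seen so far."""

    def __init__(self, ratio=0.1):
        self.ratio = ratio
        self.requests = 0
        self.retries = 0

    def record_request(self):
        self.requests += 1

    def try_spend(self) -> bool:
        if self.retries + 1 > self.ratio * self.requests:
            return False  # budget exhausted: fail fast instead of adding load
        self.retries += 1
        return True


def backoff_delay(attempt, base=0.1, cap=5.0):
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retries(op, budget, max_attempts=3, sleep=time.sleep):
    """Call op(), retrying on failure within max_attempts and the shared budget."""
    budget.record_request()
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception:  # assumption: every exception is treated as retryable
            is_last = attempt == max_attempts - 1
            if is_last or not budget.try_spend():
                raise
            sleep(backoff_delay(attempt))
```

The budget is what caps the amplification: with a ratio of 0.1, retries can add at most about 10 percent to offered load, no matter how many individual requests time out.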
📌 Examples
Twitter handles celebrity tweets (which can trigger spikes of 100k+ retweets per second) using hybrid fanout: mega users bypass fanout on write entirely and use pull based timelines to avoid overwhelming the write path
Cloudflare's edge network, with over 100 Tbps of capacity, absorbs Distributed Denial of Service (DDoS) attacks in the tens of terabits per second range by distributing traffic across 200+ cities and applying rate limiting and anomaly detection at the edge