
Caching Strategy and the Thundering Herd Problem

Caching is critical for URL shorteners given their extreme read dominance and the need for single-digit-millisecond redirect latencies. Production systems use in-memory caches (Redis, Memcached, or in-process LRU caches) with read-through semantics: on a token lookup, check the cache first; on a hit, return immediately; on a miss, read from the persistent store, populate the cache, and return the result. By the 80/20 principle, roughly 20% of URLs generate 80% of traffic, so caching this hot set yields hit ratios often exceeding 95%. For a system handling 8,000 redirects per second with 20% of daily unique lookups cached, you might need around 70 GB of RAM across your cache tier.

The primary failure mode is the thundering herd, or cache stampede: a viral link experiences massive traffic and its cache entry expires or is evicted, so thousands of concurrent requests miss the cache simultaneously and all query the backing database, overwhelming it and causing cascading failures. Mitigation strategies include request coalescing (also called single-flight), where the first miss for a token triggers a database read and subsequent requests for the same token wait for that result rather than issuing duplicate queries. Adding jitter to TTLs (randomizing expiration times within a range) prevents synchronized expirations across many keys. Negative caching (caching not-found results with a short TTL, perhaps 10 to 60 seconds) reduces repeated database hits for invalid tokens from scanner bots or typos.

Write-through caching ensures that newly created or updated short URLs are immediately available in the cache, avoiding a cold-start period where the first few redirects would miss. However, write-through adds latency to the write path because you must update both the database and the cache synchronously.
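The mitigations above compose naturally into a single lookup path. Here is a minimal in-process sketch (class and function names, TTL values, and the jitter fraction are illustrative assumptions; a production system would typically back this with Redis rather than a Python dict) combining read-through lookup, single-flight coalescing, jittered TTLs, and negative caching:

```python
import random
import threading
import time

CACHE_TTL = 3600        # base TTL for found entries (assumed value)
NEGATIVE_TTL = 30       # short TTL for not-found entries
JITTER_FRACTION = 0.10  # randomize TTLs +/-10% to desynchronize expirations

_NOT_FOUND = object()   # sentinel so a cached miss is distinct from "no entry"


class CoalescingCache:
    """Read-through cache with single-flight coalescing and negative caching."""

    def __init__(self, db_lookup):
        self._db_lookup = db_lookup  # callable: token -> long URL or None
        self._cache = {}             # token -> (value, expires_at)
        self._inflight = {}          # token -> Event for the in-flight DB read
        self._lock = threading.Lock()

    def _jittered_ttl(self, base):
        return base * (1 + random.uniform(-JITTER_FRACTION, JITTER_FRACTION))

    def get(self, token):
        while True:
            with self._lock:
                entry = self._cache.get(token)
                if entry and entry[1] > time.time():
                    # Cache hit (possibly a cached "not found").
                    return None if entry[0] is _NOT_FOUND else entry[0]
                event = self._inflight.get(token)
                if event is None:
                    # First miss for this token: take ownership of the DB read.
                    event = threading.Event()
                    self._inflight[token] = event
                    owner = True
                else:
                    owner = False
            if owner:
                try:
                    value = self._db_lookup(token)  # the single DB query
                    ttl = CACHE_TTL if value is not None else NEGATIVE_TTL
                    with self._lock:
                        self._cache[token] = (
                            value if value is not None else _NOT_FOUND,
                            time.time() + self._jittered_ttl(ttl),
                        )
                finally:
                    with self._lock:
                        del self._inflight[token]
                    event.set()  # wake all coalesced waiters
                return value
            # Another request is already querying the DB; wait, then re-check.
            event.wait()
```

Under a stampede, only the owning request touches the database; every concurrent miss for the same token blocks on the event and then returns from cache.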
Many systems instead use cache-aside (lazy loading) on writes and rely on TTLs or explicit invalidation via pub/sub (Redis pub/sub, Kafka topics) to propagate changes. The choice between 302 (temporary redirect) and 301 (permanent redirect) also affects caching: 301 allows clients and intermediate proxies to cache the redirect indefinitely, reducing load on your service but making analytics less accurate and preventing future destination changes from reaching clients that cached it. Most production systems prefer 302 for operational flexibility and analytics fidelity, accepting the higher request volume.
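The 301/302 trade-off shows up as a one-line decision in the redirect handler. A hedged sketch (the function name and the specific `Cache-Control` values are illustrative assumptions, not a fixed convention):

```python
def redirect_response(long_url, permanent=False):
    """Build the status line and headers for a short-link redirect."""
    if permanent:
        # 301: clients and proxies may cache the mapping for a long time,
        # so repeat visitors never reach the service again. Cheap, but
        # analytics undercount and destination edits won't propagate.
        return "301 Moved Permanently", [
            ("Location", long_url),
            ("Cache-Control", "public, max-age=31536000"),
        ]
    # 302: every hit comes back to the service, preserving click analytics
    # and the ability to repoint the token later.
    return "302 Found", [
        ("Location", long_url),
        ("Cache-Control", "no-store"),
    ]
```

With 302 plus `no-store`, the service pays for every redirect but keeps full control; with 301 the edge absorbs repeat traffic at the cost of that control.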
💡 Key Takeaways
A system handling 8,000 redirects per second can cache the hot 20% of daily unique lookups in roughly 70 GB of RAM, achieving hit ratios above 95% under an 80/20 traffic distribution
Thundering herd occurs when viral links expire from cache and thousands of concurrent requests overwhelm the backing store; mitigate with request coalescing and jittered TTLs
Negative caching with 10 to 60 second TTLs prevents repeated database queries for not-found tokens from bots or typos, reducing wasted load by 30% or more in some systems
Write-through caching eliminates cold-start misses for new URLs but adds synchronous latency to writes; cache-aside with TTL or pub/sub invalidation is often preferred
Using 301 redirects enables indefinite client and proxy caching, drastically reducing traffic but preventing destination updates and undercounting analytics; 302 preserves control
Request coalescing (single flight) ensures only one database query occurs per token during concurrent misses, with other requests waiting for the result to avoid duplicate work
📌 Examples
Reddit uses request coalescing in their URL shortener (redd.it): when a viral post generates 10,000 requests per second and the cache expires, only the first request queries the database while the other 9,999 wait for that result
A URL shortener experiencing a stampede without coalescing might send 5,000 concurrent database queries for the same token, spiking database CPU to 100% and causing 10 to 20 second response times
Bitly uses 302 redirects to maintain analytics accuracy and destination update capability, accepting 2 to 3 times higher request volume compared to if they used 301 and relied on client caching