Failure Modes: Collisions, Hot Keys, and Cache Stampedes

URL shorteners face several failure modes that can degrade availability and correctness if left unhandled. Key collisions and race conditions come first: random or hash-based ID generation has a non-zero collision probability, and without an atomic uniqueness check you can create duplicate mappings or lose writes. For example, two concurrent shorten requests (typically for different long URLs) might generate the same random token. If both read the store, find the token absent, and then insert, the result is either a duplicate entry or one mapping silently overwriting the other. The fix is to make the check and the write a single atomic operation, such as insert-if-not-exists backed by a unique constraint (SQL), a conditional write (DynamoDB condition expressions), or a compare-and-swap primitive, and to retry with a fresh token on conflict.
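The insert-then-retry pattern can be sketched as follows. This is a minimal illustration rather than a production implementation: it assumes a SQLite table named `links` with the token as primary key, and the helper names (`make_db`, `random_token`, `shorten`) and the 7-character base62 token are hypothetical choices made for the example.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits  # base62 (assumed alphabet)


def make_db():
    """Create an in-memory store; the PRIMARY KEY enforces token uniqueness."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE links (token TEXT PRIMARY KEY, long_url TEXT NOT NULL)"
    )
    return conn


def random_token(length=7):
    """Generate a random base62 token (length is an assumption)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def shorten(conn, long_url, max_attempts=5, token_factory=random_token):
    """Insert a new mapping, retrying with a fresh token on collision.

    There is no separate read-then-write step: the database rejects a
    duplicate token atomically, so a race cannot overwrite a mapping.
    """
    for _ in range(max_attempts):
        token = token_factory()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO links (token, long_url) VALUES (?, ?)",
                    (token, long_url),
                )
            return token
        except sqlite3.IntegrityError:
            continue  # token already taken: try another one
    raise RuntimeError("could not allocate a unique token")
```

The key design choice is that uniqueness is enforced by the store itself, so correctness does not depend on application-level checks that can interleave.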

Hot key and cache stampede failures occur when a viral link skews traffic heavily toward one token. A single token might receive 100,000 requests per second, overwhelming one cache shard or forcing evictions of other keys. If the hot key expires or is evicted, all of those requests miss at once and fall through to the backing database, which might serve only around 10,000 queries per second; the result is timeouts, cascading failures, and outages. Typical triggers include major product launches, breaking news links, and viral social media posts. Mitigations include request coalescing (one backend query per token no matter how many requests are waiting on it), jittered TTLs so entries do not expire in lockstep, larger caches or dedicated cache instances for known hot keys, and per-token rate limiting to shed excess load gracefully.
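Request coalescing and jittered TTLs can be sketched together in a few lines. The class below is an in-process illustration only; `CoalescingCache`, its `loader` callback, and the default TTL and jitter values are assumptions for the example, and coalescing across multiple servers would need an additional mechanism that is not shown here.

```python
import random
import threading
import time


class CoalescingCache:
    """In-process cache where concurrent misses for one token share one load."""

    def __init__(self, loader, base_ttl=300.0, jitter=0.1):
        self._loader = loader      # function: token -> long_url (hits the DB)
        self._base_ttl = base_ttl  # seconds (assumed default)
        self._jitter = jitter      # +/- fraction applied to each TTL
        self._lock = threading.Lock()
        self._values = {}          # token -> (long_url, expires_at)
        self._inflight = {}        # token -> Event set when the load finishes

    def _ttl(self):
        # Jitter spreads expirations so entries do not expire in lockstep.
        return self._base_ttl * (1 + random.uniform(-self._jitter, self._jitter))

    def get(self, token):
        while True:
            with self._lock:
                entry = self._values.get(token)
                if entry is not None and entry[1] > time.monotonic():
                    return entry[0]  # fresh cache hit
                event = self._inflight.get(token)
                leader = event is None
                if leader:
                    event = threading.Event()
                    self._inflight[token] = event
            if not leader:
                event.wait()  # another thread is already loading this token
                continue      # re-check the cache; retry if that load failed
            try:
                value = self._loader(token)
                with self._lock:
                    expires_at = time.monotonic() + self._ttl()
                    self._values[token] = (value, expires_at)
                return value
            finally:
                with self._lock:
                    del self._inflight[token]
                event.set()  # wake the waiting followers
```

With this structure, a burst of concurrent requests for an uncached token produces a single call to `loader`; the remaining callers block briefly and then read the cached value.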

Key Generation Service (KGS) depletion is another failure mode: if the pre-generated token pool runs dry or the KGS itself becomes unavailable, the service cannot issue new short URLs until the pool refills or the KGS recovers. This can halt business-critical workflows such as marketing campaigns or transactional emails. Solutions include running multiple independent KGS instances across regions, monitoring pool headroom and alerting when reserves drop below several hours of capacity at the current consumption rate, and applying backpressure to slow writes before the pool is fully exhausted.

Finally, abuse and security issues also threaten availability. Predictable keys enable enumeration, spam and phishing links harm domain reputation, and redirect loops (a short URL pointing to another short URL) can trap clients. Countermeasures include asynchronous safety scanning, loop detection with a maximum hop limit, and rate limiting per IP, ASN, and account.
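The KGS headroom check described above reduces to simple arithmetic: remaining tokens divided by the consumption rate gives hours of runway. The sketch below is illustrative only; the function names and both thresholds (alert below 6 hours, throttle below 1 hour) are assumptions chosen for the example, not values prescribed by any particular system.

```python
def hours_of_headroom(tokens_remaining, tokens_per_hour):
    """Estimate how long the pool lasts at the current consumption rate."""
    if tokens_per_hour <= 0:
        return float("inf")  # nothing is being consumed right now
    return tokens_remaining / tokens_per_hour


def pool_action(tokens_remaining, tokens_per_hour,
                alert_hours=6.0, throttle_hours=1.0):
    """Map headroom to an action; both thresholds are assumed values.

    "throttle" stands for backpressure (slow down new writes),
    "alert" for paging an operator or triggering a refill.
    """
    headroom = hours_of_headroom(tokens_remaining, tokens_per_hour)
    if headroom < throttle_hours:
        return "throttle"
    if headroom < alert_hours:
        return "alert"
    return "ok"
```

For example, a pool of 3,000,000 tokens consumed at 1,000,000 tokens per hour has 3 hours of headroom, which falls between the two assumed thresholds and yields `"alert"`.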

💡 Key Takeaways

• Random ID generation without atomic uniqueness checks can create duplicate mappings during concurrent writes; use insert-if-not-exists, conditional writes, or compare-and-swap, with retry on conflict

• Hot keys (viral links) can drive 100,000+ requests per second to a single token, overwhelming a cache shard and evicting other keys; if the hot entry expires or is evicted, the resulting stampede can send 100,000 queries per second to a database that cannot absorb them

• Request coalescing limits backend queries to one per token despite thousands of concurrent requests, preventing database overload during cache misses on hot keys

• Key Generation Service pool depletion halts new URL creation; maintain multi-region KGS instances, monitor headroom, and alert when reserves drop below several hours at the current consumption rate

• Predictable sequential keys enable enumeration attacks in which attackers scrape all short URLs; mitigate with salting, token mixing, or reserving/filtering certain patterns

• Redirect loops (a short URL pointing to another short URL) require loop detection and max hop enforcement (e.g., limit to 3 hops) to prevent infinite redirect chains and client errors
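Loop detection with a hop limit, as mentioned in the last takeaway, can be sketched as a check run when a link is created or resolved. Everything specific here is hypothetical: the short domain `sho.rt`, the function names, and the `lookup` callback (token to stored target URL, or `None` if unknown) are assumptions made for illustration.

```python
from urllib.parse import urlsplit

SHORT_HOST = "sho.rt"  # hypothetical short domain for this example


def token_of(url):
    """Return the token if the URL is one of our own short links, else None."""
    parts = urlsplit(url)
    if parts.hostname == SHORT_HOST:
        return parts.path.lstrip("/") or None
    return None


def resolve_final_target(url, lookup, max_hops=3):
    """Follow short-to-short chains and return the final external URL.

    Raises ValueError on a loop or when the chain exceeds max_hops,
    and LookupError if a short link in the chain has no mapping.
    """
    seen = set()
    hops = 0
    while True:
        token = token_of(url)
        if token is None:
            return url  # reached a URL outside the short domain
        if token in seen:
            raise ValueError("redirect loop detected")
        if hops >= max_hops:
            raise ValueError("too many redirect hops")
        seen.add(token)
        target = lookup(token)
        if target is None:
            raise LookupError("unknown token: " + token)
        url = target
        hops += 1
```

Running this at creation time lets the service reject a looping or overly long chain before it is ever served, instead of leaving clients to hit their own redirect limits.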

📌 Examples

In 2019, a major retailer's Black Friday campaign generated 200,000 requests per second to a single short URL; their cache expired during the spike, sending 200,000 DB queries per second and causing a 15 minute outage

A URL shortener without atomic insert checks experienced duplicate token creation when two API servers both generated token 'aBc1234', checked it was absent, and inserted, resulting in one long URL silently overwriting another

Bitly monitors their Key Generation Service pool size and alerts when remaining tokens drop below 6 hours of capacity at current consumption rate, triggering automated pool refills to prevent service disruption
