
URL Shortener Core Architecture and Read Path Optimization

A URL shortener is fundamentally a key-value mapping system that translates compact tokens (typically 7 Base62 characters, yielding 62^7 ≈ 3.5 trillion possible combinations) to full destination URLs. The architecture prioritizes the read path because production traffic exhibits extreme read dominance, often 200:1 or higher read-to-write ratios. At 100 million new URLs per month (approximately 40 writes per second sustained), this translates to roughly 8,000 redirects per second sustained, with significantly higher peaks during diurnal traffic patterns.

The redirect path follows a cache-optimized design: DNS or Anycast routes users to the nearest edge location, a load balancer forwards requests to stateless redirect servers, and those servers first check an in-memory cache. On a cache hit (the common case), the server immediately returns a 301 or 302 redirect, achieving single-digit-millisecond P50 latencies within a region. On a cache miss, the system reads from the persistent store (adding perhaps 5 to 15 milliseconds if served from in-memory indexes), populates the cache, and returns the redirect. The P99 latency target for production systems is typically sub-50 milliseconds, which means all heavy operations, such as analytics aggregation, safety scanning, and abuse checks, must be decoupled and handled asynchronously.

The write path (shortening a URL) validates and canonicalizes the input URL, generates a unique token using one of several strategies, persists the mapping to durable storage, optionally primes the cache, and enqueues analytics events. Companies like Google (g.co), Twitter (t.co), and Amazon (a.co) operate their own short domains at massive scale using this pattern, with read-through caches achieving very high hit ratios for viral or hot links.
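One common token-generation strategy is encoding a unique integer ID as a fixed-width Base62 string. A minimal sketch (the function name and zero-padding convention are illustrative, not a specific production implementation):

```python
import string

# 62-symbol alphabet: 0-9, a-z, A-Z (ordering is a convention choice)
BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode_base62(n: int, width: int = 7) -> str:
    """Encode a non-negative integer as a fixed-width Base62 token."""
    chars = []
    while n > 0:
        n, rem = divmod(n, 62)
        chars.append(BASE62[rem])
    token = "".join(reversed(chars)) or BASE62[0]
    return token.rjust(width, BASE62[0])  # pad to 7 chars with '0'

# 7 characters give 62**7 = 3,521,614,606,208 distinct tokens (≈ 3.5 trillion)
assert 62 ** 7 == 3_521_614_606_208
```

A sequential counter (or pre-allocated ID ranges, as in the Amazon example below) feeds this encoder; hashing-based schemes trade the need for a counter against collision handling.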
Storage requirements are modest: even storing 1.8 billion URLs over 5 years at 2 KB per mapping (including full URL and metadata) requires only about 3.6 TB, while a leaner schema averaging 500 bytes per mapping would need approximately 60 TB for 120 billion URLs over 100 years.
💡 Key Takeaways
Read-to-write ratios in production commonly exceed 200:1, driving architectures that optimize cache hit rates and remove stateful logic from the synchronous redirect path
Target latencies: cache-hit P50 in single-digit milliseconds, P99 under 50 ms within region; cache misses add 5 to 15 ms for persistent-store reads with in-memory indexes
A 7-character Base62 token provides 3.5 trillion possible URLs; at 100 million URLs per month, this keyspace lasts approximately 2,900 years before exhaustion
Storage scales modestly: 1.8 billion URLs at 2 KB each requires 3.6 TB over 5 years, while 500-byte mappings support 120 billion URLs in 60 TB over 100 years
Analytics, safety checks, and abuse detection must be asynchronous to avoid adding latency to the redirect path; emit events for out-of-band processing
Stateless redirect servers enable horizontal scaling and eliminate coordination overhead on the hot read path
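The cache-hit/cache-miss flow described above is the classic read-through (cache-aside) pattern. A minimal sketch, using plain dicts as stand-ins for a distributed cache and durable store (the `resolve` function and the sample token are hypothetical):

```python
from typing import Optional

cache: dict[str, str] = {}  # stand-in for Memcached/Redis
db: dict[str, str] = {"abc1234": "https://example.com/some/long/path"}  # stand-in for durable store

def resolve(token: str) -> Optional[str]:
    """Look up a short token; hot path is the in-memory cache."""
    url = cache.get(token)
    if url is None:                 # cache miss: read the persistent store
        url = db.get(token)
        if url is not None:
            cache[token] = url      # populate cache so future hits stay fast
    return url                      # caller emits a 301/302 with Location: url
```

A production version would add TTLs, negative caching for unknown tokens, and an async event emit for analytics, none of which belong on the synchronous path.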
📌 Examples
Google's g.co handles redirects by routing users via Anycast to the nearest data center, checking a distributed in-memory cache (likely Memcached or similar), and returning 302 redirects in under 10 ms for cached entries
Twitter's t.co processes billions of redirects with cache hit ratios exceeding 95% for popular tweets, using an 80/20 traffic distribution where caching just 20% of daily unique lookups requires approximately 70 GB of RAM
Amazon's a.co uses disjoint ID-range allocation per region to avoid cross-region coordination on writes, with asynchronous replication for reads across all regions
← Back to URL Shortener Design Overview