Capacity Sizing and Latency Budgeting Across System Tiers
Server Capacity Benchmarks
A single application server typically handles 1,000-10,000 requests per second (RPS), depending on request complexity. A simple API returning cached data: ~10,000 RPS. Complex business logic with database queries: ~1,000 RPS. CPU-intensive processing: 100-500 RPS.
Web servers (serving static files): 50,000-100,000 RPS. Load balancers: 100,000-1,000,000 connections. These establish your baseline: if you need 50,000 RPS for an API, plan for 5 to 50 application servers depending on complexity.
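The "5 to 50 servers" arithmetic can be sketched as a small helper. The `headroom` parameter is my addition, not from the text above: real deployments run each server below its maximum to absorb spikes and survive instance loss.

```python
import math

def servers_needed(target_rps: int, per_server_rps: int, headroom: float = 1.0) -> int:
    """Estimate server count; headroom < 1.0 runs each server below its max."""
    return math.ceil(target_rps / (per_server_rps * headroom))

print(servers_needed(50_000, 10_000))      # 5  (simple cached API)
print(servers_needed(50_000, 1_000))       # 50 (complex DB-backed logic)
print(servers_needed(50_000, 1_000, 0.7))  # 72 with 30% headroom reserved
```

With headroom included, the fleet grows well beyond the naive division, which is why capacity plans usually quote a range rather than a single number.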
Database Capacity
Single SQL database: 10,000-50,000 simple reads/sec, 1,000-5,000 writes/sec. Complex queries with joins drop read capacity to 1,000-5,000/sec. Writes are bottlenecked by disk durability requirements: each commit must be flushed to disk (e.g. via fsync) before it is acknowledged.
Read replicas multiply read capacity roughly linearly: 3 replicas ≈ 3x read throughput. Write capacity does not scale this way; every write must go to the primary and replicate to the followers. This is why read-heavy workloads scale more easily than write-heavy ones.
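A minimal sketch of sizing a replica fleet for a read workload. The 120,000 reads/sec figure is an illustrative assumption; the 50,000 per-node rating comes from the single-database numbers above.

```python
import math

def read_replicas_needed(read_rps: int, per_node_reads: int,
                         primary_serves_reads: bool = False) -> int:
    """Replicas required to serve read_rps, each node capped at per_node_reads."""
    nodes = math.ceil(read_rps / per_node_reads)
    # If the primary also takes read traffic, it covers one node's share.
    return max(0, nodes - 1) if primary_serves_reads else nodes

print(read_replicas_needed(120_000, 50_000))        # 3 replicas
print(read_replicas_needed(120_000, 50_000, True))  # 2 replicas + primary
```

Note there is no analogous function for writes: adding replicas leaves write capacity unchanged, since every write still lands on the single primary.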
Cache Capacity
Redis single instance: 100,000+ ops/sec. Memcached: similar throughput. A single cache server often handles more load than 10 database servers, which is why caching is so powerful.
Memory sizing: 1M items × 1KB = 1GB. Add overhead (Redis uses ~2x raw data size for structures) = 2GB. For 100 million items: 200GB, likely needs clustering across multiple nodes.
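The memory-sizing arithmetic above, as a sketch. The ~2x overhead multiplier for Redis internal structures is the rule of thumb from the text, not a measured constant; real overhead varies with data types and key sizes.

```python
def cache_memory_gb(items: int, avg_item_bytes: int, overhead: float = 2.0) -> float:
    """Raw data size times a structure-overhead multiplier (~2x for Redis)."""
    return items * avg_item_bytes * overhead / 1e9

print(cache_memory_gb(1_000_000, 1000))    # 2.0 GB  -> fits one node
print(cache_memory_gb(100_000_000, 1000))  # 200.0 GB -> cluster territory
```

A useful follow-up check is dividing the total by per-node RAM to get a node count, e.g. 200 GB across 64 GB instances means at least 4 cluster nodes before accounting for replication.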
Latency Budget Allocation
Total user-facing latency target: typically 100-300ms. Break down across tiers: CDN/edge (5-20ms), load balancer (1-5ms), application server (10-50ms), cache lookup (1-5ms), database query (10-50ms), response serialization (5-10ms).
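Summing the pessimistic end of each tier's range is a quick sanity check that the budget closes. The tier names below are labels I've chosen for the ranges listed above.

```python
# Per-tier latency budget in ms (pessimistic end of each range above)
budget_ms = {
    "cdn_edge": 20,
    "load_balancer": 5,
    "app_server": 50,
    "cache_lookup": 5,
    "db_query": 50,
    "serialization": 10,
}

total = sum(budget_ms.values())
print(total)  # 140 ms -> inside a 100-300 ms target even at the worst case
```

If the worst-case sum exceeded the target, you would know before building anything which tier's budget to renegotiate.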
Network round trips add up. Same datacenter: 0.5ms per hop. Cross-region: 50ms per hop. A request hitting 3 services adds 1.5ms in a datacenter but 150ms cross-region. Minimize hops, and colocate services that communicate frequently.
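The hop arithmetic above can be sketched directly; the per-hop figures are the ballpark numbers from the text, assuming sequential service-to-service calls (parallel fan-out would cost only the slowest hop, not the sum).

```python
def chain_latency_ms(hops: int, per_hop_ms: float) -> float:
    """Added network latency for a chain of sequential service-to-service hops."""
    return hops * per_hop_ms

same_dc = chain_latency_ms(3, 0.5)        # 1.5 ms, same datacenter
cross_region = chain_latency_ms(3, 50.0)  # 150 ms, cross-region
print(same_dc, cross_region)
```

At 150ms, pure network time already consumes half of a 300ms budget, which is the quantitative case for colocating chatty services.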