
Little's Law and the Latency-Concurrency-Throughput Triangle

Core Formula
Little's Law: Concurrency = Throughput × Latency. When latency rises, you need proportionally more concurrent connections to maintain throughput. This explains why systems collapse under load.

THE MATH IN ACTION

API serves 10,000 RPS with 50ms latency. Concurrency = 10,000 × 0.05 = 500. You need 500 concurrent connections to sustain this throughput. Thread pool, connection pool, and memory must handle 500 simultaneous operations.
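The arithmetic above can be sketched in a few lines of Python; `required_concurrency` is a hypothetical helper name, not from the text:

```python
# Little's Law: concurrency = throughput (req/s) x latency (s).

def required_concurrency(throughput_rps: float, latency_s: float) -> float:
    """Average number of requests in flight at steady state."""
    return throughput_rps * latency_s

# 10,000 RPS at 50ms latency -> 500 requests in flight at once,
# so thread pools, connection pools, and memory must cover 500 slots.
print(required_concurrency(10_000, 0.050))
```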

THE DEATH SPIRAL

The database slows down and latency jumps from 50ms to 200ms. To maintain 10,000 RPS, concurrency must rise to 2,000, but the thread pool caps at 1,000. Requests queue. Queue time adds latency, pushing it to 500ms, which now demands 5,000 slots. More queuing follows. The system spirals into failure.
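The feedback loop can be made concrete with a toy iteration. This is an illustrative sketch, not a queueing-theory model: the fixed thread pool and the per-queued-slot latency penalty are assumptions chosen only to show the divergence:

```python
# Toy model of the death spiral: unmet concurrency queues, and
# queuing adds latency, which raises required concurrency further.

TARGET_RPS = 10_000
POOL_SIZE = 1_000                  # thread pool cap from the example
QUEUE_WAIT_PER_EXCESS_S = 0.0001   # assumed latency penalty per queued slot

latency_s = 0.200  # database slowdown pushed latency to 200ms
for step in range(4):
    needed = TARGET_RPS * latency_s       # Little's Law
    excess = max(0, needed - POOL_SIZE)   # requests forced to queue
    print(f"step {step}: latency={latency_s * 1000:.0f}ms "
          f"needed={needed:.0f} queued={excess:.0f}")
    latency_s += excess * QUEUE_WAIT_PER_EXCESS_S  # queuing adds latency
```

Each pass through the loop needs more concurrency than the last, so latency climbs monotonically instead of settling.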

Key Insight: Capacity problems manifest as latency first. Rising latency is the early warning. By the time requests fail, you are deep in the spiral.

PLANNING WITH THE FORMULA

Target: 20,000 RPS at 100ms p95. Required concurrency = 20,000 × 0.1 = 2,000. Add 30% headroom: provision for 2,600 concurrent connections. Size pools accordingly.

Each request uses 50KB of memory? Then 2,600 concurrent requests need 130MB just for request data, plus overhead.
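The planning steps above fit in a short Python sketch. The 30% headroom factor and the 50KB-per-request figure come from the text; the variable names are my own:

```python
# Capacity plan for a 20,000 RPS target at 100ms p95.

TARGET_RPS = 20_000
P95_LATENCY_S = 0.100
HEADROOM = 1.30            # 30% above the Little's Law baseline
REQUEST_BYTES = 50_000     # ~50KB of per-request state (from the text)

baseline = TARGET_RPS * P95_LATENCY_S        # 2,000 via Little's Law
provisioned = int(baseline * HEADROOM)       # 2,600 with headroom
memory_mb = provisioned * REQUEST_BYTES / 1_000_000

print(provisioned)         # pool sizes to provision
print(memory_mb)           # MB needed for request data alone
```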

SERVICE TIME CEILING

Each request needs 10ms of CPU and you have 8 cores? Max is 800 RPS (8 cores × 100 requests/sec per core). More threads do not help when CPU bound. Solutions: optimize code or add servers.
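The ceiling follows directly from dividing cores by per-request CPU time; a minimal sketch, with `cpu_bound_ceiling_rps` as a hypothetical helper:

```python
# A CPU-bound service can complete at most 1/service_time
# requests per second per core, regardless of thread count.

def cpu_bound_ceiling_rps(cores: int, cpu_time_per_request_s: float) -> float:
    return cores / cpu_time_per_request_s

# 8 cores, 10ms of CPU per request -> 800 RPS ceiling.
print(cpu_bound_ceiling_rps(8, 0.010))
```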

Rule of Thumb: Keep p99 under 5× your p50. If p50 is 40ms, p99 should stay below 200ms. Exceeding signals tail latency problems.
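The rule of thumb above is easy to encode as a check; `tail_latency_ok` and the 5× default are taken from the text, the function itself is a hypothetical sketch:

```python
# Flag tail latency problems when p99 exceeds 5x p50.

def tail_latency_ok(p50_ms: float, p99_ms: float, max_ratio: float = 5.0) -> bool:
    return p99_ms <= max_ratio * p50_ms

print(tail_latency_ok(40, 180))  # within the 5x budget
print(tail_latency_ok(40, 250))  # tail latency problem
```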
💡 Key Takeaways
Little's Law (Concurrency = Throughput × Latency) means doubling latency from 50ms to 100ms doubles required concurrent capacity from 500 to 1,000 requests to maintain the same throughput
Tail latency amplifies across microservice chains: five services each with p99 = 200ms results in end-to-end p99 approaching 1 second because slow dependencies compound
Provision 20 to 40% headroom above peak traffic, so an 8,000 requests per second peak becomes roughly 9,600 to 11,200 requests per second of capacity to absorb spikes and failover scenarios
Target p99 under 5× p50 latency; if p50 = 40ms then p99 should stay below 200ms, otherwise tail latency problems from garbage collection or slow queries are degrading user experience
CPU bound services hit hard throughput ceilings: 8 cores with 10ms service time caps at 800 requests per second (100 requests per second per core), adding threads won't help, only optimize code or add servers
📌 Interview Tips
1. Use the Little's Law formula to calculate connection pool sizes: "At 10K RPS with 50ms latency, we need 500 concurrent connections." This shows quantitative thinking.
2. When discussing capacity, mention the death spiral: rising latency increases required concurrency, which causes more queueing, which increases latency further.
3. If asked about thread pool sizing, derive it from expected load: target RPS times p99 latency, plus 30% headroom. Concrete numbers impress interviewers.