
Resource Budgeting and Failure Modes at Scale

Production systems fail in predictable ways when concurrency or parallelism exceeds safe limits. Too many OS threads exhaust the address space: at 0.5 to 2 MB of stack reservation per thread, 100,000 threads consume 50 to 200 GB before any heap usage. Context-switching overhead becomes measurable: at 1 microsecond per switch and 100,000 switches per second, the scheduler alone burns 10% of a core. Event loops block catastrophically if any handler performs synchronous I/O, stalling thousands of connections on a single slow disk read or database query.

Cache stampedes occur when a popular cache entry expires and thousands of requests simultaneously hit the backend database. A typical scenario: a cache key serving 10,000 Requests Per Second (RPS) expires, every request misses, and the database receives a sudden 10,000-query spike instead of the normal trickle of cache-refresh traffic. Database p99 latency jumps from 5 milliseconds to 500 milliseconds, and the stampede continues until the cache repopulates. The fix is stale-while-revalidate: serve slightly stale data while exactly one request refreshes the cache in the background.

Hot partitions demonstrate Amdahl's Law in practice. When a celebrity user generates 80% of traffic, that single shard becomes the serial bottleneck. If normal shards handle 5,000 RPS at 10 millisecond p99 latency, the hot shard might degrade to 2,000 RPS at 500 millisecond p99, capping overall system throughput and creating a terrible user experience. Mitigation requires application-level techniques such as caching celebrity data separately, splitting hot keys across sub-shards, or rate limiting individual users. Pure infrastructure scaling cannot solve a fundamental data-distribution problem.
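The stale-while-revalidate pattern can be sketched in a few lines. The sketch below assumes an in-process cache guarded by a lock; the class and parameter names (StaleWhileRevalidateCache, ttl_seconds, stale_window_seconds) are illustrative, not any particular library's API.

```python
import threading
import time

class StaleWhileRevalidateCache:
    """Minimal sketch: serve stale data while exactly one background refresh runs."""

    def __init__(self, loader, ttl_seconds=60, stale_window_seconds=120):
        self._loader = loader                       # expensive backend fetch, e.g. a DB query
        self._ttl = ttl_seconds                     # freshness lifetime of an entry
        self._stale_window = stale_window_seconds   # how long stale data may still be served
        self._entries = {}                          # key -> (value, fetched_at)
        self._refreshing = set()                    # keys with an in-flight background refresh
        self._lock = threading.Lock()

    def get(self, key):
        now = time.monotonic()
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None:
                value, fetched_at = entry
                age = now - fetched_at
                if age <= self._ttl:
                    return value                    # fresh: serve directly
                if age <= self._ttl + self._stale_window:
                    # Stale but usable: exactly one caller triggers the refresh,
                    # everyone else keeps getting the stale value, so the backend
                    # sees one query instead of a 10,000-query spike.
                    if key not in self._refreshing:
                        self._refreshing.add(key)
                        threading.Thread(target=self._refresh, args=(key,), daemon=True).start()
                    return value
        # Cold miss, or entry past the stale window: load synchronously.
        # (A production version would also deduplicate concurrent cold misses.)
        return self._refresh(key)

    def _refresh(self, key):
        try:
            value = self._loader(key)               # backend call happens without the lock held
            with self._lock:
                self._entries[key] = (value, time.monotonic())
            return value
        finally:
            with self._lock:
                self._refreshing.discard(key)
```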
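Splitting a hot key across sub-shards can be sketched similarly. The example below assumes a counter-like or append-heavy key whose writes can be scattered across sub-keys and whose reads can be gathered and merged; HOT_KEYS, SUB_SHARDS, and the helper functions are hypothetical names, not a real framework.

```python
import hashlib
import random

# Keys identified as hot from metrics; in practice this set would be driven by
# monitoring or configuration rather than hard-coded.
HOT_KEYS = {"celebrity:12345:timeline"}
SUB_SHARDS = 16

def shard_for(key: str, num_shards: int) -> int:
    """Default partitioning: stable hash of the logical key."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_shards

def write_key(key: str) -> str:
    """Writes to a hot key are scattered over SUB_SHARDS physical keys,
    so no single partition absorbs the full write load."""
    if key in HOT_KEYS:
        return f"{key}#{random.randrange(SUB_SHARDS)}"
    return key

def read_keys(key: str) -> list[str]:
    """Reads of a hot key must gather every sub-key and merge the results,
    trading a wider fan-out read for an even load distribution."""
    if key in HOT_KEYS:
        return [f"{key}#{i}" for i in range(SUB_SHARDS)]
    return [key]
```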
💡 Key Takeaways
Thread overhead at scale: 100,000 OS threads reserve 50 to 200 GB address space at 0.5 to 2 MB per stack. Context switching at 1 microsecond per switch and 100,000 switches per second consumes 10% of a core before any application work happens.
Cache stampedes occur when popular entries expire. A key serving 10,000 Requests Per Second (RPS) expires, all requests miss, database receives 10,000 query spike, and p99 latency jumps from 5 milliseconds to 500 milliseconds. Stale-while-revalidate serves slightly old data while one request refreshes asynchronously.
Hot partitions violate uniform load assumptions. When a celebrity user generates 80% of traffic to one shard, that shard becomes the serial bottleneck. Normal shards handle 5,000 RPS at 10 millisecond p99, hot shard degrades to 2,000 RPS at 500 millisecond p99, limiting overall throughput.
Split brain in distributed primaries: a network partition causes two nodes to both accept writes, producing conflicting data on reconciliation. Thundering herd: 1,000 servers restart simultaneously after a deploy and overwhelm downstream dependencies within 60 seconds.
Read replica lag creates user-visible inconsistency. Async replication falls 10 seconds behind during high write load, so users see outdated data immediately after their own writes. Mitigations: synchronous replication (doubles write latency), sticky sessions (route each user to the same replica), or causal consistency tokens, as sketched after this list.
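A minimal sketch of the causal consistency token approach, assuming each replica can report the last log position (LSN) it has applied; the Primary, Replica, and causal_read names are illustrative, not any specific database's API.

```python
import random
import time

class Primary:
    def __init__(self):
        self.lsn = 0
        self.data = {}

    def write(self, key, value):
        self.lsn += 1
        self.data[key] = value
        return self.lsn                     # the client keeps this LSN as its causal token

    def read(self, key):
        return self.data.get(key)

class Replica:
    def __init__(self):
        self.applied_lsn = 0                # advanced by an async replication stream (not shown)
        self.data = {}

    def read(self, key):
        return self.data.get(key)

def causal_read(key, token, replicas, primary, max_wait=0.2):
    """Serve the read from any replica that has caught up to the client's token;
    otherwise fall back to the primary instead of returning stale data."""
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        replica = random.choice(replicas)
        if replica.applied_lsn >= token:
            return replica.read(key)        # replica has already applied the client's own write
        time.sleep(0.01)                    # give replication a moment to catch up
    return primary.read(key)
```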
📌 Examples
A production cache stampede on a social media feed endpoint serving 50,000 RPS caused database connection pool exhaustion and 30 second p99 latency spikes. Implementing stale-while-revalidate with a 120 second stale window eliminated the issue.
During a viral tweet, Twitter's hot partition problem became visible: the single shard handling that user's timeline degraded from 10 millisecond to 800 millisecond latency. Mitigation required caching the celebrity timeline separately and rate limiting follower fan-out writes.