Resource Budgeting and Failure Modes at Scale
Resource Budgeting
Every request has a resource budget: maximum CPU time, memory, connections, and wall-clock time. When the budget is exhausted, the request terminates or degrades. Without budgets, runaway requests consume unbounded resources and starve other requests.
Implement budgets at multiple levels. Per request: time out after 30 seconds, cap memory at 100MB. Per user: rate limit to 100 requests per minute. Per operation type: allow 1000 concurrent reads but only 100 concurrent writes. Nested budgets provide defense in depth: a request that slips past one limit is still caught by another.
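A minimal sketch of the per-request and per-user levels, using the figures above as defaults. The names RequestBudget, BudgetExceeded, and UserRateLimiter are hypothetical, and memory accounting here is cooperative (the handler reports what it allocates), which is an assumption rather than something the text prescribes.

```python
import time
from collections import deque
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a request spends more than its budget allows."""


@dataclass
class RequestBudget:
    # Defaults mirror the example limits: 30 seconds, 100MB.
    timeout_s: float = 30.0
    max_bytes: int = 100 * 10**6
    started_at: float = field(default_factory=time.monotonic)
    bytes_used: int = 0

    def check_deadline(self) -> None:
        # Call between units of work; raises once wall-clock time is spent.
        if time.monotonic() - self.started_at >= self.timeout_s:
            raise BudgetExceeded("wall-clock budget exhausted")

    def charge_bytes(self, n: int) -> None:
        # Cooperative accounting: callers report allocations they make.
        if self.bytes_used + n > self.max_bytes:
            raise BudgetExceeded("memory budget exhausted")
        self.bytes_used += n


class UserRateLimiter:
    """Sliding-window limiter: at most `limit` requests per `window_s`."""

    def __init__(self, limit: int = 100, window_s: float = 60.0):
        self.limit = limit
        self.window_s = window_s
        self._hits = {}  # user id -> deque of request timestamps

    def allow(self, user: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits.setdefault(user, deque())
        # Forget requests that have left the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A handler would call allow() before doing any work, then call check_deadline() and charge_bytes() as it progresses, so each level can reject independently of the others.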
Memory Pressure Under Concurrency
High concurrency amplifies memory usage. Each concurrent request needs buffers, intermediate results, and stack space. 10,000 concurrent requests at 1MB each need 10GB. Memory exhaustion causes garbage collection pauses, swapping, or out-of-memory kills.
Bound concurrent requests based on memory budget, not just CPU. If each request needs 10MB and you have 10GB available, cap at 1000 concurrent requests regardless of thread availability. Use admission control: queue or reject requests when at capacity rather than accepting and failing later.
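The sizing rule above can be sketched as a small admission controller. This is an illustration under stated assumptions: AdmissionController is a hypothetical name, the per-request memory figure is an estimate supplied by the operator, and rejection (rather than queueing) is the chosen policy.

```python
import threading

MB = 10**6
GB = 10**9


class AdmissionController:
    """Caps in-flight requests at memory_budget / per_request_estimate."""

    def __init__(self, memory_budget_bytes: int, per_request_bytes: int):
        self.max_in_flight = memory_budget_bytes // per_request_bytes
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_admit(self) -> bool:
        # Reject immediately at capacity instead of accepting and failing later.
        with self._lock:
            if self._in_flight >= self.max_in_flight:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        # Call exactly once for every successful try_admit().
        with self._lock:
            if self._in_flight > 0:
                self._in_flight -= 1


# 10GB available at 10MB per request gives a cap of 1000.
controller = AdmissionController(10 * GB, 10 * MB)
```

A server would call try_admit() on arrival, return an overload response when it is False, and call release() in a finally block once the request completes.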
Cascading Failure Modes
Retry storms: When a service slows, clients retry. Each retry adds load. The service slows more. More retries. Exponential backoff spaces retries out, and jitter keeps clients from retrying in lockstep. Cap the retry count. Use circuit breakers to stop retrying known bad paths.
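A minimal sketch of capped retries with exponential backoff and jitter. It uses the "full jitter" variant, where each delay is drawn uniformly between zero and the exponential ceiling; that choice, the function names, and the default values are assumptions for illustration. A circuit breaker is not shown.

```python
import random
import time


def backoff_delay(attempt: int, base_s: float = 0.1, cap_s: float = 10.0) -> float:
    """Full-jitter delay in seconds for a zero-based attempt number."""
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return random.uniform(0.0, ceiling)


def call_with_retries(fn, max_attempts: int = 4, sleep=time.sleep):
    """Call fn() up to max_attempts times, backing off between failures."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # A real client should retry only errors known to be transient.
            last_error = exc
            if attempt + 1 < max_attempts:
                sleep(backoff_delay(attempt))
    # Retry cap reached: surface the last failure instead of retrying forever.
    raise last_error
```

The sleep parameter exists so that tests can pass a stub instead of actually waiting.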
Queue buildup: Slow processing causes queues to grow. Old requests in the queue go stale. By the time they are processed, the clients have given up. You processed work nobody wanted. Set queue TTLs. Drop requests older than the client timeout rather than processing them uselessly.
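One way to sketch a queue TTL is to stamp each item on entry and discard expired items at dequeue time. TTLQueue is a hypothetical, single-threaded illustration; the injectable clock is an assumption made so the behavior can be tested without sleeping.

```python
import time
from collections import deque


class TTLQueue:
    """FIFO queue that silently drops items older than ttl_s on get()."""

    def __init__(self, ttl_s: float, clock=time.monotonic):
        self.ttl_s = ttl_s
        self._clock = clock
        self._items = deque()  # entries are (enqueued_at, item)
        self.dropped = 0  # count of stale items discarded so far

    def put(self, item) -> None:
        self._items.append((self._clock(), item))

    def get(self):
        """Return the oldest item still within its TTL, or None if none remain."""
        now = self._clock()
        while self._items:
            enqueued_at, item = self._items.popleft()
            if now - enqueued_at <= self.ttl_s:
                return item
            # The client has presumably timed out; skip the work entirely.
            self.dropped += 1
        return None
```

Setting ttl_s to the client-side timeout (or slightly below it) means a worker never starts on a request whose caller has already stopped waiting. The dropped counter is worth exporting as a metric, since a rising value signals sustained overload.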
Connection exhaustion: Each waiting request holds a connection. Connection pools exhaust. New requests cannot get connections. Even fast paths block, because they wait on the same pool. Size pools for peak concurrent requests plus a buffer. Monitor pool wait times as an early warning.
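A minimal sketch of a pool that records how long each caller waited for a connection, so wait time can serve as the early-warning signal described above. MonitoredPool is hypothetical; the connections are opaque objects supplied by the caller, and the timeout and sample-window sizes are illustrative assumptions.

```python
import queue
import time
from collections import deque
from contextlib import contextmanager


class MonitoredPool:
    """Fixed-size pool that tracks recent acquire wait times."""

    def __init__(self, connections, acquire_timeout_s: float = 1.0):
        self._free = queue.Queue()
        for conn in connections:
            self._free.put(conn)
        self.acquire_timeout_s = acquire_timeout_s
        self.wait_times_s = deque(maxlen=1000)  # most recent samples only

    @contextmanager
    def connection(self):
        start = time.monotonic()
        # Raises queue.Empty if nothing frees up within the timeout,
        # so callers fail fast instead of blocking indefinitely.
        conn = self._free.get(timeout=self.acquire_timeout_s)
        self.wait_times_s.append(time.monotonic() - start)
        try:
            yield conn
        finally:
            # Always return the connection, even if the caller raised.
            self._free.put(conn)

    def max_wait_s(self) -> float:
        return max(self.wait_times_s, default=0.0)
```

Alerting when max_wait_s() (or a high percentile of wait_times_s) climbs toward acquire_timeout_s gives warning before callers start seeing queue.Empty.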