Priority-Based Load Shedding: What to Drop First
Request Priority Classification
Not all requests have equal value. A payment completion is worth more than a recommendation refresh. Classify requests into priority tiers: Critical (authentication, payments, core transactions) should be shed only as a last resort. Important (user data reads, search) can be delayed but should complete. Optional (analytics, logging, background jobs) can be shed freely. Systems typically use three to five priority levels, each with its own shedding threshold.
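A minimal sketch of such a classifier in Python. The Priority enum and the endpoint-to-priority map here are illustrative assumptions, not a standard; real systems typically derive the mapping from routing configuration or service metadata:

```python
from enum import IntEnum

class Priority(IntEnum):
    CRITICAL = 0   # authentication, payments, core transactions
    IMPORTANT = 1  # user data reads, search
    OPTIONAL = 2   # analytics, logging, background jobs

# Hypothetical endpoint map for illustration only.
ENDPOINT_PRIORITY = {
    "/payments/complete": Priority.CRITICAL,
    "/search": Priority.IMPORTANT,
    "/analytics/events": Priority.OPTIONAL,
}

def classify(endpoint: str) -> Priority:
    # Default unknown endpoints to IMPORTANT rather than OPTIONAL,
    # so newly added routes are not silently shed.
    return ENDPOINT_PRIORITY.get(endpoint, Priority.IMPORTANT)
```

Using an IntEnum makes priorities comparable, which simplifies threshold checks downstream (a larger value means more sheddable).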
Implementation Strategies
Priority headers propagate through service calls: the API gateway assigns a priority based on endpoint and user tier (for example, X-Request-Priority: critical), and downstream services apply shedding rules against their local load. At 70% CPU, shed optional requests; at 85%, shed important requests; only at 95% should critical requests even be considered for shedding.
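The tiered thresholds above can be sketched as a single admission check. The Priority enum and the exact threshold constants mirror the text but are assumptions about one possible implementation:

```python
from enum import IntEnum

class Priority(IntEnum):
    CRITICAL = 0
    IMPORTANT = 1
    OPTIONAL = 2

def should_shed(priority: Priority, cpu_percent: float) -> bool:
    """Tiered shedding: higher CPU load sheds progressively
    higher-value traffic, critical last."""
    if cpu_percent >= 95:
        return True                          # even critical is at risk
    if cpu_percent >= 85:
        return priority >= Priority.IMPORTANT
    if cpu_percent >= 70:
        return priority >= Priority.OPTIONAL
    return False                             # under 70%: admit everything
```

A downstream service would call should_shed with the priority parsed from the X-Request-Priority header before doing any real work.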
User Tier Considerations
Paid users expect higher reliability. During overload, free-tier requests are shed once the system reaches 60% of capacity, while premium users retain access until 85%. A system at 10,000 RPS might allocate 7,000 RPS to premium users and 3,000 RPS to free users.
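A minimal sketch of tier-aware shedding, assuming the 60%/85% thresholds from the text; the tier names and threshold table are illustrative:

```python
# Hypothetical per-tier shedding thresholds (percent of capacity).
TIER_SHED_THRESHOLD = {
    "free": 60.0,     # free tier shed first
    "premium": 85.0,  # premium retains access longer
}

def tier_should_shed(user_tier: str, load_percent: float) -> bool:
    # Unknown tiers fall back to the free-tier threshold (conservative).
    threshold = TIER_SHED_THRESHOLD.get(user_tier, TIER_SHED_THRESHOLD["free"])
    return load_percent >= threshold
```

In practice this check would combine with the request-priority check: a free-tier analytics call is shed long before a premium payment.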
Probabilistic Shedding
Rather than hard cutoffs, probabilistic shedding rejects a percentage of eligible requests, ramping up linearly above a threshold. Formula: shedding_rate = (current_load - threshold) / (max_capacity - threshold). With a threshold of 80%, that gives 25% shedding at 85% load, 50% at 90%, and (95-80)/(100-80) = 75% at 95%.
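The formula above translates directly into code. This is a sketch; the rng parameter is an illustrative hook so the decision can be made deterministic in tests:

```python
import random

def shedding_rate(current_load: float,
                  threshold: float = 80.0,
                  max_capacity: float = 100.0) -> float:
    """Linear ramp: 0% at the threshold, 100% at max capacity."""
    if current_load <= threshold:
        return 0.0
    rate = (current_load - threshold) / (max_capacity - threshold)
    return min(1.0, rate)

def should_reject(current_load: float, rng=random.random) -> bool:
    # Each eligible request is rejected independently with
    # probability shedding_rate, so load drops smoothly rather
    # than falling off a cliff at a hard cutoff.
    return rng() < shedding_rate(current_load)
```

Only requests already deemed sheddable by the priority rules should pass through should_reject; critical traffic bypasses it entirely.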
Fairness in Shedding
Random shedding can penalize the same unlucky users repeatedly. A per-client token bucket ensures fair distribution: each client gets its own token allowance, requests consume tokens, and an empty bucket means rejection. This prevents one client from being repeatedly shed while another always succeeds.
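A self-contained sketch of a per-client token bucket; the capacity and refill rate are illustrative assumptions, and real deployments would also evict idle clients to bound memory:

```python
import time

class PerClientTokenBucket:
    """One bucket per client: every client refills at the same rate,
    so no single client is starved while another always succeeds."""

    def __init__(self, capacity=10.0, refill_per_sec=5.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = {}       # client_id -> remaining tokens
        self.last_refill = {}  # client_id -> timestamp of last refill

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        if client_id not in self.tokens:
            # New clients start with a full bucket.
            self.tokens[client_id] = self.capacity
            self.last_refill[client_id] = now
        # Refill proportionally to elapsed time, capped at capacity.
        elapsed = now - self.last_refill[client_id]
        self.last_refill[client_id] = now
        self.tokens[client_id] = min(
            self.capacity,
            self.tokens[client_id] + elapsed * self.refill_per_sec,
        )
        if self.tokens[client_id] >= 1.0:
            self.tokens[client_id] -= 1.0
            return True
        return False  # empty bucket: shed this request
```

The optional now argument is a test hook; in production the bucket reads the monotonic clock itself.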