Rate Limiting › Rate Limit Strategies (Per-User, Per-IP, Global) · Medium · ⏱️ ~3 min

Per User Rate Limiting: Identity Based Quota Enforcement

Per-user rate limiting ties request quotas to authenticated identities such as user IDs, API keys, or tenant accounts. When a request arrives, you extract the principal identifier and check its token bucket or counter. If tokens remain, the request proceeds and the count decrements; otherwise, you return 429 Too Many Requests. This approach delivers precise fairness because each tenant gets its own budget regardless of how others behave.

GitHub enforces 5,000 requests per hour per authenticated user or token, while unauthenticated clients (identified by IP) get only 60 requests per hour; this dramatic difference shows how authentication unlocks higher limits. Stripe goes further with per-account and per-endpoint limits that adapt to the customer profile: a startup might have lower write limits than an enterprise customer processing millions of transactions monthly. The system can also weight expensive operations more heavily, so a payment creation might consume 10 tokens while fetching a balance costs 1.

The key tradeoff is that per-user limiting only works for authenticated traffic. Anonymous endpoints remain vulnerable unless you add per-IP guards, and you also face sybil attacks, where adversaries create thousands of free accounts to bypass limits. GitHub mitigates this by combining per-user limits with per-IP backstops and requiring email verification. For multi-tenant platforms, per-user limits align naturally with billing tiers: free users get 1,000 requests per day, paid users get 100,000, and enterprise customers get millions with dedicated capacity.

Implementation requires fast identity resolution. Most systems store counters in Redis or Memcached with keys like "ratelimit:user:12345:hourly" and a TTL set slightly beyond the window duration. At 100,000 active users with hourly windows, you need roughly 100,000 keys in memory, and each counter check adds 0.5 to 2 milliseconds of latency for intra-region cache access. If the cache fails, you must decide whether to fail open (allow requests, risk overload) or fail closed (deny requests, create an outage). Production systems typically fail open, with local in-memory fallback buckets to maintain availability.
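The check described above can be sketched as a weighted token bucket. This is a minimal in-memory version for illustration; a production system would keep the counters in a shared store like Redis under keys such as `ratelimit:user:12345:hourly`, and the capacities, refill rates, and costs here are assumptions, not any vendor's actual numbers:

```python
import time

class TokenBucket:
    """In-memory token bucket: holds up to `capacity` tokens, refilled at `refill_rate` per second."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        """Charge `cost` tokens; False means the caller should return 429 Too Many Requests."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# One bucket per authenticated principal (user ID, API key, or tenant).
buckets: dict[str, TokenBucket] = {}

def check_rate_limit(user_id: str, cost: float = 1.0) -> bool:
    # 100 tokens per hour per user -- an illustrative quota.
    bucket = buckets.setdefault(user_id,
                                TokenBucket(capacity=100, refill_rate=100 / 3600))
    return bucket.allow(cost)
```

Weighted costs fall out naturally: a payment-creation handler would call `check_rate_limit(user_id, cost=10)` while a balance read passes `cost=1`, so the expensive operation drains the budget ten times faster.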
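The fail-open-with-local-fallback pattern can be sketched as a wrapper around the remote check. Everything here is hypothetical (`remote_check` stands in for the Redis-backed lookup, and `LOCAL_LIMIT` is an arbitrary per-process budget), but it shows the key idea: fail open, yet keep the fallback bounded so a cache outage does not mean unlimited traffic:

```python
import logging

class CacheDown(Exception):
    """Raised when the shared rate-limit store (e.g. Redis) is unreachable."""

def remote_check(user_id: str, cost: int) -> bool:
    # Placeholder for the shared-store check; here it always simulates an outage.
    raise CacheDown

# Small per-process fallback budget, used only while the cache is down.
_local_budget: dict[str, int] = {}
LOCAL_LIMIT = 50  # illustrative: far below the real per-user quota

def check_with_fallback(user_id: str, cost: int = 1) -> bool:
    try:
        return remote_check(user_id, cost)
    except CacheDown:
        # Fail open, but bounded: each process allows up to LOCAL_LIMIT tokens locally.
        logging.warning("rate-limit cache unreachable; using local fallback")
        used = _local_budget.get(user_id, 0)
        if used + cost > LOCAL_LIMIT:
            return False
        _local_budget[user_id] = used + cost
        return True
```

The fail-closed variant is the same wrapper with `return False` in the except branch, trading availability for protection of the backend.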
💡 Key Takeaways
GitHub enforces 5,000 requests per hour for authenticated users versus only 60 per hour for unauthenticated IP addresses, demonstrating how identity unlocks higher quotas
Weighted token consumption allows charging expensive operations more: a write API call might consume 10 tokens while a read costs 1 token, aligning limits with actual backend cost
Sybil attacks remain a vulnerability where attackers create many free accounts to multiply their effective quota; mitigate with email verification, payment requirements, or risk scoring
State storage scales with active user count: at 100,000 concurrent users with hourly windows, expect roughly 100,000 Redis keys consuming several megabytes of memory
Cache failures force a choice between fail-open (allow traffic, risk overload) and fail-closed (deny traffic, create an outage); most production systems fail open with local in-memory fallback buckets
Per user limits align naturally with billing tiers and usage plans, enabling differentiated service levels from free accounts at 1,000 requests per day to enterprise at millions per day
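The billing-tier alignment in the last point reduces to a simple lookup at identity-resolution time. The tier names follow the quotas quoted above, but the enterprise figure is an assumption ("millions" is not a precise number):

```python
# Daily request quotas per billing tier; enterprise figure is illustrative.
TIER_DAILY_QUOTA = {
    "free": 1_000,
    "paid": 100_000,
    "enterprise": 5_000_000,
}

def daily_quota(tier: str) -> int:
    # Unknown or missing tiers fall back to the free quota, never to unlimited.
    return TIER_DAILY_QUOTA.get(tier, TIER_DAILY_QUOTA["free"])
```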
📌 Examples
Stripe implements per-account and per-endpoint limits that vary by customer profile and endpoint sensitivity, with payment-creation endpoints having stricter limits than balance-lookup endpoints
AWS API Gateway usage plans let you define per-API-key quotas (daily, weekly, or monthly) plus throttle limits (requests per second and burst size) that stack on top of regional account-level caps