Rate Limiting › Rate Limit Strategies (Per-User, Per-IP, Global) · Medium · ⏱️ ~3 min

Per User Rate Limiting: Identity Based Quota Enforcement

Per-user rate limiting ties request quotas to authenticated identities such as user IDs, API keys, or tenant accounts. When a request arrives, you extract the principal identifier and check its token bucket or counter. If tokens remain, the request proceeds and the count decrements; otherwise, you return 429 Too Many Requests. This approach delivers precise fairness because each tenant gets its own budget regardless of how others behave.

GitHub enforces 5,000 requests per hour per authenticated user or token, while unauthenticated clients (identified by IP) get only 60 requests per hour; this dramatic difference shows how authentication unlocks higher limits. Stripe goes further with per-account and per-endpoint limits that adapt to the customer profile: a startup might have lower write limits than an enterprise customer processing millions of transactions monthly. The system can also weight expensive operations more heavily, so a payment creation might consume 10 tokens while fetching a balance costs 1.

The key tradeoff is that per-user limiting only works for authenticated traffic. Anonymous endpoints remain vulnerable unless you add per-IP guards, and you also face sybil attacks, where adversaries create thousands of free accounts to bypass limits. GitHub mitigates this by combining per-user limits with per-IP backstops and requiring email verification. For multi-tenant platforms, per-user limits align naturally with billing tiers: free users get 1,000 requests per day, paid users get 100,000, and enterprise customers get millions with dedicated capacity.

Implementation requires fast identity resolution. Most systems store counters in Redis or Memcached with keys like "ratelimit:user:12345:hourly" and a TTL set slightly beyond the window duration. At 100,000 active users with hourly windows, you need roughly 100,000 keys in memory, and each counter check adds 0.5 to 2 milliseconds of latency for intra-region cache access. If the cache fails, you must decide whether to fail open (allow requests, risk overload) or fail closed (deny requests, create an outage). Production systems typically fail open, with local in-memory fallback buckets to maintain availability.
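The check described above can be sketched as a weighted token bucket. This is a minimal in-memory version for illustration; a production system would keep the counters in a shared store like Redis under keys such as `ratelimit:user:12345:hourly`, and the capacities, refill rates, and costs here are assumptions, not any vendor's actual numbers:

```python
import time

class TokenBucket:
    """In-memory token bucket: holds up to `capacity` tokens, refilled at `refill_rate` per second."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        """Charge `cost` tokens; False means the caller should return 429 Too Many Requests."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

# One bucket per authenticated principal (user ID, API key, or tenant).
buckets: dict[str, TokenBucket] = {}

def check_rate_limit(user_id: str, cost: float = 1.0) -> bool:
    # 100 tokens per hour per user -- an illustrative quota.
    bucket = buckets.setdefault(user_id,
                                TokenBucket(capacity=100, refill_rate=100 / 3600))
    return bucket.allow(cost)
```

Weighted costs fall out naturally: a payment-creation handler would call `check_rate_limit(user_id, cost=10)` while a balance read passes `cost=1`, so the expensive operation drains the budget ten times faster.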
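The fail-open-with-local-fallback pattern can be sketched as a wrapper around the remote check. Everything here is hypothetical (`remote_check` stands in for the Redis-backed lookup, and `LOCAL_LIMIT` is an arbitrary per-process budget), but it shows the key idea: fail open, yet keep the fallback bounded so a cache outage does not mean unlimited traffic:

```python
import logging

class CacheDown(Exception):
    """Raised when the shared rate-limit store (e.g. Redis) is unreachable."""

def remote_check(user_id: str, cost: int) -> bool:
    # Placeholder for the shared-store check; here it always simulates an outage.
    raise CacheDown

# Small per-process fallback budget, used only while the cache is down.
_local_budget: dict[str, int] = {}
LOCAL_LIMIT = 50  # illustrative: far below the real per-user quota

def check_with_fallback(user_id: str, cost: int = 1) -> bool:
    try:
        return remote_check(user_id, cost)
    except CacheDown:
        # Fail open, but bounded: each process allows up to LOCAL_LIMIT tokens locally.
        logging.warning("rate-limit cache unreachable; using local fallback")
        used = _local_budget.get(user_id, 0)
        if used + cost > LOCAL_LIMIT:
            return False
        _local_budget[user_id] = used + cost
        return True
```

The fail-closed variant is the same wrapper with `return False` in the except branch, trading availability for protection of the backend.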
💡 Key Takeaways
GitHub enforces 5,000 requests per hour for authenticated users versus only 60 per hour for unauthenticated IP addresses, demonstrating how identity unlocks higher quotas
Weighted token consumption allows charging expensive operations more: a write API call might consume 10 tokens while a read costs 1 token, aligning limits with actual backend cost
Sybil attacks remain a vulnerability where attackers create many free accounts to multiply their effective quota; mitigate with email verification, payment requirements, or risk scoring
State storage scales with active user count: at 100,000 concurrent users with hourly windows, expect roughly 100,000 Redis keys consuming several megabytes of memory
Cache failures force a choice between fail-open (allow traffic, risk overload) and fail-closed (deny traffic, create an outage); most production systems fail open with local in-memory fallback buckets
Per user limits align naturally with billing tiers and usage plans, enabling differentiated service levels from free accounts at 1,000 requests per day to enterprise at millions per day
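The billing-tier alignment in the last point reduces to a simple lookup at identity-resolution time. The tier names follow the quotas quoted above, but the enterprise figure is an assumption ("millions" is not a precise number):

```python
# Daily request quotas per billing tier; enterprise figure is illustrative.
TIER_DAILY_QUOTA = {
    "free": 1_000,
    "paid": 100_000,
    "enterprise": 5_000_000,
}

def daily_quota(tier: str) -> int:
    # Unknown or missing tiers fall back to the free quota, never to unlimited.
    return TIER_DAILY_QUOTA.get(tier, TIER_DAILY_QUOTA["free"])
```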
📌 Examples
Stripe implements per-account and per-endpoint limits that vary by customer profile and endpoint sensitivity, with payment-creation endpoints having stricter limits than balance-lookup endpoints
AWS API Gateway usage plans let you define per-API-key quotas (daily, weekly, or monthly) plus throttle limits (requests per second and burst size) that stack on top of regional account-level caps