
Per User Rate Limiting: Identity Based Quota Enforcement

Definition
Per User Rate Limiting tracks API usage against authenticated user identities, enforcing individual quotas. Each user has their own counter and limit, ensuring one heavy user cannot impact others regardless of how many requests they send.

Why Identity Matters

Per user limiting is the most common and fairest approach for authenticated APIs. Each user gets their own allocation (say 1,000 requests/hour) regardless of which IP address or how many devices they use. A user's phone, laptop, and tablet all draw from the same budget. This is what users expect when paying for an API plan.

Key Design: User ID Extraction

The rate limiting key is the user identifier supplied by the authentication layer. Common sources are the JWT sub claim, an API key lookup, or the session user ID. Extraction must happen after authentication, so per user limits apply only to authenticated endpoints. Key format: ratelimit:user:{user_id}:{window}, where {window} identifies the current time bucket. Use consistent hashing when sharding keys across multiple Redis instances.
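The key format above can be sketched as a fixed window counter. This is a minimal illustration, not a production design: the names rate_limit_key and FixedWindowLimiter are hypothetical, the window is assumed to be an hourly bucket index, and a plain in-memory dictionary stands in for a shared store such as Redis.

```python
import time
from collections import defaultdict
from typing import Optional

WINDOW_SECONDS = 3600  # assumed hourly window, matching the requests/hour examples


def rate_limit_key(user_id: str, now: float, window_seconds: int = WINDOW_SECONDS) -> str:
    """Build ratelimit:user:{user_id}:{window}; window is the time bucket index."""
    window = int(now // window_seconds)
    return f"ratelimit:user:{user_id}:{window}"


class FixedWindowLimiter:
    """Hypothetical limiter; the dict is a stand-in for a shared counter store."""

    def __init__(self, limit: int, window_seconds: int = WINDOW_SECONDS):
        self.limit = limit
        self.window_seconds = window_seconds
        self.counters = defaultdict(int)

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        # user_id is assumed to come from the authentication layer (e.g. JWT sub).
        now = time.time() if now is None else now
        key = rate_limit_key(user_id, now, self.window_seconds)
        if self.counters[key] >= self.limit:
            return False  # quota for this window is used up
        self.counters[key] += 1
        return True
```

Because the window index is part of the key, a new window starts from a fresh counter automatically. This sketch never deletes old counters; with a real shared store you would typically set an expiry on each key slightly longer than the window.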

Tiered Limits by Plan

Different users get different limits based on their subscription tier. Free tier: 100/hour. Pro tier: 10,000/hour. Enterprise: 100,000/hour. Either store the limit in a cache (look up the user's tier, then the tier's limit) or embed it in the token (a JWT claim carrying the rate limit). A plan change should take effect promptly: invalidate the cached entry when the plan changes, or keep cache TTLs short so a stale limit expires quickly. A limit embedded in a token stays stale until the token is reissued.
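The tier lookup with a short TTL and explicit invalidation might look like the sketch below. The tier limits come from the text above; everything else (TierLimitCache, the fetch_tier callback, the 60 second TTL) is a hypothetical illustration of the idea rather than a specific library's API.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Requests per hour for each plan, as listed in the text.
TIER_LIMITS = {"free": 100, "pro": 10_000, "enterprise": 100_000}


class TierLimitCache:
    """Caches each user's tier for a short time; names here are assumptions."""

    def __init__(self, fetch_tier: Callable[[str], str], ttl_seconds: float = 60.0):
        self.fetch_tier = fetch_tier  # e.g. a database or billing service lookup
        self.ttl_seconds = ttl_seconds
        self._entries: Dict[str, Tuple[str, float]] = {}  # user_id -> (tier, expires_at)

    def limit_for(self, user_id: str, now: Optional[float] = None) -> int:
        now = time.monotonic() if now is None else now
        entry = self._entries.get(user_id)
        if entry is None or now >= entry[1]:
            # Missing or expired: refetch the tier and cache it again.
            tier = self.fetch_tier(user_id)
            self._entries[user_id] = (tier, now + self.ttl_seconds)
        else:
            tier = entry[0]
        return TIER_LIMITS[tier]

    def invalidate(self, user_id: str) -> None:
        # Call this when a plan changes so the new limit applies on the next request.
        self._entries.pop(user_id, None)
```

Without invalidation, a plan change becomes visible only after the cached entry expires, so the TTL bounds how long a stale limit can persist.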

Advantages and Limitations

Advantages: fair allocation, a direct tie to billing, quotas that users understand, and plan based differentiation. Limitations: requires authentication (so it cannot protect login or signup endpoints), account sharing bypasses limits (10 people using one account), and an attacker with compromised credentials can abuse that account's full quota. Per user limits are necessary but not sufficient for complete protection.

💡 Key Takeaways
GitHub enforces 5,000 requests per hour for authenticated users versus only 60 per hour for unauthenticated IP addresses, demonstrating how identity unlocks higher quotas
Weighted token consumption allows charging expensive operations more: a write API call might consume 10 tokens while a read costs 1 token, aligning limits with actual backend cost
Sybil attacks remain a vulnerability where attackers create many free accounts to multiply their effective quota; mitigate with email verification, payment requirements, or risk scoring
State storage scales with active user count: at 100,000 concurrent users with hourly windows, expect roughly 100,000 Redis keys consuming several megabytes of memory
Cache failures force a choice between fail open (allow traffic, risk overload) and fail closed (deny traffic, create outage); most production systems fail open with local in memory fallback buckets
Per user limits align naturally with billing tiers and usage plans, enabling differentiated service levels from free accounts at 1,000 requests per day to enterprise at millions per day
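The weighted token consumption takeaway can be illustrated with a small token bucket in which each operation type has a cost. The costs (read 1, write 10) follow the example above; the TokenBucket class, its fields, and the capacity and refill values in the usage note are assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Token cost per operation type, following the read/write example in the text.
OPERATION_COST = {"read": 1, "write": 10}


@dataclass
class TokenBucket:
    """Hypothetical per user bucket; one instance would exist per user ID."""

    capacity: float           # maximum tokens the bucket can hold
    refill_per_second: float  # steady refill rate
    tokens: float             # tokens currently available
    updated_at: float         # timestamp of the last refill calculation

    def try_consume(self, cost: float, now: float) -> bool:
        # Refill based on elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens < cost:
            return False  # not enough budget for this operation
        self.tokens -= cost
        return True
```

For instance, a bucket created with capacity=20 and tokens=20 admits two writes back to back and then rejects even a single read until some tokens have refilled.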
📌 Interview Tips
1. Stripe implements per account and per endpoint limits that vary by customer profile and endpoint sensitivity, with payment creation endpoints having stricter limits than balance lookup endpoints
2. AWS API Gateway usage plans let you define per API key quotas (daily, weekly, monthly) plus throttle limits (requests per second and burst size) that stack on top of regional account level caps