
The Three Scaling Planes: Compute, Data, and Delivery

Decomposing Horizontal Scale

Horizontal scaling isn't one problem; it's three independent axes. The compute plane handles request processing, the data plane manages state and persistence, and the delivery plane moves bytes to users. Each scales differently, with distinct bottlenecks and patterns.

Compute Plane: Stateless and Elastic

Application servers should be completely stateless. Session data, user preferences, and shopping carts live in external stores like Redis or databases, never in server memory. This lets you add or remove servers in seconds without coordination. A load balancer distributes incoming requests across the pool using round robin, least connections, or header-based routing. Reddit runs stateless application servers behind Layer 7 (L7) load balancers. During a traffic spike, they launch new instances that immediately start serving traffic: no warm-up period, no data migration, no complicated orchestration. When traffic drops, they terminate instances. This elasticity keeps infrastructure costs proportional to load.

Data Plane: Replication and Partitioning

Databases don't scale like application servers. You need two techniques: replication for read scale and availability, and partitioning (sharding) for write scale. Replication creates copies. A primary handles writes; replicas handle reads. One primary with four replicas can serve roughly 5× the read traffic. But all writes still bottleneck on the primary, and replicas introduce replication lag. Under heavy write load, replicas may lag seconds behind, causing read-after-write anomalies where users don't see their own updates.
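One common way to contain read-after-write anomalies is to make the routing explicit at the data-access layer: writes always go to the primary, reads normally go to replicas, and a user who has just written is briefly pinned back to the primary. The sketch below illustrates the idea; the ReadWriteRouter class, the execute method on the connection objects, and the five-second pin window are assumptions for illustration, not a specific driver's API.

```python
import random
import time

# Minimal sketch of primary/replica routing with a read-your-writes guard.
# The connection objects and the 5-second pin window are illustrative
# assumptions, not taken from the article or any particular driver.

PIN_SECONDS = 5  # how long a user's reads stick to the primary after a write

class ReadWriteRouter:
    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self.last_write_at = {}  # user_id -> timestamp of that user's last write

    def write(self, user_id, query, params):
        # All writes go to the single primary.
        self.last_write_at[user_id] = time.monotonic()
        return self.primary.execute(query, params)

    def read(self, user_id, query, params):
        # If this user wrote recently, replicas may still be lagging, so
        # serve the read from the primary to avoid a read-after-write anomaly.
        wrote_at = self.last_write_at.get(user_id)
        if wrote_at is not None and time.monotonic() - wrote_at < PIN_SECONDS:
            return self.primary.execute(query, params)
        # Otherwise spread reads across the replica pool.
        return random.choice(self.replicas).execute(query, params)
```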
✓ In Practice: Meta's architecture uses massive cache tiers (hundreds of terabytes per region) in front of databases, serving millions of queries per second (QPS) with sub-millisecond latency. Cache hit ratios above 95% mean database read load drops by 20× while write load remains manageable on primaries.
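A cache tier like this usually sits in front of the database in a cache-aside arrangement: read the cache first, fall back to the database on a miss, and populate the cache for subsequent requests. Here is a minimal sketch using the redis-py client; the key format, the 300-second TTL, and the db.fetch_profile / db.update_profile helpers are illustrative assumptions.

```python
import json
import redis  # assumes the redis-py client is installed

# Cache-aside sketch: at a ~95% hit ratio, only ~5% of reads reach the
# database, a roughly 20x reduction in database read load.
cache = redis.Redis(host="localhost", port=6379)

def get_user_profile(user_id, db):
    key = f"user:{user_id}:profile"          # key format is an assumption
    cached = cache.get(key)
    if cached is not None:                   # cache hit: no database query
        return json.loads(cached)
    profile = db.fetch_profile(user_id)      # cache miss: read the database
    cache.setex(key, 300, json.dumps(profile))  # populate for later requests
    return profile

def update_user_profile(user_id, fields, db):
    db.update_profile(user_id, fields)       # writes still go to the primary
    cache.delete(f"user:{user_id}:profile")  # invalidate so readers see the update
```

Deleting the key on write, rather than updating it in place, keeps the invalidation path simple: the next read repopulates the cache from the primary.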
Partitioning splits data across shards. User IDs 0 to 1 million live on shard 1, 1 million to 2 million on shard 2, and so on. Each shard has its own primary and replicas, so writes scale linearly with shard count. The challenge becomes choosing good partition keys that spread load evenly without creating hotspots. Hash-based partitioning (hash of user ID) distributes evenly but makes range queries expensive. Range-based partitioning (user ID 0 to 1M) enables efficient ranges but risks hotspots if new users cluster in high ID ranges.

Delivery Plane: Pushing Bytes Closer

The delivery plane moves content to users with minimal latency and cost. Content Delivery Networks (CDNs) cache static assets (images, videos, JavaScript, Cascading Style Sheets) at edge locations. Reddit achieves greater than 90% hit ratios for static assets, keeping origin requests per second (RPS) and egress bandwidth costs manageable during viral traffic. For dynamic content, use compression (gzip, Brotli), lazy loading, and asynchronous processing. Don't block user responses on expensive operations like sending emails, updating search indexes, or computing recommendations. Push those tasks to background queues and return immediately.
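As a concrete sketch of that last point, the request handler below does only the work the user must wait for, then hands the rest to a background queue and returns. It uses Celery with a Redis broker as one possible queue; the broker URL, task names, and the db.insert_post helper are assumptions for illustration, not taken from the article.

```python
from celery import Celery

# Background-queue sketch using Celery with a Redis broker.
# Broker URL, task names, and handler bodies are illustrative assumptions.
app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def update_search_index(post_id):
    # Runs later on a worker machine, off the request path.
    ...

@app.task
def notify_followers_by_email(post_id):
    # Email sending is slow and non-urgent; it never blocks the user.
    ...

def create_post(request, db):
    # Do only the work the user must wait for, synchronously.
    post_id = db.insert_post(request["user_id"], request["body"])
    # Defer everything else to the queue and return immediately.
    update_search_index.delay(post_id)
    notify_followers_by_email.delay(post_id)
    return {"status": "created", "post_id": post_id}
```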
💡 Key Takeaways
Compute plane scales through stateless servers with externalized session state (Redis, tokens) enabling instant add/remove of instances without data migration or coordination overhead
Data plane uses replication for read scaling (1 primary with 4 replicas serves 5× read traffic) but writes bottleneck on primary; sharding splits writes across multiple primaries for linear write scaling
Replication lag under heavy writes can reach seconds, causing read-after-write anomalies where users don't see their own updates immediately after posting
Delivery plane with Content Delivery Network (CDN) edge caching achieves greater than 90% hit ratios for static assets, reducing origin requests per second and bandwidth costs by 10× during traffic spikes
Partition key choice is critical: hash-based keys (hash of user ID) distribute evenly but break range queries, while range-based keys (user ID 0 to 1M per shard) enable ranges but risk hotspots
📌 Examples
Reddit's stateless application servers behind Layer 7 load balancers scale from 50 to 200 instances during viral posts in minutes, without session migration or warm-up periods
Twitter's timeline service uses hybrid fanout: push to online followers via write-time fanout (compute plane) while pulling for cold users at read time, smoothing write spikes through message queues to keep read-path latency at 100 to 200ms
Netflix Open Connect appliances at Internet Service Provider (ISP) locations serve 90%+ of video bytes from the edge, delivering tens to hundreds of gigabits per second per server with startup times under 2 seconds