Loading...
Design Fundamentals • Scalability FundamentalsEasy⏱️ ~2 min
Vertical vs Horizontal Scaling: The Two Core Motions
The Fundamental Choice:
When your system needs more capacity, you face two options: make your machines bigger (vertical scaling) or add more machines (horizontal scaling). This choice shapes your entire architecture.
Vertical scaling means upgrading to a larger instance. Move from 8 cores and 32GB Random Access Memory (RAM) to 32 cores and 128GB RAM. It's elegant because your code doesn't change. No distributed coordination, no network calls between nodes, no splitting data. A single database server handling 5,000 queries per second (QPS) can jump to 15,000 QPS just by upgrading hardware.
But Physics Sets Hard Limits:
The largest commercial instances top out around 448 cores and 24TB RAM. That's your ceiling. Even worse, you've created a single point of failure. When that one massive server goes down during a hardware fault or maintenance window, your entire service disappears.
Horizontal scaling distributes the load across many smaller machines. Instead of one 32 core server, you run ten 4 core servers behind a load balancer. Each handles 1,500 QPS for a combined 15,000 QPS. When you need more capacity, add another server. When hardware fails, 90% of your capacity stays online.
⚠️ The Complexity Tax: Horizontal scaling forces you to solve distribution problems. Where does user session data live? How do you query across multiple database shards? What happens when network partitions split your cluster? These problems don't exist in vertical scaling.
The Decision Framework:
Start vertical when you're early stage or serving under 10,000 requests per second. A single well provisioned server costs less and deploys faster than building distributed systems infrastructure. Switch to horizontal when you hit three triggers: approaching hardware limits (above 50,000 QPS on a single node), needing geographic distribution for latency, or requiring high availability with automated failover.
The sweet spot for many systems is hybrid: vertically scale individual nodes to reduce cluster size (fewer nodes means less coordination overhead), then horizontally scale the cluster for capacity and resilience. Run three 16 core application servers instead of twelve 4 core servers. You get better single node performance while maintaining distributed benefits.💡 Key Takeaways
•Vertical scaling upgrades a single machine (8 to 32 cores) for simplicity but hits hard limits around 448 cores and creates a single point of failure
•Horizontal scaling adds more machines (1 to 10 servers) for unlimited growth and high availability but requires solving distribution problems like session management and data partitioning
•Cost tradeoff: Ten 4 core servers at $200 each costs $2,000 per month with redundancy versus one 32 core server at $1,000 per month but total outage on failure
•Performance ceiling: A single PostgreSQL instance tops out around 100,000 writes per second while horizontally scaled Cassandra handles millions of writes per second across a cluster
•Start vertical for simplicity under 10,000 requests per second, switch to horizontal when approaching hardware limits, needing multi region deployment, or requiring 99.99% availability with automated failover
📌 Examples
Reddit application servers: Stateless web servers behind Layer 7 (L7) load balancers scale horizontally, adding capacity during traffic spikes by launching new instances in minutes
Early stage startup with 2,000 daily active users: Run on a single 8 core database server for $400 per month rather than building sharded infrastructure that costs $3,000 per month in complexity and operational overhead
Netflix control plane: Authentication and personalization APIs run on horizontally scaled regional clusters with automatic failover, while individual nodes are vertically scaled to 16-32 cores to reduce cluster coordination overhead
Loading...