What is Quorum Consensus and How Does It Provide Safety?
Quorum consensus is a fundamental mechanism that allows a set of unreliable nodes to agree on a single sequence of decisions, forming the basis for state machine replication in distributed systems. The core requirement is maintaining 2f+1 replicas to tolerate f failures, meaning any two quorums (majorities) will overlap in at least one node that remembers prior decisions. This overlap property guarantees that new leaders can always discover previously committed values, preventing conflicting decisions from being committed.
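To make the arithmetic concrete, here is a minimal Go sketch (the helper names are my own, not from any particular library) showing that an n = 2f+1 cluster needs floor(n/2) + 1 acknowledgments per decision, and that any two quorums of that size must share at least one member:

```go
package main

import "fmt"

// quorumSize returns the minimum number of nodes that must acknowledge a
// decision in an n-node cluster: floor(n/2) + 1.
func quorumSize(n int) int {
	return n/2 + 1
}

// toleratedFailures returns f, the number of failed nodes an n-node cluster
// can survive while still being able to form a quorum: (n-1)/2.
func toleratedFailures(n int) int {
	return (n - 1) / 2
}

func main() {
	for _, n := range []int{3, 5, 7} {
		q := quorumSize(n)
		f := toleratedFailures(n)
		// Two quorums of size q contain 2q > n members in total, so by the
		// pigeonhole principle they overlap in at least 2q - n >= 1 node.
		overlap := 2*q - n
		fmt.Printf("n=%d quorum=%d tolerates f=%d failures, min overlap=%d\n", n, q, f, overlap)
	}
}
```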
The two critical properties are safety, which ensures the system never commits conflicting decisions even during network partitions or node failures, and liveness, which ensures the system eventually makes progress when a majority is available. Safety is unconditional and is never traded away, while liveness can be temporarily suspended during partitions in which no majority exists. This design choice reflects the CAP theorem: consensus systems choose consistency and partition tolerance over availability during network splits.
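The safety-versus-liveness trade can be illustrated with a toy write path (a sketch under assumed names, not any particular implementation): the leader acknowledges a write only after a strict majority responds, and refuses it otherwise. The minority case here mirrors the 3-node example in the Examples section below.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNoQuorum is returned when a write cannot gather majority acknowledgments.
var ErrNoQuorum = errors.New("quorum unavailable: write refused to preserve safety")

// cluster is a toy model: clusterSize voting members, of which reachable
// (counting the leader itself) currently respond to the leader.
type cluster struct {
	clusterSize int
	reachable   int
}

// commit succeeds only when a strict majority acknowledges the write.
// Refusing the write on a minority side is what prevents the two halves of a
// partition from committing conflicting values.
func (c cluster) commit(value string) error {
	quorum := c.clusterSize/2 + 1
	if c.reachable < quorum {
		return ErrNoQuorum // liveness suspended, safety preserved
	}
	fmt.Printf("committed %q with %d/%d acks (quorum=%d)\n", value, c.reachable, c.clusterSize, quorum)
	return nil
}

func main() {
	healthy := cluster{clusterSize: 3, reachable: 3}
	minority := cluster{clusterSize: 3, reachable: 1} // 2 of 3 nodes unreachable

	_ = healthy.commit("x=1")
	if err := minority.commit("x=2"); err != nil {
		fmt.Println("minority side:", err)
	}
}
```

Refusing minority writes is what makes the loss of liveness temporary and recoverable; a safety violation, by contrast, could never be repaired after the fact.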
In production deployments, the choice of replica count directly impacts both availability and performance. A 3-node cluster tolerates 1 failure and commits with 2 acknowledgments, typically achieving 2 to 6 millisecond p50 commit latency within a single availability zone given NVMe storage and sub-millisecond round-trip times. A 5-node cluster tolerates 2 failures but requires 3 acknowledgments, raising commit latency to 5 to 15 milliseconds across zones with 1 to 2 millisecond round-trip times. Google Spanner uses 5 replicas per Paxos group across 3 or more regions, accepting 50 to 200 millisecond commit latency for stronger durability guarantees.
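Those latency figures follow from a simple model, sketched below in Go with illustrative (assumed) numbers: the leader needs quorum − 1 remote acknowledgments in addition to its own vote, so steady-state commit latency is roughly the round trip to its (quorum − 1)-th fastest follower plus a local fsync. Treat this as a lower bound; real deployments add batching, queuing, and tail effects.

```go
package main

import (
	"fmt"
	"sort"
)

// estimateCommitMillis is a back-of-envelope model of steady-state commit
// latency: the leader counts its own vote, waits for quorum-1 follower
// acknowledgments, and pays one local fsync. followerRTTs holds the
// round-trip times (ms) from the leader to each follower.
func estimateCommitMillis(followerRTTs []float64, clusterSize int, fsyncMs float64) float64 {
	quorum := clusterSize/2 + 1
	if quorum <= 1 {
		return fsyncMs // degenerate single-node case: only the local write
	}
	sorted := append([]float64(nil), followerRTTs...)
	sort.Float64s(sorted)
	// The (quorum-1)-th fastest follower's RTT gates the commit.
	return sorted[quorum-2] + fsyncMs
}

func main() {
	// Illustrative inputs, not measurements: a single-AZ 3-node cluster with
	// sub-millisecond RTTs, and a cross-AZ 5-node cluster with 1-2 ms RTTs.
	fmt.Printf("3-node, single AZ: ~%.1f ms\n", estimateCommitMillis([]float64{0.5, 0.7}, 3, 1.5))
	fmt.Printf("5-node, cross AZ:  ~%.1f ms\n", estimateCommitMillis([]float64{1.0, 1.5, 2.0, 2.0}, 5, 2.5))
}
```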
💡 Key Takeaways
•Quorum systems require 2f+1 replicas to tolerate f failures, ensuring any two majorities overlap in at least one node that preserves prior decisions
•Safety guarantees no conflicting decisions are ever committed, while liveness ensures progress when a majority is available, but writes halt during majority loss by design
•3-node clusters achieve 2 to 6 millisecond p50 commit latency in a single availability zone, while 5-node clusters spread across zones see 5 to 15 milliseconds due to the extra round-trip times and acknowledgments
•Losing f+1 nodes in a 2f+1 cluster immediately halts all writes, a situation that commonly arises during mismanaged reconfigurations in which nodes are removed before their replacements are added
•Cross-region deployments with 5 replicas incur 50 to 200 millisecond write latency due to intercontinental round-trip times, plus continuous egress costs of tens to hundreds of dollars per day at 10 megabytes per second of sustained replication (see the cost sketch after this list)
•Quorum overlap guarantees linearizability for writes, but this comes at the cost of unavailability during network partitions, making consensus unsuitable for scenarios requiring writes in minority partitions
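The egress figure in the cross-region takeaway is simple arithmetic; the sketch below works it through with assumed per-gigabyte prices, since actual inter-region rates vary by provider and path:

```go
package main

import "fmt"

func main() {
	// Back-of-envelope egress cost for sustained cross-region replication.
	// The per-gigabyte prices are assumptions for illustration only.
	const (
		replicationMBps = 10.0 // sustained replication throughput
		secondsPerDay   = 86400
	)
	gbPerDay := replicationMBps * secondsPerDay / 1024.0 // about 844 GB per day
	for _, pricePerGB := range []float64{0.02, 0.09} {
		fmt.Printf("at $%.2f/GB: ~$%.0f per day per replicated stream\n", pricePerGB, gbPerDay*pricePerGB)
	}
}
```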
📌 Examples
Google Chubby uses 5-replica Multi-Paxos groups with a single master handling operations. Lock and metadata operations complete in tens of milliseconds within a datacenter, but failover requires a new election and lease expiration, taking seconds, to avoid split-brain scenarios.
Kubernetes etcd deployments typically use 3 or 5 members in a single region to minimize round-trip times. With NVMe storage and 1 to 2 milliseconds of network latency, steady-state writes commit at 5 to 25 milliseconds p50, with p99 under 100 milliseconds when properly tuned. Cross-region etcd is explicitly avoided because the higher round-trip times inflate commit latency and destabilize leader elections.
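For concreteness, a minimal Go client sketch against such a cluster using the etcd v3 client (the endpoint names are placeholders): a Put returns only after the entry has been committed by a Raft quorum, so the client-observed round trip tracks the commit latencies quoted above.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Placeholder endpoints for a 3-member etcd cluster in one region.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://etcd-0:2379", "http://etcd-1:2379", "http://etcd-2:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer cli.Close()

	// A Put is acknowledged only after the entry is committed by a Raft
	// quorum (2 of 3 members here), so this round trip reflects quorum
	// commit latency, not a single node's local write.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	start := time.Now()
	resp, err := cli.Put(ctx, "/demo/key", "value")
	cancel()
	if err != nil {
		log.Fatalf("put failed (for example, quorum lost or timeout): %v", err)
	}
	fmt.Printf("committed at revision %d in %v\n", resp.Header.Revision, time.Since(start))
}
```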
A 3-node cluster that loses 2 nodes immediately stops accepting writes even though 1 node remains operational. This is by design: accepting writes on the surviving minority would violate safety if the other 2 nodes were merely partitioned away and formed their own majority with different data.