CP Systems: Choosing Consistency Over Availability
When Correctness Cannot Be Compromised:
CP systems sacrifice availability to maintain linearizability during network partitions. When a majority of replicas becomes unreachable, the system refuses some or all requests rather than risk serving stale data or accepting conflicting writes.
This choice is appropriate when invariants matter: account balances must never go negative, concert tickets cannot be oversold, and distributed locks must guarantee mutual exclusion. These scenarios require that every operation appears to happen atomically in a single global order that all nodes agree on.
How It Works:
CP systems typically use majority quorum protocols. With a replication factor of N=3, you need at least 2 replicas to agree on a write (W=2) and 2 replicas for a read (R=2). Since R + W > N, read and write quorums overlap, guaranteeing that reads see the latest committed write.
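The overlap condition can be checked by brute force for small clusters. The sketch below is illustrative only: the function name `quorums_always_overlap` is an assumption made for this example, not part of any real system's API. It enumerates every possible write quorum and read quorum and confirms that they share at least one replica exactly when R + W > N.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """Return True if every write quorum of size w shares at least one
    replica with every read quorum of size r, out of n replicas."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)  # an empty intersection is falsy
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


print(quorums_always_overlap(3, 2, 2))  # True: R + W = 4 > N = 3
print(quorums_always_overlap(3, 1, 2))  # False: R + W = 3 is not > N = 3
```

With W=1 and R=2, a write acknowledged only by replica 0 can be missed by a read that contacts replicas 1 and 2, which is exactly the stale read the overlap rule prevents.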
Google Spanner exemplifies this approach. It uses Paxos-based consensus across replicas spread over multiple zones or regions. A write to 3 replicas within a single region costs roughly 1 to 5 milliseconds: one intra-region round trip to reach a majority quorum. Multi-region writes require at least one inter-region round trip, typically 50 to 150 milliseconds depending on geographic distance, plus a commit wait that covers TrueTime clock uncertainty (around 7ms).
⚠️ Common Pitfall: CP systems become unavailable if they lose majority. An ensemble of 5 ZooKeeper nodes can tolerate 2 failures, but losing 3 nodes makes the entire service unavailable until majority is restored. Plan capacity and failure domains carefully.
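The failure tolerance of a majority-based ensemble follows directly from its size. As a minimal sketch (the helper names `majority` and `tolerated_failures` are made up for this example), the arithmetic looks like this:

```python
def majority(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Number of nodes that can fail while a majority still survives."""
    return n - majority(n)


for size in (3, 4, 5, 6, 7):
    print(f"nodes={size} majority={majority(size)} "
          f"tolerates={tolerated_failures(size)}")
```

For 5 nodes this gives a majority of 3 and a tolerance of 2 failures, matching the ZooKeeper example above. Note that an even-sized ensemble tolerates no more failures than the next smaller odd size (4 nodes tolerate 1, the same as 3), which is one reason odd ensemble sizes are the usual choice.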
The Latency Cost:
Every write requires coordination. At minimum, you pay one network round trip to reach majority quorum. Under contention, you may need multiple rounds to resolve conflicts or elect a new leader. Read latency can be reduced by reading from local replicas with bounded staleness, but strongly consistent reads must also contact a quorum.
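To make the "one round trip to majority" cost concrete, here is a deliberately simplified latency model. It assumes a leader-based protocol in which the leader counts its own vote and sends to all followers in parallel, and it ignores disk writes, queueing, and retries. The function name and the round-trip figures in the usage lines are hypothetical, chosen only to fall within the ranges quoted in this article.

```python
from typing import Sequence


def estimated_commit_latency_ms(
    follower_rtts_ms: Sequence[float],
    commit_wait_ms: float = 0.0,
) -> float:
    """Estimate write latency as the round trip to the slowest follower
    that is still needed to complete a majority, plus any commit wait."""
    cluster_size = len(follower_rtts_ms) + 1   # followers plus the leader
    majority = cluster_size // 2 + 1
    acks_needed = majority - 1                 # the leader already has its own vote
    if acks_needed == 0:
        return commit_wait_ms
    # Requests go out in parallel, so the wait ends when the
    # acks_needed-th fastest follower has replied.
    return sorted(follower_rtts_ms)[acks_needed - 1] + commit_wait_ms


# Single region, 3 replicas: one nearby follower completes the quorum.
print(estimated_commit_latency_ms([2.0, 3.0]))          # 2.0
# Followers in other regions, with a 7 ms commit wait.
print(estimated_commit_latency_ms([80.0, 120.0], 7.0))  # 87.0
```

The model shows why replica placement matters: the write is only as fast as the farthest replica required for a majority, not the nearest one or the slowest one overall.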
ZooKeeper demonstrates these trade-offs. It delivers tens of thousands of operations per second with 1 to 10 millisecond latencies within a region, but it is designed for coordination primitives like locks and configuration, not high-volume data storage. For critical use cases, the consistency guarantee is worth the throughput limitation.
When to Choose CP:
Select CP when correctness is non-negotiable. Financial transactions, inventory management with strict capacity limits, unique constraint enforcement (like usernames or order IDs), and distributed coordination (leader election, locks) all require linearizability. The cost is higher write latency, lower throughput under contention, and potential unavailability during majority loss.
💡 Key Takeaways
• CP systems use majority quorums (typically W=2, R=2 with N=3 replicas) so that read and write quorums overlap, guaranteeing linearizability: every read sees the latest committed write
• Write latency includes at least one network round trip for quorum: 1 to 5ms within a region, 50 to 150ms across regions, plus any coordination overhead such as leader election or commit wait
• Availability is sacrificed during majority loss: a 5-node ZooKeeper ensemble becomes unavailable if 3 nodes fail, and Spanner refuses writes if it cannot reach a majority quorum
• Google Spanner adds a commit wait equal to the TrueTime uncertainty (typically 7ms) to ensure external consistency, so multi-region writes cost at least one inter-region RTT plus 7ms
• Throughput suffers under contention because conflicting writes require multiple coordination rounds, making CP systems better suited to coordination primitives than to high-volume data paths
📌 Examples
Google Spanner for AdWords: account balances and bids require global consistency. Multi-region writes take 50 to 150ms but guarantee no lost updates or negative balances.
ZooKeeper for distributed locks: delivers 1 to 10ms latencies within a region at tens of thousands of ops per second, but becomes unavailable if a majority of a 5-node ensemble fails.
Banking system: transferring $100 from account A to account B requires CP to prevent duplicate withdrawals or inconsistent balances. It is better to reject the transfer during a partition than to allow an overdraft.
Ticketing system with strict capacity: selling concert tickets requires CP to prevent overselling. If a majority of replicas is unreachable, stop selling rather than risk double-booking the same seat.
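The ticketing example can be sketched as a toy, single-process model. Everything here is hypothetical: `TicketSeller`, `QuorumUnavailable`, and the `reachable_replicas` argument stand in for a real replicated store and its failure detector, and no actual replication takes place. The point is only the CP decision itself: reject the request when a majority cannot be reached, even though seats remain.

```python
class QuorumUnavailable(Exception):
    """Raised when fewer than a majority of replicas are reachable."""


class TicketSeller:
    """Toy model of a CP ticket counter (no real replication)."""

    def __init__(self, capacity: int, replica_count: int) -> None:
        self.capacity = capacity
        self.replica_count = replica_count
        self.sold = 0

    def sell(self, reachable_replicas: int) -> int:
        """Sell one ticket and return the running total sold."""
        majority = self.replica_count // 2 + 1
        if reachable_replicas < majority:
            # CP choice: refuse the sale instead of risking an oversell.
            raise QuorumUnavailable(
                f"need {majority} replicas, only {reachable_replicas} reachable"
            )
        if self.sold >= self.capacity:
            raise ValueError("sold out")
        self.sold += 1
        return self.sold


seller = TicketSeller(capacity=2, replica_count=3)
print(seller.sell(reachable_replicas=3))  # 1
try:
    seller.sell(reachable_replicas=1)     # minority side of a partition
except QuorumUnavailable as err:
    print("rejected:", err)
```

An AP design would accept the second sale on the minority side and reconcile later, which is exactly how two buyers can end up holding the same seat.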