Availability Prioritized (AP) vs Consistency Prioritized (CP) Wide-Column Systems

Availability Prioritized Systems
AP (Availability Prioritized) systems remain available during network partitions by letting any replica accept writes independently. Reads and writes use tunable consistency levels per operation. With replication factor 3 (data copied to 3 nodes), setting both read and write to QUORUM (majority, so 2 of 3) ensures immediate consistency: since 2 + 2 > 3, every read quorum shares at least one replica with the most recent acknowledged write. LOCAL_QUORUM requires a quorum only within the local datacenter, which keeps latency around 5-10ms and lets each datacenter keep serving requests if another one fails.
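The quorum arithmetic above can be checked with a few lines of Python. This is a minimal sketch, not any database's API: the helper names `quorum` and `is_strongly_consistent` are made up for illustration, and the rule encoded is simply that read acknowledgments plus write acknowledgments must exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_acks: int, write_acks: int, replication_factor: int) -> bool:
    """True when any read set must overlap any write set in at least one replica."""
    return read_acks + write_acks > replication_factor


rf = 3
q = quorum(rf)
print(q, is_strongly_consistent(q, q, rf))  # QUORUM read + QUORUM write -> 2 True
print(is_strongly_consistent(1, 1, rf))     # ONE read + ONE write -> False
```

With a single acknowledgment on each side (1 + 1 = 2, not greater than 3), a read can land on a replica the write never reached, which is why that combination is only eventually consistent.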
The tradeoff: no multi-row transactions. Concurrent writes to the same key resolve via "last write wins" using timestamps, requiring tight NTP (Network Time Protocol) synchronization. Clock skew beyond 100ms can cause updates to appear out of order, losing data.
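To make the clock-skew failure concrete, here is a small simulation of last-write-wins. It is a simplified sketch: the `Write` class and `last_write_wins` function are hypothetical, timestamps are plain integers in milliseconds, and ties are broken arbitrarily rather than by any real system's rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    timestamp_ms: int  # stamped from the writing node's local clock


def last_write_wins(a: Write, b: Write) -> Write:
    """Keep the write with the higher timestamp (ties go to b in this sketch)."""
    return a if a.timestamp_ms > b.timestamp_ms else b


# Real time: node A writes at t=1000 ms, node B writes 50 ms later.
# Node B's clock runs 200 ms behind, so it stamps the later write with 850.
first = Write("old", timestamp_ms=1000)
second = Write("new", timestamp_ms=1050 - 200)

winner = last_write_wins(first, second)
print(winner.value)  # old
```

The update that happened later in real time carries the smaller timestamp, so it is silently discarded. With the skew held well below the gap between the two writes, `second` would win as expected.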
Consistency Prioritized Systems
CP (Consistency Prioritized) systems route all writes for a region to a single leader that serializes operations. This provides linearizable reads (always see latest write) and strong per-row consistency. If the leader becomes unreachable, writes stall until a new leader is elected, sacrificing availability for correctness.
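The single-leader behavior can be sketched as a toy in-memory model. Everything here is an assumption for illustration: `Region`, `LeaderUnavailable`, and the `leader_up` flag are invented names, and an unreachable leader is modeled by raising an exception, whereas a real client would typically block or retry until election completes.

```python
from typing import Optional


class LeaderUnavailable(Exception):
    """Raised when the region has no reachable leader."""


class Region:
    """Toy single-leader region: one leader assigns an order to every write."""

    def __init__(self) -> None:
        self.leader_up = True
        self.next_seq = 1
        self.log = []   # (sequence, row key, value), in commit order
        self.rows = {}  # latest value per row key

    def write(self, row_key: str, value: str) -> int:
        if not self.leader_up:
            raise LeaderUnavailable("no leader elected yet")
        seq = self.next_seq
        self.next_seq += 1
        self.log.append((seq, row_key, value))
        self.rows[row_key] = value
        return seq

    def read(self, row_key: str) -> Optional[str]:
        if not self.leader_up:
            raise LeaderUnavailable("no leader elected yet")
        return self.rows.get(row_key)


region = Region()
region.write("inbox:42", "msg-1")
region.write("inbox:42", "msg-2")
print(region.read("inbox:42"))  # msg-2

region.leader_up = False  # simulate the leader becoming unreachable
try:
    region.write("inbox:42", "msg-3")
except LeaderUnavailable as exc:
    print("write rejected:", exc)
```

Because every write passes through one sequence counter, two writers can never disagree about which update came last. The cost is visible in the final lines: with no leader, nothing is accepted.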
Leader election requires coordination, often through a consensus system like ZooKeeper (a coordination service that maintains configuration and detects failures). Failover takes 30-120 seconds: heartbeat timeout (30s), region reassignment by the master (10-60s), and WAL (Write-Ahead Log) replay from distributed storage (5-30s).
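The failover budget is simple addition, assuming the three phases run strictly one after another. The sketch below uses a hypothetical helper, `failover_seconds`, with the per-phase figures quoted above; the "typical" case uses mid-range values chosen for illustration.

```python
def failover_seconds(heartbeat_timeout: int, reassignment: int, wal_replay: int) -> int:
    """Sequential phases: total unavailability is the sum of the three."""
    return heartbeat_timeout + reassignment + wal_replay


fastest = failover_seconds(30, 10, 5)   # every phase at its quoted minimum
typical = failover_seconds(30, 30, 15)  # illustrative mid-range values
slowest = failover_seconds(30, 60, 30)  # every phase at its quoted maximum
print(fastest, typical, slowest)  # 45 75 120
```

The quoted phase minimums sum to 45 seconds, so reaching the 30-second lower bound would need faster failure detection or phases that overlap. That reading is an inference from the arithmetic, not something the figures above state.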
When to Choose Each Model
Choose AP when always-on operation matters more than per-row linearizability. Activity feeds, metrics, and session stores tolerate eventual consistency. Multi-datacenter deployments handle trillions of operations daily at sub-20ms p99 with LOCAL_QUORUM.
Choose CP when per-row correctness is paramount: message ordering in an inbox, counter semantics, or inventory decrements. CP systems integrate well with batch analytics via HDFS (Hadoop Distributed File System), though they accept unavailability during leader transitions.
Key Trade-off: AP systems trade consistency guarantees for availability (always accept writes). CP systems trade availability for consistency (stall writes when the leader is unreachable). Neither is "better"; choose based on your correctness requirements.

💡 Key Takeaways

✓ AP systems use tunable consistency: QUORUM read + QUORUM write with RF=3 ensures immediate consistency (2+2>3); LOCAL_QUORUM gives 5-10ms latency

✓ Last-write-wins in AP requires NTP to keep clock skew under 100ms; larger skew causes updates to resolve in the wrong order, losing data

✓ CP systems route all writes to a single leader, providing linearizable reads and per-row transactions, but writes stall if the leader is unreachable

✓ CP failover takes 30-120 seconds: heartbeat timeout, region reassignment, WAL replay from distributed storage

✓ Choose AP for always-on requirements (metrics, sessions, feeds) that tolerate eventual consistency

✓ Choose CP when per-row correctness is paramount (message ordering, counters, inventory) and unavailability during leader transitions is acceptable

📌 Interview Tips

1. Explain tunable consistency with math: with RF=3, QUORUM read (2) + QUORUM write (2) = 4 > 3, so every read quorum shares at least one node with the most recent write quorum and therefore sees the latest data. This shows understanding beyond buzzwords.

2. Describe the CP failover timeline: 30s heartbeat timeout + 30s region reassignment + 15s WAL replay = 75s of total unavailability. Quantifying it makes the consistency-availability tradeoff concrete.

3. Compare clock skew impact: an AP system with 200ms of skew loses updates when timestamps reverse causality. A CP system avoids this by ordering writes through a single leader, but pays with unavailability during failover.
