Choosing AP: Availability and Partition Tolerance

Systems that choose AP continue serving requests during network partitions, accepting that different replicas may temporarily diverge and return stale data. Amazon's original Dynamo paper (the foundation for DynamoDB) described a shopping cart system with replication factor 3 and sloppy quorum (commonly read 2, write 2), designed to remain available even when nodes or links fail. The explicit goal was ensuring customers could always add items to their cart, even if it meant occasionally seeing conflicting versions that needed merging.

The trade off is accepting anomalies. With eventual consistency and last writer wins conflict resolution, concurrent updates can disappear: two users adding different items to a cart simultaneously might see one item vanish after replication converges. DynamoDB Global Tables use last writer wins with timestamps, propagating changes across regions in sub second to low seconds typically. This works for shopping carts (merge conflicts via union) and view counters (approximate is fine), but would corrupt bank balances where every cent matters.

Cassandra demonstrates tunable AP behavior. With consistency level ONE, a write succeeds after reaching a single replica, achieving single digit millisecond latencies and maximum availability. The cost is that immediate reads might miss the write entirely if they hit a different replica. Read repair and anti entropy (Merkle tree comparisons) eventually converge replicas, but users can observe anomalies during the window. Netflix uses Cassandra for viewing history and recommendations where availability and low latency (for user facing traffic) matter more than perfect consistency; a missing recommendation or delayed history update is tolerable.

💡 Key Takeaways

•AP systems accept writes on any reachable replica during partitions, achieving single digit millisecond latencies and maximum uptime at the cost of temporary divergence and stale reads

•Last writer wins conflict resolution (common in DynamoDB Global Tables) can silently drop concurrent updates; use conflict free replicated data types (CRDTs) for counters, sets, and maps to avoid lost updates

•Amazon Dynamo targeted p99.9 latencies under a few hundred milliseconds even during failures, prioritizing shopping cart availability over consistency; modern DynamoDB achieves single digit ms for eventually consistent reads

•Cassandra consistency level ONE maximizes availability and minimizes latency but requires read repair, hinted handoff, and anti entropy to converge; expect stale reads and write conflicts

•Use AP for feeds, timelines, carts, analytics, view counts, and recommendation systems where approximate or slightly stale data is acceptable and availability drives user experience

📌 Examples

Amazon shopping cart uses AP with vector clocks or timestamps; if two users add different items during a partition, the system merges both (union of items) rather than rejecting one write

Netflix viewing history stored in Cassandra tolerates stale reads; if a user watches episode 5 but another replica shows only through episode 4 for a few seconds, the UX remains acceptable

Social media like counts with Cassandra counters can drift during partitions (one region sees 100 likes, another 102), converging eventually; perfect accuracy is not required for non critical metrics

DynamoDB Global Tables replicate writes across regions asynchronously with last writer wins; a profile update in US East might take 500ms to appear in EU West, but both regions remain writable throughout

← Back to CAP Theorem Overview