Tunable Consistency: Mixing CP and AP in Production
The Hybrid Reality:
Most production systems do not rigidly choose CP or AP for everything. Instead, they offer tunable consistency where you select guarantees per operation, per table, or per data type. This lets you make critical paths CP (inventory decrements, payments) and peripheral paths AP (view counts, recommendations), optimizing for both correctness where it matters and latency where it does not.
The key insight: not all data needs the same guarantees. A banking transaction requires linearizability. A profile view counter tolerates eventual consistency. By mixing models, you avoid paying coordination costs everywhere while still protecting invariants that matter.
Quorum Tuning in Practice:
With replication factor N and quorums R (read) and W (write), strong consistency holds when R + W > N. Common configurations:
N=3, R=2, W=2: Strongly consistent when both quorums are reachable. Single-digit milliseconds within a region, but tail latency tracks the second-fastest replica you wait on. Under partition, either refuse queries (CP) or fall back to a lower consistency level (AP).
N=3, R=1, W=1: The fastest option, often under 5ms p99 within a region, but eventually consistent only. Stale reads are common, especially under high write rates or when replicas lag.
N=5, R=3, W=3: Higher durability and availability; tolerates 2 replica failures while maintaining strong consistency. But tail latency is now set by the third-fastest of the five replicas, pushing p99 to 10 to 20ms under load.
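A minimal sketch of the quorum arithmetic behind these configurations, in Python; the function names are illustrative rather than any database's API:

# Quorum overlap check: a read quorum R and a write quorum W always intersect
# (so a read sees the latest committed write) when R + W > N.
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    return r + w > n

# Number of replicas that can fail while both quorums remain reachable.
def tolerated_failures(n: int, r: int, w: int) -> int:
    return n - max(r, w)

for n, r, w in [(3, 2, 2), (3, 1, 1), (5, 3, 3)]:
    print(f"N={n} R={r} W={w}: strong={is_strongly_consistent(n, r, w)}, "
          f"tolerates {tolerated_failures(n, r, w)} replica failure(s)")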
⚠️ Common Pitfall: Mixing consistency levels creates read-after-write anomalies. A user submits an order at QUORUM (strong), then immediately queries order history at ONE (eventual). If the read hits a stale replica, the order appears to be missing. Solution: session tokens or sticky routing to guarantee monotonic reads.
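For concreteness, a sketch of this pitfall using the DataStax Python driver for Cassandra, which exposes per-statement consistency levels; the keyspace, table, and values are made up for illustration:

# Hypothetical 'shop' keyspace with an 'orders' table at replication factor 3.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect("shop")
user_id, order_id, total = "u-123", "o-456", 42.50

# Write the order at QUORUM (2 of 3 replicas must acknowledge).
session.execute(
    SimpleStatement(
        "INSERT INTO orders (user_id, order_id, total) VALUES (%s, %s, %s)",
        consistency_level=ConsistencyLevel.QUORUM),
    (user_id, order_id, total))

# Immediately reading order history at ONE may land on the one replica that has
# not applied the write yet, so the fresh order can appear to be missing.
rows = session.execute(
    SimpleStatement(
        "SELECT order_id FROM orders WHERE user_id = %s",
        consistency_level=ConsistencyLevel.ONE),
    (user_id,))

# Fix: read this path at QUORUM as well, or carry a session token / sticky
# routing so the follow-up read is served by an up-to-date replica.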
Session Consistency Guarantees:
Full linearizability is expensive, but many user-facing flows only need session-level guarantees:
Read your writes: after updating their profile, a user must see their own update. Pin the user's session to a replica or attach a version token to requests.
Monotonic reads: once a user has seen version V, never show an older version. Track the session's last-read version and reject replicas that lag behind it.
Monotonic writes: a user's writes must be applied in order. Route all writes from the session through the same coordinator.
These weaker models avoid full quorum coordination on reads while preventing the most glaring anomalies. Typical overhead: a session token in a cookie or header, plus replica-selection logic.
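A minimal sketch of the session-token idea, assuming an in-memory replica model with monotonically increasing write versions; the Replica class and version scheme are illustrative, not a real database API:

from dataclasses import dataclass, field

@dataclass
class Replica:
    # Toy replica: highest applied write version plus a key -> (value, version) map.
    applied_version: int = 0
    store: dict = field(default_factory=dict)

    def put(self, key, value):
        self.applied_version += 1
        self.store[key] = (value, self.applied_version)
        return self.applied_version

    def get(self, key):
        return self.store.get(key, (None, 0))

class Session:
    # Tracks the highest version this client has written or observed (the session token).
    def __init__(self, replicas):
        self.replicas = replicas
        self.token = 0                                 # carried between requests in a cookie or header

    def write(self, key, value, coordinator):
        version = coordinator.put(key, value)
        self.token = max(self.token, version)          # enables read-your-writes
        return version

    def read(self, key):
        # Skip replicas that lag behind the session token: this yields
        # read-your-writes and monotonic reads without a quorum read.
        for replica in self.replicas:
            if replica.applied_version >= self.token:
                value, version = replica.get(key)
                self.token = max(self.token, version)  # preserves monotonic reads
                return value
        raise RuntimeError("no sufficiently fresh replica; retry or fall back to a quorum read")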
Geo Distribution and Regional Quorums:
Cross-region strong consistency is brutally expensive. Inter-region RTT is 50 to 150ms, so writing to a cross-region quorum adds at least one such round trip to every write, often 100ms or more, breaking user-facing SLOs.
Common pattern: confine strong writes to a regional scope. Accept writes in US East at QUORUM across 3 US East replicas (5ms latency). Asynchronously replicate to EU West (800ms propagation). EU reads get eventual consistency unless they specifically request strong reads from US East (adding a ~100ms round trip).
This hybrid delivers single-digit-millisecond writes for most users while letting EU users query local replicas with bounded staleness. The trade-off: EU users may see data that is 1 to 2 seconds stale unless they wait for a cross-region round trip.
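A sketch of this "strong within the region, asynchronous across regions" pattern; the regions, latencies, and in-memory replicas are stand-ins for illustration:

import queue
import threading
import time

replication_queue = queue.Queue()           # writes awaiting cross-region shipment

def write_order(order, local_replicas, w=2):
    # Send to all local replicas; treat the write as committed once W have acked.
    acks = 0
    for replica in local_replicas:
        replica.append(order)               # intra-region hop, single-digit milliseconds
        acks += 1
        if acks == w:
            replication_queue.put(order)    # async hand-off; the user is not kept waiting
    return acks >= w

def cross_region_shipper(remote_replica):
    # Background thread: drains the queue and ships writes to the remote region.
    while True:
        order = replication_queue.get()
        time.sleep(0.1)                     # stand-in for the 50-150ms inter-region RTT
        remote_replica.append(order)        # remote replica catches up with bounded staleness

us_east = [[], [], []]                      # three in-memory "replicas" in US East
eu_west = []                                # one in-memory "replica" in EU West
threading.Thread(target=cross_region_shipper, args=(eu_west,), daemon=True).start()
write_order({"id": "o-1", "total": 42.50}, us_east)
time.sleep(0.2)                             # give the toy shipper time to replicate before exit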
Cost and Failure Mode Implications:
Every additional replica multiplies storage and bandwidth costs. N=3 means 3x storage, 3x write bandwidth, and potentially 3x egress charges. Increasing the replication factor from N=3 to N=5 for higher durability adds roughly 67% more cost (5/3 ≈ 1.67x).
Quorum size affects blast radius. With R=3, W=3, N=5, losing 2 replicas still maintains strong consistency. With R=2, W=2, N=3, losing 2 replicas makes data unavailable. Plan failure domains (spread across racks, AZs) and monitor replica health actively.
Hot partitions amplify quorum overhead. If 80% of traffic hits one shard and you require quorum, tail latency on that shard dominates overall p99. Solutions: adaptive replica selection (skip slow replicas), hedged requests (send a duplicate to a backup replica after a timeout), or eventually consistent reads for hot paths.
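A minimal sketch of a hedged read, assuming asyncio-based replica clients; the replica functions and the 10ms hedge delay are illustrative:

import asyncio

HEDGE_DELAY = 0.010    # e.g. roughly the p95 of normal replica latency (10ms)

async def hedged_read(key, primary, backup):
    first = asyncio.create_task(primary(key))
    try:
        # Fast path: the primary answers within the hedge delay.
        return await asyncio.wait_for(asyncio.shield(first), timeout=HEDGE_DELAY)
    except asyncio.TimeoutError:
        # Slow path: race the still-running primary against a duplicate sent to a backup.
        second = asyncio.create_task(backup(key))
        done, pending = await asyncio.wait({first, second}, return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()
        return done.pop().result()

# Toy replicas with different latencies, just to exercise the function.
async def slow_replica(key):
    await asyncio.sleep(0.050)
    return f"{key} from slow replica"

async def fast_backup(key):
    await asyncio.sleep(0.005)
    return f"{key} from fast backup"

print(asyncio.run(hedged_read("user:42", slow_replica, fast_backup)))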
💡 Key Takeaways
•Tunable consistency lets you choose guarantees per operation: N=3 with R=2, W=2 gives strong consistency in 5 to 10ms p99, while R=1, W=1 gives eventual consistency in 2 to 3ms p99, allowing critical paths to be CP and peripheral paths AP
•Session level guarantees like read your writes cost less than full linearizability: pin client to replica or attach version tokens, adding only session tracking overhead while avoiding full quorum reads
•Cross region strong consistency is expensive: inter region RTT of 50 to 150ms means cross region quorum writes take over 100ms, so confine strong writes to regional scope and replicate asynchronously across regions
•Replication factor directly multiplies costs: N=3 means 3x storage and write bandwidth, N=5 adds 67% more cost, with each additional replica also increasing tail latency risk as you wait for more nodes
•Hot partitions amplify quorum overhead: if 80% of traffic hits one shard requiring quorum, tail latency on that shard dominates overall p99, requiring adaptive replica selection or hedged requests to mitigate
📌 Examples
E-commerce checkout: write the order to a Dynamo-style store such as Cassandra at QUORUM (2/3 replicas, 8ms latency) for strong consistency, but read the product catalog at ONE (2ms latency, eventual) since stale product descriptions are acceptable
Banking app with session consistency: user transfers $100 and immediately refreshes. Session token ensures their next read hits replica with version >= transfer version, preventing confusing 'transfer not found' error
Multi region social app: user posts in US East at QUORUM within region (5ms), asynchronously replicates to EU West (800ms). EU followers see post with 1 to 2 second delay, acceptable for timeline freshness
High durability logs: store audit logs with N=5, R=3, W=3 to tolerate 2 simultaneous failures. Accept 15ms p99 write latency and 67% higher storage cost for critical compliance data