Consistency Model Failure Modes and Production Hardening
Even well-designed consistency models degrade or fail under specific conditions that must be anticipated in production systems. Session guarantees (read-your-writes, monotonic reads) break under non-sticky routing: if a client writes to replica A and a subsequent read is load-balanced to replica B, which has not yet received the replication update, the client observes time travel. This is particularly insidious because it happens silently, without errors. Defense requires sticky sessions at the load balancer (pinning clients to specific replicas or regions), session tokens carried with requests and validated by replicas (rejecting requests that require unavailable versions), or version vectors that route requests only to replicas meeting minimum version requirements. Azure Cosmos DB uses session tokens; Facebook TAO uses regional affinity, with read-after-write guarantees enforced within a region.
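To make this concrete, here is a minimal sketch of replica-side session-token validation, loosely modeled on the LSN-style tokens described above; all names, the wait loop, and the retry policy are illustrative, not any vendor's actual API:

```python
import time

class Replica:
    """Toy replica that tracks the highest log sequence number (LSN) it has
    applied. A read carrying a session token with a higher LSN waits briefly
    for replication to catch up, then asks the router to retry elsewhere,
    preserving read-your-writes even under non-sticky routing."""

    def __init__(self):
        self.applied_lsn = 0
        self.store = {}

    def apply_replication(self, lsn, key, value):
        self.store[key] = value
        self.applied_lsn = max(self.applied_lsn, lsn)

    def read(self, key, session_lsn, timeout_s=0.2):
        deadline = time.monotonic() + timeout_s
        while self.applied_lsn < session_lsn:
            if time.monotonic() >= deadline:
                # Too far behind the client's session token to serve safely.
                raise TimeoutError("behind session token; route to another replica")
            time.sleep(0.01)  # wait for replication to catch up
        return self.store.get(key)
```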
Clock skew undermines systems that rely on bounded uncertainty for strong semantics. Google Spanner's TrueTime API returns an interval guaranteed to contain true time, with uncertainty epsilon typically around 7 milliseconds. Commit wait spans this uncertainty to ensure linearizability: the system holds a commit until its timestamp is guaranteed to be in the past for all observers. If GPS or Precision Time Protocol (PTP) signals are lost or degraded, epsilon grows, increasing commit latency proportionally. If epsilon becomes unbounded (complete time-synchronization failure), the system cannot safely maintain linearizability and must degrade or reject writes. Production Spanner deployments use diverse time sources (GPS, atomic clocks) and monitor epsilon continuously; SLAs degrade when epsilon exceeds thresholds.
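The commit-wait idea can be simulated in a few lines. This is a hedged sketch, not Spanner's implementation: TrueTimeSim and its fixed epsilon are stand-ins for the real TrueTime interval API.

```python
import time

class TrueTimeSim:
    """Stand-in for a TrueTime-like clock: now() returns an interval
    (earliest, latest) guaranteed to contain true time."""

    def __init__(self, epsilon_s=0.007):  # ~7 ms uncertainty
        self.epsilon_s = epsilon_s

    def now(self):
        t = time.time()
        return (t - self.epsilon_s, t + self.epsilon_s)

def commit(tt):
    # Pick the commit timestamp at the top of the uncertainty interval,
    # then block until that timestamp is definitely in the past for every
    # observer. With constant epsilon the loop runs for roughly 2 * epsilon,
    # so growing epsilon grows commit latency proportionally.
    _, ts = tt.now()
    while tt.now()[0] <= ts:
        time.sleep(0.0005)
    return ts

print(commit(TrueTimeSim()))  # doubling epsilon doubles the commit wait
```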
Split brain and stale lease reads occur in leader-based systems when network partitions allow an old leader to continue serving requests after a new leader has been elected. Without proper fencing (using monotonically increasing epoch numbers or generation tokens), the old leader may accept writes that conflict with the new leader's writes, violating linearizability. Reads from a stale leader return outdated data even though the system believes it is providing strong consistency. Defense requires fencing tokens: each leader obtains a generation number from a consensus system (like ZooKeeper), and replicas reject operations with stale generation numbers. Lease-based systems must ensure lease expiration is strictly enforced: a leader whose lease has expired must immediately stop serving requests, even if it has not detected the partition (see the fencing sketch below).

Hot keys under quorum systems concentrate load: a popular key with W=QUORUM requires a majority of replicas to acknowledge every write, creating a bottleneck. Tail latency spikes when any replica in the quorum slows down, reducing effective availability even though all nodes are up. Mitigations include request hedging (sending duplicate requests after a timeout), replica placement that avoids shared failure domains, and sharding hot keys across multiple logical keys with client-side aggregation.
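A minimal fencing-token sketch for the split-brain defense above; the epoch numbers here are illustrative stand-ins for generation numbers issued by a consensus service such as ZooKeeper:

```python
class FencedReplica:
    """Replica that remembers the highest leader epoch it has seen and
    rejects writes carrying an older epoch, so a partitioned ex-leader
    cannot clobber the new leader's writes."""

    def __init__(self):
        self.highest_epoch = 0
        self.store = {}

    def write(self, epoch, key, value):
        if epoch < self.highest_epoch:
            raise PermissionError(f"stale epoch {epoch} < {self.highest_epoch}")
        self.highest_epoch = epoch  # fences out all older leaders
        self.store[key] = value

r = FencedReplica()
r.write(epoch=2, key="x", value="from new leader")      # accepted
try:
    r.write(epoch=1, key="x", value="from old leader")  # rejected: split brain averted
except PermissionError as e:
    print(e)
```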
💡 Key Takeaways
• Session guarantees fail silently under non-sticky routing when clients are load-balanced to lagging replicas; defense requires sticky sessions, session tokens validated by replicas, or version-based routing
• Clock skew breaks bounded-uncertainty systems like Google Spanner: if TrueTime epsilon grows from 7 ms to unbounded due to GPS or Precision Time Protocol (PTP) loss, commit latency increases and linearizability guarantees degrade
• Split brain in leader-based systems allows old leaders to serve stale reads or accept conflicting writes after a partition; fencing tokens (monotonic epoch numbers) prevent this by having replicas reject stale epochs
• Hot keys under quorum systems concentrate load on majority replicas; tail latency spikes when any quorum member slows down, reducing effective availability even with all nodes operational
• Secondary-index inconsistencies arise when primary and index updates are not atomically coordinated; queries return dangling references or missing results, requiring read repair or background reindexing
• Uniqueness constraints and idempotency cannot be safely enforced under eventual or causal consistency without coordination; use reservation tokens, escrow techniques, or convert to idempotent operations with deduplication (see the sketch after this list)
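As noted in the last takeaway, one coordination-free option is to make operations idempotent and deduplicate retries. A minimal sketch, assuming clients attach a unique request ID (all names here are illustrative):

```python
class DedupStore:
    """Applies each request ID at most once and replays the cached result
    on retry, making a non-idempotent debit safe to resend."""

    def __init__(self, balance=100):
        self.balance = balance
        self.applied = {}  # request_id -> result

    def debit(self, request_id, amount):
        if request_id in self.applied:  # duplicate delivery: replay, don't re-execute
            return self.applied[request_id]
        self.balance -= amount
        self.applied[request_id] = self.balance
        return self.balance

s = DedupStore()
print(s.debit("req-42", 30))  # 70
print(s.debit("req-42", 30))  # still 70: the retry is a no-op
```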
📌 Examples
Azure Cosmos DB session tokens: after writing document D with session token {region: eastus, lsn: 5000}, the client sends the token with its next read; if routed to a replica at lsn: 4800, the replica delays or rejects until replication catches up
Google Spanner TrueTime monitoring: if epsilon exceeds 10 ms (due to time-sync degradation), commit wait increases proportionally; at 50 ms epsilon, write latency jumps from tens of ms to 100+ ms within a region
Etcd fencing: each leader obtains a monotonic lease ID from Raft; replicas store highest seen lease ID and reject writes with lower ID, preventing split brain when old leader is partitioned
DynamoDB hot partition: celebrity user key receives 10000 writes per second; with W=2, N=3, two replicas handle all writes, hitting throughput limits and causing throttling even though total cluster capacity is available
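The hot-partition problem above is commonly mitigated by sharding the hot key across several physical keys and aggregating on read. A sketch of that pattern, where the shard count and key format are illustrative choices:

```python
import random

N_SHARDS = 8  # spreads one logical hot key across 8 partitions

def increment(table, key):
    # Each write lands on a random shard, so no single partition
    # absorbs the full write rate of the hot key.
    k = f"{key}#{random.randrange(N_SHARDS)}"
    table[k] = table.get(k, 0) + 1

def read_total(table, key):
    # Client-side aggregation: sum every physical shard of the logical key.
    return sum(table.get(f"{key}#{i}", 0) for i in range(N_SHARDS))

table = {}
for _ in range(1000):
    increment(table, "celebrity")
print(read_total(table, "celebrity"))  # 1000
```

The trade-off is that reads now fan out to N_SHARDS keys, so this suits write-heavy counters and feeds rather than read-heavy lookups.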