Conflict Detection and Resolution Strategies
The core technical challenge in multi-leader replication is achieving convergence under concurrency. When two leaders accept conflicting writes to the same record before hearing about each other's updates, the system must detect the conflict and resolve it deterministically so that all replicas eventually converge to an identical state. A conflict is detected when a leader applies a remote write whose causality marker (version metadata such as a vector clock or hybrid logical clock) does not descend from the local version, which means the two updates were concurrent rather than sequential. Without proper detection and resolution, replicas can diverge permanently or silently discard user intent.
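The "does not descend from" check can be sketched with vector clocks as follows. This is a minimal illustration, not any particular database's implementation; the names `descends` and `classify` and the leader IDs are assumptions made for the example.

```python
from typing import Dict

VectorClock = Dict[str, int]  # leader id -> number of writes seen from that leader


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if version a has seen everything b has (a >= b on every entry)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def classify(local: VectorClock, remote: VectorClock) -> str:
    """Decide how an incoming remote write relates to the local version."""
    remote_ge = descends(remote, local)
    local_ge = descends(local, remote)
    if remote_ge and local_ge:
        return "identical"
    if remote_ge:
        return "remote_newer"  # sequential update: safe to apply
    if local_ge:
        return "local_newer"   # stale or duplicate delivery: ignore
    return "conflict"          # concurrent updates: resolution required


# Remote builds on the local version: no conflict.
print(classify({"tokyo": 2, "london": 1}, {"tokyo": 3, "london": 1}))  # remote_newer
# Each side has a write the other has not seen: conflict.
print(classify({"tokyo": 3, "london": 1}, {"tokyo": 2, "london": 2}))  # conflict
```

Only the `conflict` branch needs a resolution strategy; the other three cases are ordinary replication.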
The simplest resolution strategy is last writer wins (LWW) based on timestamps, used by AWS DynamoDB Global Tables. Each write carries a service-generated timestamp, and during a conflict the write with the higher timestamp wins. This approach is operationally simple and deterministic, but it can drop legitimate updates under clock skew or true concurrency. For example, if the Tokyo leader's clock runs 2 seconds ahead of London's, a Tokyo write beats any London write made up to 2 seconds after it, even though London's update was logically newer. More sophisticated systems use hybrid logical clocks (HLC), which combine physical time with a logical counter so that timestamps stay close to wall-clock time while still respecting causality, or vector clocks, which track causality per origin. Amazon's Dynamo, built for its retail platform, used vector clocks with application-level merge logic: concurrent shopping cart updates were preserved as separate versions and merged by taking the union of their items.
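The Tokyo and London skew scenario can be sketched as follows, comparing plain wall-clock LWW with a hybrid logical clock. This is a simplified illustration under stated assumptions: the `HLC` class is a hypothetical helper, the 2-second skew and event times are made up, and London is assumed to have received Tokyo's write before issuing its own.

```python
from dataclasses import dataclass
from typing import Tuple

Timestamp = Tuple[int, int]  # (physical_ms, logical counter), compared lexicographically


@dataclass
class HLC:
    """Hybrid logical clock for one leader (illustrative, not a library API)."""
    physical_ms: int = 0
    logical: int = 0

    def now(self, wall_ms: int) -> Timestamp:
        """Timestamp a local write given this leader's wall-clock reading."""
        if wall_ms > self.physical_ms:
            self.physical_ms, self.logical = wall_ms, 0
        else:
            self.logical += 1  # wall clock is behind what we have already issued
        return (self.physical_ms, self.logical)

    def observe(self, remote: Timestamp, wall_ms: int) -> Timestamp:
        """Advance this clock past a timestamp received from another leader."""
        r_ms, r_logical = remote
        new_ms = max(self.physical_ms, r_ms, wall_ms)
        if new_ms == self.physical_ms == r_ms:
            self.logical = max(self.logical, r_logical) + 1
        elif new_ms == self.physical_ms:
            self.logical += 1
        elif new_ms == r_ms:
            self.logical = r_logical + 1
        else:
            self.logical = 0
        self.physical_ms = new_ms
        return (self.physical_ms, self.logical)


TOKYO_SKEW_MS = 2_000               # assumed: Tokyo's clock runs 2 s fast
tokyo_wall = 10_000 + TOKYO_SKEW_MS  # Tokyo writes at true time 10.0 s
london_wall = 10_500                 # London overwrites at true time 10.5 s

# Wall-clock LWW: the logically newer London write loses.
physical_winner = "tokyo" if tokyo_wall > london_wall else "london"

# HLC: London observed Tokyo's write first, so its timestamp is forced higher.
tokyo, london = HLC(), HLC()
ts_tokyo = tokyo.now(tokyo_wall)
london.observe(ts_tokyo, london_wall)
ts_london = london.now(london_wall)
hlc_winner = "tokyo" if ts_tokyo > ts_london else "london"

print(physical_winner, hlc_winner)  # tokyo london
```

The HLC only helps when the later writer has actually seen the earlier write. For truly concurrent writes it still picks a winner by timestamp, so one update is dropped just as with plain LWW.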
For specialized data types, Conflict-free Replicated Data Types (CRDTs) provide mathematically proven convergence. A grow-only counter (G-Counter) used for page view counts never conflicts because each leader increments only its own slot, and a read sums the slots of all leaders. An observed-remove set (OR-Set) for collaborative todo lists lets an add win over a concurrent remove: a remove deletes only the copies of an item that the removing leader has already observed, so an item added concurrently on another leader survives. Operational Transformation (OT) for collaborative editing (Google Docs style systems) transforms incoming operations against concurrent operations to preserve user intent: if user A types "hello" at position 0 while user B types "world" at position 0, OT ensures both strings appear, in the same deterministic order on every replica. The tradeoff is complexity: CRDTs and OT require specialized implementations per data type and can consume more storage for metadata.
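The add-wins behavior of an observed-remove set can be sketched as follows. This is a minimal state-based variant that keeps tombstones forever; the `ORSet` class, its method names, and the todo item are assumptions for illustration, and production implementations use more compact metadata.

```python
import uuid
from typing import Dict, Set


class ORSet:
    """Observed-remove set: every add gets a unique tag; remove kills only seen tags."""

    def __init__(self) -> None:
        self.adds: Dict[str, Set[str]] = {}  # item -> tags created by add()
        self.removed: Set[str] = set()       # tombstoned tags

    def add(self, item: str) -> None:
        self.adds.setdefault(item, set()).add(uuid.uuid4().hex)

    def remove(self, item: str) -> None:
        # Tombstone only the tags this replica has observed so far.
        self.removed |= self.adds.get(item, set())

    def merge(self, other: "ORSet") -> None:
        # Union of tags and union of tombstones: order and repetition do not matter.
        for item, tags in other.adds.items():
            self.adds.setdefault(item, set()).update(tags)
        self.removed |= other.removed

    def items(self) -> Set[str]:
        # An item is present while at least one of its tags is not tombstoned.
        return {item for item, tags in self.adds.items() if tags - self.removed}


tokyo, london = ORSet(), ORSet()
tokyo.add("buy milk")
london.merge(tokyo)          # both leaders now hold the item

tokyo.remove("buy milk")     # concurrent with...
london.add("buy milk")       # ...a re-add that creates a fresh tag

tokyo.merge(london)
london.merge(tokyo)
print(tokyo.items(), london.items())  # both contain 'buy milk': the add wins

london.remove("buy milk")    # this remove has observed every tag
tokyo.merge(london)
print(tokyo.items())         # set(): the item is gone everywhere
```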
💡 Key Takeaways
•Last writer wins (LWW) with timestamps is operationally simplest but can drop legitimate concurrent updates; clock skew of even 1 to 2 seconds can cause a logically newer write to lose to an older write that carries a skewed timestamp
•Hybrid logical clocks (HLC) combine physical time with logical counters to produce timestamps that stay close to wall-clock time while capturing causality, which suits systems that need timestamp-based ordering with better conflict semantics than pure physical clocks
•Vector clocks track per-origin sequence numbers to detect true concurrency; Amazon's Dynamo used them to preserve all concurrent shopping cart versions and merge them at the application level by taking the union of items
•Conflict-free Replicated Data Types (CRDTs) mathematically guarantee convergence: grow-only counters for metrics, observed-remove sets for collaborative lists, last-writer-wins registers for key-value cells, each carrying its own metadata overhead
•Operational Transformation for collaborative text editing transforms concurrent operations (insert, delete at position) against what has already been applied, preserving user intent, typically with sub-100 ms local echo and roughly 70 ms to 200 ms cross-continent convergence
•The merge function must be commutative, associative, and idempotent so that replicas converge to an identical state regardless of the order in which operations arrive or whether they are delivered more than once; a merge that lacks these properties can cause permanent divergence
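The convergence requirement in the last takeaway can be checked mechanically. The sketch below applies a G-Counter merge (element-wise max) in every possible order, with each state delivered twice, and contrasts it with an additive merge that is not idempotent. The leader names and counts are made-up values for illustration.

```python
from itertools import permutations
from typing import Dict

GCounter = Dict[str, int]  # leader id -> increments accepted by that leader


def merge(a: GCounter, b: GCounter) -> GCounter:
    """Element-wise max: commutative, associative, and idempotent."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def additive_merge(a: GCounter, b: GCounter) -> GCounter:
    """Counter-example: summing slots double-counts on redelivery."""
    return {k: a.get(k, 0) + b.get(k, 0) for k in a.keys() | b.keys()}


def value(counter: GCounter) -> int:
    """A read sums the slots of all leaders."""
    return sum(counter.values())


# Three replica states; the third holds a stale view of Tokyo's slot.
states = [{"tokyo": 5}, {"london": 3}, {"tokyo": 2, "sydney": 1}]

results = []
for order in permutations(states):
    acc: GCounter = {}
    for state in order:
        acc = merge(acc, state)
        acc = merge(acc, state)  # simulate duplicate delivery
    results.append(acc)

# Every arrival order, even with duplicates, yields the same state.
assert all(r == results[0] for r in results)
print(value(results[0]))  # 9

# The additive merge breaks as soon as one message is delivered twice.
once = additive_merge({}, states[1])
twice = additive_merge(once, states[1])
print(value(once), value(twice))  # 3 6
```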
📌 Examples
AWS DynamoDB Global Tables uses LWW with microsecond precision service timestamps plus a leader ID tiebreaker; guidance recommends conditional writes with version checks for critical fields so that blind overwrites do not silently discard business-critical updates
Amazon Dynamo shopping cart conflict: a user adds item X through the us-east datacenter (version [A:5, B:3]) while, during a datacenter failure, an add of item Y is handled by us-west (version [A:4, B:4]); neither version descends from the other, so the system detects them as concurrent and the application merges them into a cart containing both X and Y
Redis CRDTs for active-active geo-replication: counter increments from different datacenters are commutative (Tokyo +5 and London +3 always sum to +8); set union operations preserve all added members; last-writer-wins registers use a timestamp plus replica ID for deterministic resolution
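The shopping cart example above can be worked through in code using the same version vectors. This is a sketch of application-level merging, not Dynamo's actual code: the `Cart` type, the `merge_carts` function, and the pre-existing item W are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

Clock = Dict[str, int]  # node id -> version counter


@dataclass
class Cart:
    items: FrozenSet[str]
    clock: Clock


def dominates(a: Clock, b: Clock) -> bool:
    """True if a has seen every update recorded in b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def merge_carts(a: Cart, b: Cart) -> Cart:
    """Keep the newer cart if one descends from the other, else union the items."""
    if dominates(a.clock, b.clock):
        return a
    if dominates(b.clock, a.clock):
        return b
    # Concurrent versions: union the items and take the element-wise max clock.
    nodes = sorted(a.clock.keys() | b.clock.keys())
    clock = {n: max(a.clock.get(n, 0), b.clock.get(n, 0)) for n in nodes}
    return Cart(a.items | b.items, clock)


# W is a hypothetical item that was already in the cart before the conflict.
us_east = Cart(frozenset({"W", "X"}), {"A": 5, "B": 3})
us_west = Cart(frozenset({"W", "Y"}), {"A": 4, "B": 4})

merged = merge_carts(us_east, us_west)
print(sorted(merged.items), merged.clock)  # ['W', 'X', 'Y'] {'A': 5, 'B': 4}
```

A union merge never loses an added item, but it cannot tell a removal from an item the other side never had, so an item deleted on one side can reappear after the merge. The sketch also omits incrementing the coordinating node's counter when the merged cart is written back.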