Multi-Region Replication and Failure Modes
Active-Active Multi-Region
Multi-datacenter deployments replicate data across geographic regions for low-latency local reads and for disaster recovery. In an active-active topology, every region accepts writes, with asynchronous replication to the others. Writing at LOCAL_QUORUM (a majority of the replicas in the coordinator's local datacenter) keeps write latency at 5-15 ms while the deployment as a whole survives the loss of an entire datacenter.
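A minimal sketch of the quorum arithmetic behind LOCAL_QUORUM; `local_quorum` is a hypothetical helper, not a real driver API:

```python
# Sketch (assumed helper, not a real driver API): acknowledgements needed
# for a LOCAL_QUORUM write at a given per-region replication factor.
def local_quorum(rf_per_region: int) -> int:
    """Majority of the replicas in the coordinator's local datacenter."""
    return rf_per_region // 2 + 1

# With RF=3 per region, a write is acknowledged by 2 local replicas;
# remote regions receive the write asynchronously.
print(local_quorum(3))  # 2
```

Because only local replicas are counted, the acknowledgement path never crosses the WAN, which is what keeps the write latency in the single-digit-millisecond range.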
Storage overhead is significant. With replication factor 3 (RF=3, three copies per region) across 3 datacenters, each logical write becomes 9 physical writes. Total storage is 9x the logical data plus roughly 30% compaction headroom, about 12x in all. Cross-region replication completes asynchronously in 50-200 ms, depending on WAN (Wide Area Network) distance.
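The back-of-envelope arithmetic above works out as follows:

```python
# Storage multiplier for RF=3 across 3 datacenters, as described above.
rf_per_region = 3
regions = 3
compaction_headroom = 0.30  # extra scratch space for compaction

physical_copies = rf_per_region * regions            # 9 physical writes
total_multiplier = physical_copies * (1 + compaction_headroom)

print(physical_copies)   # 9
print(total_multiplier)  # 11.7 -> "roughly 12x"
```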
Failure Modes
Split brain: a network partition isolates datacenters, and both sides continue to accept writes independently under last-write-wins. When the partition heals, conflicting writes resolve by timestamp, potentially losing updates if clocks are skewed.
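A toy last-write-wins merge illustrates the data loss; the replica state shape `{key: (timestamp, value)}` and the timestamps are illustrative assumptions:

```python
# Minimal last-write-wins merge: on partition heal, conflicting versions
# of a key resolve by timestamp, silently dropping the older write.
def lww_merge(a: dict, b: dict) -> dict:
    """Merge two replica states shaped {key: (timestamp, value)}."""
    merged = dict(a)
    for key, (ts, val) in b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, val)
    return merged

# Both sides of a partition accepted a write for "cart":
region_a = {"cart": (1700000001, ["book"])}
region_b = {"cart": (1700000005, ["pen"])}
print(lww_merge(region_a, region_b))  # region_a's update is silently lost
```

Note that if region_a's clock had been skewed ahead of region_b's, the older write would have won instead, which is exactly the hazard the paragraph describes.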
Replica lag: asynchronous replication can fall 10+ seconds behind during peak load. A user who reads from a local replica immediately after writing may see stale data (a read-your-writes violation).
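The violation can be sketched with two toy replicas; the `Replica` class is a stand-in for illustration only:

```python
# Sketch of a read-your-writes violation: a write lands in region A, but
# the user's next read hits a replica in region B that has not applied it.
class Replica:
    def __init__(self):
        self.data = {}
    def apply(self, key, value):
        self.data[key] = value
    def read(self, key):
        return self.data.get(key)

region_a, region_b = Replica(), Replica()
region_a.apply("profile", "new-name")  # write acknowledged in region A
# ...asynchronous replication to region B is still in flight...
print(region_b.read("profile"))        # None -> stale read
region_b.apply("profile", "new-name")  # replication finally arrives
print(region_b.read("profile"))        # new-name
```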
Cascading failure: one region fails and its traffic shifts to the remaining regions. Without 50-100% extra capacity per region, the survivors overload, p99 latency spikes, and timeouts cascade.
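The 50-100% headroom figure follows from how a failed region's traffic spreads across the survivors:

```python
# Headroom arithmetic: when one of N regions fails, its traffic is
# redistributed evenly across the N-1 surviving regions.
def required_headroom(regions: int) -> float:
    """Fractional extra capacity each region needs to absorb one failure."""
    return 1 / (regions - 1)

print(required_headroom(3))  # 0.5 -> 50% extra with 3 regions
print(required_headroom(2))  # 1.0 -> 100% extra with 2 regions
```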
Anti-Entropy Repair
Replicas can diverge after failed writes, network issues, or partial failures. Anti-entropy repair periodically compares Merkle trees (hash trees summarizing data segments) across replicas and streams only the differences. A full repair of multi-terabyte nodes takes hours to days and can saturate network and disk; throttle repair streams to 50-100 MB/s and run incremental repairs weekly to limit the impact on foreground queries.
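A toy version of the comparison step, assuming fixed key ranges per leaf and comparing only leaf hashes rather than a full tree:

```python
# Toy Merkle-style comparison: hash each data segment (a tree leaf),
# compare hashes across replicas, and stream only segments that differ.
import hashlib

def segment_hashes(segments: dict) -> dict:
    """Hash each data segment keyed by its token/key range."""
    return {seg: hashlib.sha256(data.encode()).hexdigest()
            for seg, data in segments.items()}

def diverged_segments(local: dict, remote: dict) -> list:
    """Segments whose hashes differ and must be streamed for repair."""
    lh, rh = segment_hashes(local), segment_hashes(remote)
    return sorted(seg for seg in lh if lh[seg] != rh.get(seg))

local  = {"0-99": "a,b,c", "100-199": "d,e,f"}
remote = {"0-99": "a,b,c", "100-199": "d,e,X"}  # one segment diverged
print(diverged_segments(local, remote))  # ['100-199']
```

A real implementation hashes millions of segments into an actual tree so matching subtrees can be skipped after a single root comparison; the leaf-only version here keeps the idea visible in a few lines.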