Failure Modes and Edge Cases in Production
Hot Partition Problem
Hot partitions occur when skewed traffic concentrates on a single partition. Throughput for the hot keys is then capped at what that one partition can serve, on the order of 1,000-5,000 operations per second, and tail latency climbs from around 10 ms to 500 ms or more. Common causes are celebrity profiles, trending products, and monotonic keys such as timestamps, which send all recent writes to the same partition.
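The effect of a monotonic key can be seen in a small simulation. The sketch below is illustrative only: the partition count, the key space, and the function names are assumptions, and the hashed layout is included purely as a contrast rather than as a recommendation from the text above.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical cluster size


def range_partition(timestamp: int, start: int, span: int) -> int:
    """Range partitioning: each partition owns a contiguous slice of the key space."""
    width = span // NUM_PARTITIONS
    return min((timestamp - start) // width, NUM_PARTITIONS - 1)


def hash_partition(key: str) -> int:
    """Hash partitioning: a stable hash scatters adjacent keys across partitions."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def simulate():
    """Count where the 100 most recent writes land under each scheme."""
    start, span = 0, 4000               # key space covers timestamps 0..3999
    recent_writes = range(3900, 4000)   # the newest 100 timestamps

    by_range = Counter(range_partition(ts, start, span) for ts in recent_writes)
    by_hash = Counter(hash_partition(f"event-{ts}") for ts in recent_writes)
    return by_range, by_hash
```

Calling `simulate()` shows every recent write landing on the last range partition, while the hashed keys are spread over several partitions. Hashing gives up efficient range scans over time, so it is a trade-off rather than a free fix.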
Eventual Consistency Anomalies
Lost read-your-writes occurs when a user updates data and then immediately reads from a replica that has not yet received the update, so the user sees their own change disappear. Fix: route the user's reads to the same replica that handled the write (session consistency), or attach the write's timestamp or version to later reads and have the replica wait until it has caught up to that point.
Duplicate "unique" values occur when two concurrent writes both create the same username because uniqueness is not coordinated globally. Fix: use conditional writes with version checks (compare-and-set), or a centralized allocator service for critical unique values.
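Both fixes can be sketched with in-memory stand-ins. This is a minimal illustration under stated assumptions, not a real datastore client: `Replica`, `UsernameRegistry`, and their methods are hypothetical names, and the lock stands in for whatever atomic conditional write the actual store provides.

```python
import threading


class Replica:
    """Toy replica that records the highest write version it has applied."""

    def __init__(self):
        self.data = {}
        self.applied_version = 0

    def apply(self, key, value, version):
        self.data[key] = value
        self.applied_version = max(self.applied_version, version)

    def read(self, key, min_version=0):
        # Refuse to answer if this replica has not yet applied the caller's write.
        # A real client would wait and retry, or fall back to another replica.
        if self.applied_version < min_version:
            raise LookupError("replica is behind the caller's last write")
        return self.data.get(key)


class UsernameRegistry:
    """Toy store with an atomic put-if-absent, a simple form of compare-and-set."""

    def __init__(self):
        self._owners = {}
        self._lock = threading.Lock()

    def claim(self, username, user_id):
        # The existence check and the write happen as one atomic step,
        # so two concurrent claims cannot both succeed.
        with self._lock:
            if username in self._owners:
                return False
            self._owners[username] = user_id
            return True
```

A session would remember the version returned by its last write and pass it as `min_version` on each read. For usernames, the second caller of `claim` with the same name gets `False` and can prompt the user for a different one.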
Relational Failure Modes
Long transactions cause lock contention: they hold row locks for longer, so other operations queue behind them and throughput degrades. Deadlocks occur when two transactions each hold locks the other needs. Replication lag causes replicas to serve stale reads, especially under write-heavy load. Fixes: keep transactions short, use proper indexes so statements scan and lock fewer rows, and monitor replication lag so that replicas that fall too far behind are fenced off from serving reads.
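Fencing stale replicas amounts to a routing decision based on measured lag. The sketch below assumes lag is already being collected somewhere. The `ReplicaStatus` type, the `pick_read_target` function, and the 2-second budget are hypothetical choices for illustration.

```python
from dataclasses import dataclass

MAX_LAG_SECONDS = 2.0  # hypothetical staleness budget


@dataclass
class ReplicaStatus:
    name: str
    lag_seconds: float  # as reported by whatever lag monitoring is in place


def pick_read_target(replicas, primary="primary", max_lag=MAX_LAG_SECONDS):
    """Return the least-lagged replica within budget, else fall back to the primary."""
    healthy = [r for r in replicas if r.lag_seconds <= max_lag]
    if not healthy:
        # Every replica is fenced: serve the read from the primary instead.
        return primary
    return min(healthy, key=lambda r: r.lag_seconds).name
```

Falling back to the primary keeps reads correct but shifts load onto it, so sustained fencing is worth alerting on.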