
Failure Modes and Edge Cases in Production

Hot partitions (hot keys) in NoSQL systems cap throughput: skewed traffic to a single partition often hits a ceiling of a few thousand operations per second, driving tail latencies higher. A celebrity user profile or a trending product, for example, receives disproportionate traffic compared to typical keys. Fixes include better partition-key design (hashing or bucketing by time), write sharding (updates fan out to multiple partitions and are aggregated on read), or queueing to smooth bursty traffic.

Eventual consistency introduces anomalies such as broken read-your-writes guarantees: a user updates their profile but immediately reads stale data from a replica that has not yet received the write. Session consistency, or routing reads to the same replica that handled the write, mitigates this.

Unique constraints are another failure mode: two concurrent writes can create duplicate usernames because uniqueness is not enforced globally in leaderless systems. Solutions include a centralized allocator, conditional writes with version checks, or a separate strongly consistent index service.

Cross-shard transactions using two-phase commit, in both relational and NoSQL systems, increase latency and reduce availability because coordinator failures trigger long recovery periods. Prefer designing shard-local invariants where possible. For workflows that must coordinate across shards, use sagas with compensating transactions that can roll back individual steps without holding locks.

Long-running transactions and locking in relational databases cause lock contention, deadlocks, and replication lag on hot rows. Use shorter transactions, proper indexing, and break hot rows into partitioned structures, such as counters bucketed by time window.

Multi-region write topologies face distinct challenges. Strongly consistent global systems like Spanner add higher write latencies (50 to 150 ms across continents) due to consensus, and a region outage may require quorum reconfiguration taking seconds. Multi-master eventually consistent systems allow write conflicts and divergent histories, requiring deterministic conflict resolution (such as last-write-wins with hybrid logical clocks) and reconciliation tooling to merge diverged replicas.
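A minimal sketch of the write-sharding fix for hot keys, with the NoSQL table modeled as a plain dict and a hypothetical salted-key naming scheme: each write picks a random shard suffix, and reads fan out across all shards and aggregate client-side.

```python
import random

NUM_SHARDS = 16             # hypothetical fan-out; size it to the hot key's write rate
store: dict[str, int] = {}  # stand-in for the NoSQL table

def salted_key(base_key: str) -> str:
    """Write path: a random suffix spreads writes over NUM_SHARDS partitions."""
    return f"{base_key}#{random.randrange(NUM_SHARDS)}"

def shard_keys(base_key: str) -> list[str]:
    """Read path: enumerate every salted partition for client-side aggregation."""
    return [f"{base_key}#{i}" for i in range(NUM_SHARDS)]

# Each like-counter increment lands on one of 16 partitions instead of one hot row.
k = salted_key("user:celebrity:likes")
store[k] = store.get(k, 0) + 1

# A read fans out to all 16 shards and sums them to reconstruct the total.
total = sum(store.get(k, 0) for k in shard_keys("user:celebrity:likes"))
```

The trade-off is explicit: writes get cheaper contention at the cost of a fan-out read, which suits write-heavy counters.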
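The session-consistency mitigation can be sketched as a small router. The `primary.write`, `replica.replayed_lsn`, and `replica.read` calls below are assumed interfaces standing in for whatever the driver exposes; the point is that each session tracks the commit position of its own writes and only reads from replicas that have caught up.

```python
class SessionRouter:
    """Per-session read-your-writes routing sketch (hypothetical replica API).

    After each write the session remembers the commit position (LSN).
    Reads prefer a replica that has replayed past that position and fall
    back to the primary otherwise, so a session never sees data older
    than its own writes.
    """

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self.session_lsn = 0  # high-water mark of this session's writes

    def write(self, key, value):
        # Assumed to return the commit LSN of the write on the primary.
        self.session_lsn = self.primary.write(key, value)

    def read(self, key):
        for replica in self.replicas:
            if replica.replayed_lsn() >= self.session_lsn:
                return replica.read(key)
        return self.primary.read(key)  # no replica has caught up yet
```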
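For the conditional-write approach to unique constraints, DynamoDB's `ConditionExpression` is one concrete mechanism: a `put_item` guarded by `attribute_not_exists` succeeds for exactly one of two concurrent writers. The `usernames` table (keyed on `username`) is assumed here for illustration.

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("usernames")  # assumed table with partition key "username"

def claim_username(username: str, user_id: str) -> bool:
    """Atomically claim a username; only one concurrent writer can succeed."""
    try:
        table.put_item(
            Item={"username": username, "user_id": user_id},
            # The write is rejected unless no item with this key exists yet.
            ConditionExpression="attribute_not_exists(username)",
        )
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # another writer claimed it first
        raise
```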
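A compact sketch of hybrid logical clocks driving last-write-wins resolution, assuming the usual simplified (wall time, counter, node id) formulation: the tuple's total order makes every replica pick the same winner even when wall clocks disagree, with the node id breaking exact ties.

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class HLC:
    """Hybrid logical clock timestamp; tuple order gives a total order."""
    wall_ms: int
    counter: int
    node_id: str

def tick(prev: HLC, node_id: str) -> HLC:
    """Advance the clock for a local write."""
    now = int(time.time() * 1000)
    if now > prev.wall_ms:
        return HLC(now, 0, node_id)
    return HLC(prev.wall_ms, prev.counter + 1, node_id)

def merge(prev: HLC, remote: HLC, node_id: str) -> HLC:
    """Fold in a remote timestamp so causally later events sort later."""
    now = int(time.time() * 1000)
    wall = max(prev.wall_ms, remote.wall_ms, now)
    if wall == prev.wall_ms == remote.wall_ms:
        counter = max(prev.counter, remote.counter) + 1
    elif wall == prev.wall_ms:
        counter = prev.counter + 1
    elif wall == remote.wall_ms:
        counter = remote.counter + 1
    else:
        counter = 0
    return HLC(wall, counter, node_id)

def resolve(a, b):
    """Deterministic LWW: versions are (HLC, value) pairs; larger HLC wins."""
    return a if a[0] >= b[0] else b
```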
💡 Key Takeaways
Hot partitions cap throughput at a few thousand operations per second per partition due to skewed traffic; fix via hashing, time bucketing, write sharding, or queueing to distribute load
Eventual consistency anomalies include broken read-your-writes guarantees (a user reads a stale replica) and unique-constraint violations (concurrent writes are not globally coordinated); mitigate with session consistency or conditional writes with version checks
Cross-shard two-phase commit increases latency and reduces availability because coordinator failures trigger long recovery; prefer shard-local invariants, or sagas with compensating transactions for multi-shard workflows
Write amplification in NoSQL causes one logical write to trigger 5 to 10 physical writes via denormalization, indexes, and replication; compaction stalls under write spikes drive tail latencies higher
Multi-region strong consistency adds 50 to 150 ms of write latency for consensus; multi-master eventually consistent systems face write conflicts requiring deterministic resolution such as last-write-wins with hybrid logical clocks
Long transactions in relational systems cause lock contention, deadlocks, and replication lag on hot rows; use shorter transactions, proper indexing, and partition hot structures such as counters bucketed by time window (see the bucketing sketch after this list)
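A minimal sketch of the time-window bucketing mentioned above, assuming a hypothetical `counters` table: each increment targets a per-minute row, so no single row stays hot, and reads sum the buckets in the window of interest.

```python
from datetime import datetime, timezone

def bucket_key(counter_id: str, now: datetime | None = None) -> str:
    """Route each increment to a per-minute bucket row instead of one hot row."""
    now = now or datetime.now(timezone.utc)
    return f"{counter_id}:{now:%Y%m%d%H%M}"

# Write path (writers rarely contend on the same row for long):
#   UPDATE counters SET n = n + 1 WHERE id = :bucket_key
# Read path: SUM(n) over the buckets in the window of interest; a periodic
# job can roll closed buckets up into a single compacted row.
print(bucket_key("post:42:views"))  # e.g. "post:42:views:202501011230"
```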
📌 Examples
Celebrity user profile hot key in social media: partition key salted with a random suffix to spread writes across multiple partitions, with reads fanned out and aggregated client-side
Duplicate username creation in leaderless NoSQL: conditional write with compare-and-set on a version field, or a separate strongly consistent allocator service for unique constraints
Cross-shard order and inventory workflow: saga pattern with compensating transactions for the inventory reservation, payment, and shipping steps, each idempotent with retry logic (sketched after this list)
Facebook TAO graph system: eventual consistency with cache-heavy denormalized relationships and precomputed feeds, tolerating temporary divergence across regions for low latency at massive scale
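A minimal saga sketch for the order workflow above (the service calls are hypothetical stand-ins for idempotent RPCs keyed by order id): each completed step registers a compensating action, and a failure unwinds them in reverse without holding any cross-shard locks.

```python
class SagaStep:
    def __init__(self, name, action, compensate):
        self.name, self.action, self.compensate = name, action, compensate

def run_saga(steps, order_id):
    """Run steps in sequence; on failure, undo completed steps in reverse order."""
    completed = []
    for step in steps:
        try:
            step.action(order_id)
            completed.append(step)
        except Exception:
            for done in reversed(completed):
                done.compensate(order_id)  # compensations must be idempotent
            raise

# Hypothetical service calls; real ones would be retryable, idempotent RPCs.
def reserve_inventory(o): print(f"reserved inventory for {o}")
def release_inventory(o): print(f"released inventory for {o}")
def charge_payment(o):    print(f"charged payment for {o}")
def refund_payment(o):    print(f"refunded payment for {o}")
def create_shipment(o):   raise RuntimeError("shipping service unavailable")
def cancel_shipment(o):   print(f"cancelled shipment for {o}")

saga = [
    SagaStep("reserve_inventory", reserve_inventory, release_inventory),
    SagaStep("charge_payment", charge_payment, refund_payment),
    SagaStep("create_shipment", create_shipment, cancel_shipment),
]
# run_saga(saga, "order-123") raises after refunding the payment and
# releasing the inventory, leaving no partial state behind.
```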