
Failure Modes: When ACID and BASE Break Down

Even well-designed systems have edge cases where their consistency model fails. ACID systems using snapshot isolation (common in PostgreSQL and SQL Server) are vulnerable to write skew: two concurrent transactions read the same snapshot, each passes its validation check, and both commit, together violating an invariant. The classic example: two transactions each check that at least one doctor is on call, both see two on call, both set one doctor to off call, leaving zero on call. Preventing this requires serializable isolation or explicit locking, which adds 20 to 50 percent latency overhead in high-contention workloads.

BASE systems face different failure modes. Last-writer-wins (LWW) conflict resolution relies on timestamps, and clock skew between replicas can cause a chronologically later write to be overwritten by an "older" one carrying a skewed, later timestamp. DynamoDB Global Tables are susceptible: if the us-east-1 clock runs 2 seconds ahead of eu-west-1, a write at real time T+2 in the EU can be dropped in favor of a write at T in the US. Mitigation requires hybrid logical clocks or vector clocks, which DynamoDB does not expose; the practical workaround is application-level versioning with conditional writes that check an explicit version attribute.

Monotonic-read violations under eventual consistency confuse users. A client writes profile data, reads it back successfully (hitting a fresh replica), then reads again and sees old data (hitting a stale replica). This "time travel" breaks user trust. Session consistency solves it by pinning reads to replicas at or ahead of a session token, but it requires sticky routing and fails over less gracefully. Cosmos DB session-consistency tokens must be carried from write responses into subsequent reads, adding application complexity.

Hot partitions silently degrade both models. A viral product in DynamoDB can hit the roughly 1,000-writes-per-second per-partition limit, causing throttling and pushing P99 latency from 10 ms to 200+ ms. ACID systems suffer too: a hot row under pessimistic locking builds a lock queue and spikes latencies. Aurora's MVCC helps by letting reads proceed without locks, but write-heavy hot rows still serialize their updates. Solutions include key salting (turning one hot key into many) and escrow (pre-allocating capacity to avoid coordination), both of which require application redesign.
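To make the write-skew fix concrete, here is a minimal sketch against PostgreSQL via psycopg2: run the check-then-write transaction at SERIALIZABLE and retry on serialization failure. The doctors table, its on_call column, and the connection DSN are illustrative assumptions, not names from this article.

```python
import psycopg2
from psycopg2 import errors
from psycopg2.extensions import ISOLATION_LEVEL_SERIALIZABLE

def take_doctor_off_call(dsn, doctor_id, max_retries=3):
    """Set one doctor off call only if at least one other remains on call."""
    conn = psycopg2.connect(dsn)
    conn.set_isolation_level(ISOLATION_LEVEL_SERIALIZABLE)
    try:
        for _ in range(max_retries):
            try:
                with conn, conn.cursor() as cur:  # commits on success, rolls back on error
                    cur.execute("SELECT count(*) FROM doctors WHERE on_call")
                    (on_call,) = cur.fetchone()
                    if on_call <= 1:
                        return False  # invariant would break; refuse the change
                    cur.execute(
                        "UPDATE doctors SET on_call = false WHERE id = %s",
                        (doctor_id,),
                    )
                return True
            except errors.SerializationFailure:
                continue  # a concurrent transaction raced us; retry on a fresh snapshot
        return False
    finally:
        conn.close()
```

Under snapshot isolation both transactions would commit and strand zero doctors on call; under SERIALIZABLE, PostgreSQL aborts one of them, which is why the retry loop is mandatory.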
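Likewise, a sketch of the application-level versioning workaround for LWW clock skew, assuming a boto3 DynamoDB table named profiles keyed on user_id with a numeric version attribute (all three names are illustrative):

```python
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("profiles")

def update_profile(user_id, data, expected_version):
    """Write only if the stored version still matches what we last read."""
    try:
        table.put_item(
            Item={"user_id": user_id, "data": data,
                  "version": expected_version + 1},
            ConditionExpression="version = :v",
            ExpressionAttributeValues={":v": expected_version},
        )
        return True
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # a concurrent writer won; caller re-reads and retries
        raise
```

Ordering now hinges on an explicit counter rather than wall-clock timestamps, so a skewed replica clock can no longer silently decide which write survives.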
💡 Key Takeaways
Write skew in snapshot isolation lets two transactions each pass their checks and commit, together violating an invariant and producing outcomes like overbooking or zero on-call staff. Serializable isolation prevents this but adds 20 to 50 percent latency overhead in high-contention scenarios.
Clock skew combined with last-writer-wins can permanently lose writes. If one replica's clock runs 2 seconds ahead, a write made at real time T+2 can be dropped in favor of a T+0 write carrying a skewed later timestamp, with no application visibility into the loss (a minimal hybrid-logical-clock sketch follows this list).
Monotonic-read violations show users older data after they have already seen newer data, breaking trust. Session consistency fixes this at the cost of sticky routing and less graceful failover, and it requires token propagation in application code (sketched after this list).
Hot partitions hit throughput ceilings (roughly 1,000 writes per second per partition in DynamoDB; variable in ACID systems under lock contention), spiking P99 latency from 10 ms to 200+ ms and forcing a key-salting or escrow redesign.
Split brain in multi-leader BASE systems during partitions accepts divergent writes that reconciliation may drop (LWW) or merge incorrectly; typical Global Tables replication lag is under 1 second, but there is no hard upper bound during prolonged partitions.
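To ground the clock-skew takeaway, here is a minimal hybrid logical clock in Python. It sketches the general technique (not anything DynamoDB exposes): each timestamp is a (wall_ms, counter) pair that only moves forward, so ordering survives replicas whose physical clocks disagree.

```python
import time

class HybridLogicalClock:
    """Minimal HLC: (wall_ms, counter) timestamps that never go backwards."""

    def __init__(self):
        self.wall = 0     # highest wall-clock millisecond value seen so far
        self.counter = 0  # tie-breaker when wall time stalls or lags

    def now(self):
        """Timestamp a local event, such as a write."""
        wall = int(time.time() * 1000)
        if wall > self.wall:
            self.wall, self.counter = wall, 0
        else:
            self.counter += 1  # physical clock stalled or stepped backwards
        return (self.wall, self.counter)

    def observe(self, remote):
        """Merge a timestamp received from another replica, then tick."""
        remote_wall, remote_counter = remote
        wall = int(time.time() * 1000)
        if wall > max(self.wall, remote_wall):
            self.wall, self.counter = wall, 0
        elif remote_wall > self.wall:
            self.wall, self.counter = remote_wall, remote_counter + 1
        elif remote_wall == self.wall:
            self.counter = max(self.counter, remote_counter) + 1
        else:
            self.counter += 1
        return (self.wall, self.counter)
```

Comparing the pairs lexicographically yields an order consistent with causality, which is exactly what naive last-writer-wins timestamps lack.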
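And the token-propagation pattern behind session consistency, shown with a hypothetical wrapper (SessionClient and the store's write/read signatures are illustrative, not the real Cosmos DB SDK): capture the token from every write response and pass it on every subsequent read.

```python
class SessionClient:
    """Hypothetical wrapper showing session-token plumbing for monotonic reads."""

    def __init__(self, store):
        self.store = store         # underlying replicated-store client (assumed API)
        self.session_token = None  # highest replication position seen this session

    def write(self, key, value):
        # Assumed: the store returns a token marking the write's position.
        result, token = self.store.write(key, value)
        self.session_token = token
        return result

    def read(self, key):
        # Passing the token asks the store to serve the read from a replica
        # that has caught up at least that far -- never older data.
        value, token = self.store.read(key, min_token=self.session_token)
        self.session_token = token  # reads can also advance the session
        return value
```

The cost named in the takeaway is visible here: the token is per-session state that must survive load-balancer hops and failovers, which is what makes failover less graceful.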
📌 Examples
Write skew in seat booking: two users both see 1 seat left in their snapshot, both pass the 'available > 0' check, both decrement and commit. Result: 2 seats sold, count at -1. Fix: UPDATE seats SET count = count - 1 WHERE id = 123 AND count > 0; under serializable isolation, or SELECT ... FOR UPDATE.
Clock skew loss in DynamoDB Global Tables: a user writes their profile in us-east-1 at 10:00:01 UTC real time (replica clock reads 10:00:03), then updates it in eu-west-1 at 10:00:02 real time (replica clock reads 10:00:00). LWW keeps the 10:00:03-stamped write, silently dropping the chronologically later 10:00:02 update.
Hot product in DynamoDB: a viral tweet drives 5,000 writes per second to a single product key. The roughly 1,000-writes-per-second per-partition ceiling causes throttling. Fix: shard the key into product_id_0 through product_id_4 (write sharding), aggregate reads across all shards, and accept the complexity (a sketch follows below).
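A minimal sketch of that write sharding with boto3; the counters table, the pk key name, and the shard count are illustrative:

```python
import random
import boto3

NUM_SHARDS = 5  # spread one hot key across 5 partitions (illustrative)
table = boto3.resource("dynamodb").Table("counters")

def record_purchase(product_id):
    """Salt the hot key so each write lands on a random shard."""
    shard = random.randrange(NUM_SHARDS)
    table.update_item(
        Key={"pk": f"{product_id}_{shard}"},
        UpdateExpression="ADD purchases :one",
        ExpressionAttributeValues={":one": 1},
    )

def total_purchases(product_id):
    """Reads pay the price: fetch every shard and aggregate."""
    total = 0
    for shard in range(NUM_SHARDS):
        resp = table.get_item(Key={"pk": f"{product_id}_{shard}"})
        total += int(resp.get("Item", {}).get("purchases", 0))
    return total
```

At the 5,000-writes-per-second peak, each of the five shards sees about 1,000 writes per second, right at the per-partition ceiling; a real deployment would size NUM_SHARDS with headroom.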