Write Back: Trading Durability for Speed
Inverting the Persistence Order
Write back (also called write behind) inverts the persistence order to optimize write latency. The application writes to the cache, which immediately acknowledges the write. The cache then persists to the database asynchronously via a buffer or queue, often batching and compacting multiple updates. This delivers the fastest possible write latency, since the application only waits for the in-memory cache update to complete, not for the database round trip.
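The flow above can be sketched in a few lines. This is a minimal, single-threaded illustration with hypothetical names (`WriteBehindCache`, `dirty`): writes land in memory and are acknowledged immediately, while a separate `flush` step, which in production would run on a timer or background thread, persists the buffered updates in one pass.

```python
class WriteBehindCache:
    """Minimal write-behind sketch: writes hit memory and are
    acknowledged at once; the database is updated later in batches."""

    def __init__(self, db):
        self.db = db      # backing store; a dict stands in for the database
        self.cache = {}   # in-memory view, always current
        self.dirty = {}   # keys awaiting persistence (last value wins)

    def write(self, key, value):
        self.cache[key] = value   # fast path: memory only
        self.dirty[key] = value   # remember for the asynchronous flush
        # the write is acknowledged here -- no database round trip

    def flush(self):
        """In production this runs periodically in the background."""
        batch, self.dirty = self.dirty, {}
        for key, value in batch.items():
            self.db[key] = value  # one batched pass over the database
```

Note that between `write` and `flush` the database lags the cache, which is exactly the durability window discussed below.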
Why Write Back is Dramatically Faster
A typical database write takes 5-20ms: network round trip, disk I/O, transaction commit, and replication acknowledgment. A distributed cache write takes 0.5-2ms: network round trip and memory update. Write back returns after the cache write, reducing perceived latency by 80-95%. But speed is not the only benefit. Write batching dramatically reduces database load. Consider a counter receiving 10,000 increments per second. With immediate writes, the database handles 10,000 write operations per second. With write back batching on a 1 second interval, those 10,000 increments compact into a single "+10,000" update. Database write load drops from 10,000 operations per second to one.
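The counter compaction described above can be shown concretely. In this hypothetical sketch, increments accumulate as in-memory deltas and a flush issues one merged update per counter, so 10,000 increments become a single database write:

```python
from collections import defaultdict

class BatchedCounter:
    """Compacts many increments into one database update per flush."""

    def __init__(self):
        self.pending = defaultdict(int)  # counter name -> accumulated delta
        self.db_writes = 0               # database operations actually issued

    def increment(self, counter, delta=1):
        self.pending[counter] += delta   # pure memory, no database op

    def flush(self, db):
        """Apply each accumulated delta as one '+N' database update."""
        for counter, delta in self.pending.items():
            db[counter] = db.get(counter, 0) + delta
            self.db_writes += 1
        self.pending.clear()
```

With a 1 second flush interval, 10,000 calls to `increment` per second still cost only one `db_writes` operation per counter per flush.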
The Durability Risk
Here is the critical trade-off: if a cache node fails before flushing its write buffer, those writes are lost. Period. The data loss window equals your flush interval. Flush every 1 second and you lose at most 1 second of writes on node failure. Flush every 10 seconds and you might lose 10 seconds of writes. Write reordering is another hazard: if updates flush out of order, an older state can overwrite a newer one. Thread A updates a value to 5, Thread B updates it to 10; due to network timing, the update carrying 5 reaches the database after the one carrying 10, leaving the database at 5 when it should be 10.
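A common guard against the reordering hazard is to attach a monotonically increasing version (or sequence number) to each update and have the flush path discard anything older than what the database already holds. A minimal sketch, with `apply_if_newer` as an assumed helper name:

```python
def apply_if_newer(db, key, value, version):
    """Guard against out-of-order flushes: store a version alongside each
    value and drop any update older than the database's current version."""
    current = db.get(key)  # stored as (value, version) tuples
    if current is not None and current[1] >= version:
        return False       # stale write from a delayed flush: discard it
    db[key] = (value, version)
    return True
```

In the Thread A / Thread B example, the update carrying 10 arrives with version 2; when the delayed update carrying 5 (version 1) finally lands, the guard rejects it and the database stays at 10.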
Making Write Back Production Ready
Production implementations add durability layers. The cache maintains a WAL (Write Ahead Log): a sequential log that records write intentions to disk before acknowledging them. If the node crashes, it replays the WAL on restart, deduplicating by idempotency keys (unique identifiers that allow duplicate operations to be detected and skipped). Some systems replicate the write buffer across multiple nodes before acknowledging. Strict flush policies bound the data loss window: maximum time in queue (1-5 seconds typical), maximum queue depth, and backpressure thresholds that slow writers when the queue grows.
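The WAL-plus-replay pattern can be sketched as follows. This is an illustrative file-backed version with assumed names (`WalBuffer`, `log_write`, `replay`); real implementations also fsync the log and compact it after successful flushes:

```python
import json
import uuid

class WalBuffer:
    """Append write intentions to a log before acknowledging; on restart,
    replay the log, skipping entries whose idempotency key was already applied."""

    def __init__(self, path):
        self.path = path

    def log_write(self, key, value):
        """Record the intention durably, then the caller may acknowledge."""
        record = {"id": str(uuid.uuid4()), "key": key, "value": value}
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")
        return record["id"]  # the idempotency key

    def replay(self, db, applied_ids):
        """Re-apply unflushed records after a crash, deduplicating by id."""
        recovered = 0
        with open(self.path) as f:
            for line in f:
                record = json.loads(line)
                if record["id"] in applied_ids:
                    continue                     # already persisted: skip
                db[record["key"]] = record["value"]
                applied_ids.add(record["id"])
                recovered += 1
        return recovered
```

The idempotency key makes replay safe to run repeatedly: records already in `applied_ids` are skipped, so a crash during recovery cannot double-apply a write.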
When to Use Write Back
Write back is ideal for append only or merge friendly data: metrics and counters where loss of a few data points is acceptable, logs where missing entries do not break consistency, and high volume event streams where batching provides massive efficiency gains. It should never be used for user transactional data like orders, payments, or inventory where loss has direct business impact.