Write Back: Trading Durability for Speed
Inverting the Persistence Order
Write back (also called write behind) inverts the persistence order to optimize write latency. The application writes to the cache, which immediately acknowledges the write. The cache then persists to the database asynchronously via a buffer or queue, often batching and compacting multiple updates. This delivers the fastest possible write latency, since the application only waits for the in-memory cache update to complete, not for the database round trip.
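The flow above can be sketched in a few lines. This is a minimal, single-threaded illustration with hypothetical names (`WriteBehindCache`, `dirty`): writes land in memory and are acknowledged immediately, while a separate `flush` step, which in production would run on a timer or background thread, persists the buffered updates in one pass.

```python
class WriteBehindCache:
    """Minimal write-behind sketch: writes hit memory and are
    acknowledged at once; the database is updated later in batches."""

    def __init__(self, db):
        self.db = db      # backing store; a dict stands in for the database
        self.cache = {}   # in-memory view, always current
        self.dirty = {}   # keys awaiting persistence (last value wins)

    def write(self, key, value):
        self.cache[key] = value   # fast path: memory only
        self.dirty[key] = value   # remember for the asynchronous flush
        # the write is acknowledged here -- no database round trip

    def flush(self):
        """In production this runs periodically in the background."""
        batch, self.dirty = self.dirty, {}
        for key, value in batch.items():
            self.db[key] = value  # one batched pass over the database
```

Note that between `write` and `flush` the database lags the cache, which is exactly the durability window discussed below.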
Why Write Back is Dramatically Faster
A typical database write takes 5-20ms: network round trip, disk I/O, transaction commit, and replication acknowledgment. A distributed cache write takes 0.5-2ms: network round trip and memory update. Write back returns after the cache write, reducing perceived latency by 80-95%. But speed is not the only benefit. Write batching dramatically reduces database load. Consider a counter receiving 10,000 increments per second. With immediate writes, the database handles 10,000 write operations per second. With write back batching on a 1 second interval, those 10,000 increments compact into a single "+10,000" update. Database write load drops from 10,000 operations per second to one.
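The counter compaction described above can be shown concretely. In this hypothetical sketch, increments accumulate as in-memory deltas and a flush issues one merged update per counter, so 10,000 increments become a single database write:

```python
from collections import defaultdict

class BatchedCounter:
    """Compacts many increments into one database update per flush."""

    def __init__(self):
        self.pending = defaultdict(int)  # counter name -> accumulated delta
        self.db_writes = 0               # database operations actually issued

    def increment(self, counter, delta=1):
        self.pending[counter] += delta   # pure memory, no database op

    def flush(self, db):
        """Apply each accumulated delta as one '+N' database update."""
        for counter, delta in self.pending.items():
            db[counter] = db.get(counter, 0) + delta
            self.db_writes += 1
        self.pending.clear()
```

With a 1 second flush interval, 10,000 calls to `increment` per second still cost only one `db_writes` operation per counter per flush.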
The Durability Risk
Here is the critical trade-off: if a cache node fails before flushing its write buffer, those writes are lost. Period. The data loss window equals your flush interval. Flush every 1 second and you lose at most 1 second of writes on node failure. Flush every 10 seconds and you might lose 10 seconds of writes. Write reordering is another hazard: if updates flush out of order, an older state can overwrite a newer one. Thread A updates a value to 5, Thread B updates it to 10; due to network timing, the update carrying 5 reaches the database after the one carrying 10, leaving the database at 5 when it should be 10.
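A common guard against the reordering hazard is to attach a monotonically increasing version (or sequence number) to each update and have the flush path discard anything older than what the database already holds. A minimal sketch, with `apply_if_newer` as an assumed helper name:

```python
def apply_if_newer(db, key, value, version):
    """Guard against out-of-order flushes: store a version alongside each
    value and drop any update older than the database's current version."""
    current = db.get(key)  # stored as (value, version) tuples
    if current is not None and current[1] >= version:
        return False       # stale write from a delayed flush: discard it
    db[key] = (value, version)
    return True
```

In the Thread A / Thread B example, the update carrying 10 arrives with version 2; when the delayed update carrying 5 (version 1) finally lands, the guard rejects it and the database stays at 10.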
Making Write Back Production Ready
Production implementations add durability layers. The cache maintains a WAL (Write Ahead Log): a sequential log that records write intentions to disk before acknowledging them. If the node crashes, it replays the WAL on restart, deduplicating by idempotency keys (unique identifiers that allow duplicate operations to be detected and skipped). Some systems replicate the write buffer across multiple nodes before acknowledging. Strict flush policies bound the data loss window: maximum time in queue (1-5 seconds typical), maximum queue depth, and backpressure thresholds that slow writers when the queue grows.
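The WAL-plus-replay pattern can be sketched as follows. This is an illustrative file-backed version with assumed names (`WalBuffer`, `log_write`, `replay`); real implementations also fsync the log and compact it after successful flushes:

```python
import json
import uuid

class WalBuffer:
    """Append write intentions to a log before acknowledging; on restart,
    replay the log, skipping entries whose idempotency key was already applied."""

    def __init__(self, path):
        self.path = path

    def log_write(self, key, value):
        """Record the intention durably, then the caller may acknowledge."""
        record = {"id": str(uuid.uuid4()), "key": key, "value": value}
        with open(self.path, "a") as f:
            f.write(json.dumps(record) + "\n")
        return record["id"]  # the idempotency key

    def replay(self, db, applied_ids):
        """Re-apply unflushed records after a crash, deduplicating by id."""
        recovered = 0
        with open(self.path) as f:
            for line in f:
                record = json.loads(line)
                if record["id"] in applied_ids:
                    continue                     # already persisted: skip
                db[record["key"]] = record["value"]
                applied_ids.add(record["id"])
                recovered += 1
        return recovered
```

The idempotency key makes replay safe to run repeatedly: records already in `applied_ids` are skipped, so a crash during recovery cannot double-apply a write.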
When to Use Write Back
Write back is ideal for append only or merge friendly data: metrics and counters where loss of a few data points is acceptable, logs where missing entries do not break consistency, and high volume event streams where batching provides massive efficiency gains. It should never be used for user transactional data like orders, payments, or inventory where loss has direct business impact.