
Write-Heavy Optimization: Buffering and Distribution

Write Optimization Principles

Write-heavy systems prioritize ingestion throughput over read latency. The core principle is to absorb writes as quickly as possible, then reorganize the data asynchronously for efficient reads. This inverts read-optimized design, where you pay an upfront cost on every write so that reads can be served faster.

Write-Ahead Logging (WAL)

WAL is fundamental to write optimization. Instead of modifying data in-place (random I/O), append the change to a sequential log file. Sequential writes to SSDs or spinning disks are 10-100x faster than random writes. The database periodically applies logged changes to actual data pages in the background.

WAL provides durability guarantees: once data is written to the log and fsynced (a filesystem operation that forces data from OS buffers to physical disk), it survives crashes. The database can replay the log on restart to recover committed transactions. This decouples durability from data organization.
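The append-then-replay loop can be sketched in a few lines. This is a toy illustration, not any real database's API; the `WriteAheadLog` class and its record format are invented here, and a real WAL would use binary records with checksums and track which entries have been applied:

```python
import os
import tempfile

class WriteAheadLog:
    """Toy WAL: append every change sequentially, fsync for durability,
    replay the log on restart to recover state (illustrative only)."""

    def __init__(self, path):
        self.path = path
        self.f = open(path, "a")

    def append(self, record: str):
        self.f.write(record + "\n")   # sequential append, never in-place update
        self.f.flush()                # push from the process buffer to the OS
        os.fsync(self.f.fileno())     # force OS buffers to physical disk

    def replay(self):
        """On recovery, return every logged change in write order."""
        with open(self.path) as f:
            return [line.rstrip("\n") for line in f]

# usage: log two changes, then replay as a stand-in for crash recovery
path = os.path.join(tempfile.mkdtemp(), "demo.wal")
wal = WriteAheadLog(path)
wal.append("SET k1=v1")
wal.append("SET k2=v2")
print(wal.replay())  # ['SET k1=v1', 'SET k2=v2']
```

Note that durability here costs one fsync per append; the batching techniques below amortize that cost across many records.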

LSM Trees for Write Throughput

Log-Structured Merge (LSM) trees optimize for write-heavy workloads. Incoming writes go to an in-memory buffer (memtable). When full, the memtable flushes to disk as an immutable sorted file (SSTable). Background compaction merges SSTables to maintain read performance and reclaim space from deleted/updated keys.

This design accepts slower reads (a lookup may have to check the memtable plus several SSTables) in exchange for fast writes. Read amplification is the price paid for converting random writes into sequential appends, though background compaction adds some write amplification of its own. LSM trees underpin key-value stores, message queues, and time-series databases handling millions of writes per second.
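The memtable/SSTable flow can be sketched as a toy in-memory model (class and method names are invented for illustration; a real engine adds a WAL for the memtable, bloom filters, and leveled compaction):

```python
import bisect

class TinyLSM:
    """Toy LSM tree: writes land in an in-memory memtable; a full memtable
    is frozen into an immutable sorted run (an 'SSTable'); reads check the
    memtable first, then SSTables newest-first (illustrative only)."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []            # sorted (key, value) runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value            # fast in-memory write
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        self.sstables.append(sorted(self.memtable.items()))  # immutable sorted file
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest first: read amplification
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

    def compact(self):
        """Merge all runs into one, reclaiming space from overwritten keys."""
        merged = {}
        for table in self.sstables:            # oldest first, so newer values win
            merged.update(dict(table))
        self.sstables = [sorted(merged.items())]

# usage: the second put fills the memtable and triggers a flush
db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # flush: memtable becomes an SSTable
db.put("a", 3)   # newer value lives in the memtable
print(db.get("a"), db.get("b"))  # 3 2
```

Note how `get` must consult multiple structures: that is the read amplification the text describes, and why compaction matters.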

Batching and Buffering

Batching amortizes per-write overhead across multiple operations. Instead of 1,000 individual inserts with 1,000 network round-trips and fsync calls, batch them into one operation. This reduces network latency, disk I/O, and transaction overhead.

Client-side buffering accumulates writes before sending. Server-side buffering (like write-back caching) acknowledges writes immediately while persisting asynchronously. The tradeoff is durability risk: buffered data may be lost in crashes. Configure buffer sizes and flush intervals based on acceptable data loss windows.
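A minimal client-side buffer with both size and time triggers might look like the sketch below (the `BufferedWriter` API is hypothetical; `flush_fn` stands in for one network round-trip or fsync covering the whole batch):

```python
import time

class BufferedWriter:
    """Toy client-side write buffer: accumulate items and flush when either
    the batch-size limit or the time window is reached (illustrative only).
    Buffered items are lost if the client crashes before flushing."""

    def __init__(self, flush_fn, max_items=1000, max_delay_s=0.005):
        self.flush_fn = flush_fn
        self.max_items = max_items
        self.max_delay_s = max_delay_s
        self.buffer = []
        self.oldest = None            # arrival time of the oldest buffered item

    def write(self, item):
        if not self.buffer:
            self.oldest = time.monotonic()
        self.buffer.append(item)
        if (len(self.buffer) >= self.max_items
                or time.monotonic() - self.oldest >= self.max_delay_s):
            self.flush()

    def flush(self):
        if self.buffer:
            self.flush_fn(self.buffer)   # one batched operation instead of N
            self.buffer = []

# usage: seven writes become three batched calls instead of seven
batches = []
w = BufferedWriter(batches.append, max_items=3, max_delay_s=60)
for i in range(7):
    w.write(i)
w.flush()   # drain the tail; a crash before this point would lose item 6
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

The `max_items` and `max_delay_s` knobs are exactly the "buffer sizes and flush intervals" tradeoff from the text: larger values mean fewer flushes but a wider data-loss window.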

💡 Key Takeaways
- Batching writes in 1 to 10ms windows or 1K to 10K item groups achieves 5x to 20x throughput by reducing syscalls and fsyncs, at a small p50 latency cost
- Sharding by high-cardinality keys distributes load: LinkedIn's Kafka deployment uses partitioning to handle tens of millions of messages per second across distributed brokers
- Hot-key mitigation: sharded counters split celebrity or viral-item load across multiple keys, preventing the single-shard saturation that causes p99 latency spikes from 10ms to 500ms+
- Durable logs decouple producers from consumers: writes acknowledge in milliseconds while consumers process asynchronously, preventing slow readers from blocking writers
- Consumer lag monitoring is critical: during spikes, lag can grow to minutes or hours, increasing staleness in read models and requiring autoscaling or backpressure
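The sharded-counter technique from the takeaways above can be sketched in a few lines (the plain dict stands in for a real key-value store; the class and key format are invented for illustration):

```python
import random

class ShardedCounter:
    """Toy sharded counter: spread increments for one hot key across N
    sub-keys so no single shard absorbs all the write load; reads merge
    the shards (illustrative only)."""

    def __init__(self, store, key, num_shards=16):
        self.store = store            # stands in for a KV store such as Redis
        self.key = key
        self.num_shards = num_shards

    def incr(self, amount=1):
        # Each increment hits a random shard, spreading write contention.
        shard = f"{self.key}:{random.randrange(self.num_shards)}"
        self.store[shard] = self.store.get(shard, 0) + amount

    def value(self):
        # Reads merge all shards; this can also be done periodically
        # and cached, as in the merged-counter pattern described above.
        return sum(self.store.get(f"{self.key}:{i}", 0)
                   for i in range(self.num_shards))

# usage: 1,000 increments land on 8 shards instead of one hot key
store = {}
likes = ShardedCounter(store, "post:viral", num_shards=8)
for _ in range(1000):
    likes.incr()
print(likes.value())  # 1000
```

The read path pays a small merge cost so the write path avoids single-key contention, the same throughput-over-read-latency tradeoff this section describes.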
📌 Interview Tips
1. LinkedIn's Kafka deployment transports trillions of messages daily with tens of millions per second at peak, batching aggressively and using partitioned topics for parallel consumption
2. Twitter handles tens of thousands of tweets per second with fanout systems that queue writes and process asynchronously to avoid blocking tweet creation on delivery
3. Meta uses sharded counters for viral posts to prevent single-key hotspots, distributing increments across multiple counter shards merged periodically