
Production Compression in Data Pipelines: Layered Optimization

The Multi-Stage Reality: In production systems, compression isn't a single decision. It is applied at multiple stages of the data flow, each with different optimization goals. Understanding where to compress, and with which codec at each layer, is critical to achieving both performance and cost efficiency. Consider a large-scale logging pipeline that processes 5 million events per second, about 5 GB/s raw. Data flows through four stages: producer, message bus, storage, and analytics query engine. Each stage compresses differently.

Stage One: Producer to Message Bus: At this layer, the primary concerns are throughput and p99 latency. If each producer instance can spare only 20 to 30 percent of its CPU for compression, you choose fast codecs like Snappy or LZ4. These compress at hundreds of MB/s per core with ratios around 2x, keeping per-request latency under 1 millisecond at p99. This halves network traffic between producers and brokers, potentially reducing link requirements from 40 Gbit/s to 20 Gbit/s.
[Figure: Network impact at the producer layer: roughly 40 Gbit/s without compression versus 20 Gbit/s with Snappy.]
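To make the producer-side trade-off concrete, here is a minimal benchmarking sketch in Python. It assumes the third-party python-snappy and zstandard packages are installed; the synthetic log payload and batch size are illustrative, not taken from the pipeline above. It measures roughly what you would check before committing to a producer codec: compression speed and ratio on representative data.

```python
import json
import time
import zlib

import snappy               # assumed: pip install python-snappy
import zstandard as zstd    # assumed: pip install zstandard

# Synthetic batch of log events; fields and sizes are illustrative.
events = [
    json.dumps({
        "ts": 1_700_000_000 + i,
        "level": "INFO",
        "service": "checkout",
        "msg": "request handled",
        "latency_ms": i % 250,
        "user": f"user-{i % 10_000}",
    })
    for i in range(50_000)
]
batch = "\n".join(events).encode("utf-8")

def bench(name, compress):
    """Time one compression pass and report ratio and throughput."""
    start = time.perf_counter()
    compressed = compress(batch)
    elapsed = time.perf_counter() - start
    ratio = len(batch) / len(compressed)
    mb_per_s = len(batch) / elapsed / 1e6
    print(f"{name:<8} ratio={ratio:4.1f}x  speed={mb_per_s:8.0f} MB/s")

bench("snappy", snappy.compress)                          # producer layer: fast, ~2x
bench("zstd-3", zstd.ZstdCompressor(level=3).compress)    # balanced mid level
bench("zstd-12", zstd.ZstdCompressor(level=12).compress)  # storage layer: tighter, slower
bench("zlib-6", lambda d: zlib.compress(d, 6))            # classic static-asset setting
```

On log-like text, Snappy typically lands near the 2x ratio cited above at very high throughput, while higher Zstd levels trade speed for a tighter ratio.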
Stage Two: Long-Term Storage: For object storage such as S3, optimization shifts toward cost per terabyte and scan efficiency. A trillion events daily at 200 bytes each is roughly 200 TB raw. With zlib or Zstd at a 3x to 4x ratio, you store only 50 to 70 TB. When replicated three times and retained for a year, the cost delta compared to Snappy at 2x becomes massive.

Stage Three: Analytics Query Layer: Engines like Presto or Spark SQL read compressed blocks from columnar formats, then decompress in memory. The key metric here is effective scan throughput: physical IO bandwidth multiplied by the compression ratio, minus the CPU cost of decompression. If storage reads at 10 GB/s and Zstd achieves a 4x ratio, you effectively scan 40 GB/s of logical data, assuming decompression stays under 30 to 40 percent CPU.

Edge and API Layer: User-facing services compress JSON or HTML to reduce bandwidth and improve page load time. Static assets might use zlib level 6 (compressed once, decompressed millions of times), while API responses use Zstd at mid levels to keep server-side latency below 5 milliseconds at p99.
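The storage and scan figures above are back-of-envelope arithmetic, so a short sketch makes the sensitivity to compression ratio explicit. The event rate, replication factor, retention window, and per-terabyte price below are illustrative assumptions, not quoted costs.

```python
# Back-of-envelope model of the storage and scan numbers above.
# Every input is an illustrative assumption, not a measured value.

EVENTS_PER_DAY = 1_000_000_000_000    # "a trillion events daily"
BYTES_PER_EVENT = 200
REPLICATION = 3
RETENTION_DAYS = 365
PRICE_PER_TB_MONTH = 20.0             # hypothetical $/TB-month

raw_tb_per_day = EVENTS_PER_DAY * BYTES_PER_EVENT / 1e12  # ~200 TB/day

def steady_state_monthly_cost(compression_ratio):
    """Monthly bill once the full retention window is populated."""
    stored_tb = raw_tb_per_day / compression_ratio * REPLICATION * RETENTION_DAYS
    return stored_tb * PRICE_PER_TB_MONTH

for codec, ratio in [("snappy ~2x", 2.0), ("zstd ~4x", 4.0)]:
    print(f"{codec}: {raw_tb_per_day / ratio:.0f} TB/day written, "
          f"${steady_state_monthly_cost(ratio):,.0f}/month at steady state")

# Query layer: logical scan rate = physical IO bandwidth * compression ratio,
# valid only while decompression CPU is not the bottleneck.
PHYSICAL_IO_GB_S = 10
for ratio in (2.0, 4.0):
    print(f"{ratio:.0f}x ratio -> {PHYSICAL_IO_GB_S * ratio:.0f} GB/s logical scan")
```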
✓ In Practice: Facebook's Zstd deployment uses higher compression levels for cold storage and lower levels where latency matters, gaining 10 to 15 percent smaller payloads at similar speed or 3 to 5 times faster compression at similar size versus zlib.
The pattern is clear: compress fast and light where latency is critical, compress aggressively where data is written once and read many times, and match codec characteristics to workload demands at each pipeline stage.
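One way this pattern shows up in practice is a per-stage codec policy in the pipeline's configuration. The mapping below is a hypothetical sketch (stage names, codecs, and levels are assumptions, again relying on the python-snappy and zstandard packages) of how the same payload could be compressed differently at each tier.

```python
import zlib

import snappy               # assumed: pip install python-snappy
import zstandard as zstd    # assumed: pip install zstandard

# Hypothetical per-stage codec policy: fast and light where latency is
# critical, aggressive where data is written once and read many times.
STAGE_POLICY = {
    "producer": lambda data: snappy.compress(data),                        # ~2x, sub-ms
    "storage":  lambda data: zstd.ZstdCompressor(level=12).compress(data), # cold data, 3-4x
    "api":      lambda data: zstd.ZstdCompressor(level=3).compress(data),  # mid level, low latency
    "static":   lambda data: zlib.compress(data, 6),                       # compressed once
}

def compress_for_stage(stage: str, payload: bytes) -> bytes:
    """Compress a payload with the codec assigned to a pipeline stage."""
    return STAGE_POLICY[stage](payload)

# Example: the same payload gets a different footprint at each tier.
payload = b'{"level": "INFO", "msg": "request handled"}' * 1_000
for stage in STAGE_POLICY:
    print(f"{stage:<8} {len(compress_for_stage(stage, payload)):>8} bytes")
```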
💡 Key Takeaways
Production pipelines compress at multiple stages with different codecs: fast at producers (Snappy/LZ4), aggressive at storage (Zstd/zlib), optimized for queries
Producer layer uses Snappy at a 2x ratio to halve network traffic from 40 Gbit/s to 20 Gbit/s while keeping p99 latency under 1 ms
Storage layer uses Zstd at 3x to 4x ratio, turning 200 TB daily raw into 50 to 70 TB, saving massive costs when replicated and retained long term
Analytics queries derive effective scan throughput by multiplying IO bandwidth by the compression ratio: 10 GB/s storage with 4x compression yields 40 GB/s logical throughput
Edge services compress static assets with high ratios (zlib level 6) and API responses with fast codecs (Zstd mid level) to keep server latency below 5 ms p99
📌 Examples
1. A 5 GB/s event pipeline uses Snappy at producers (hundreds of MB/s, 2x ratio), then recompresses to Zstd for storage (4x ratio), cutting 200 TB daily to 50 TB
2. A query engine scans 10 GB/s of physical data compressed 4x with Zstd, delivering 40 GB/s effective throughput if decompression uses under 40% CPU
3. Facebook deploys Zstd with variable levels: higher for cold storage, lower for latency-sensitive paths, gaining 10 to 15% size reduction or 3 to 5x speed improvement over zlib