
Compression Techniques: How Time Series Databases Achieve 10 to 100x Storage Reduction

Time series data compresses extraordinarily well because adjacent values are highly correlated: timestamps increase monotonically, temperatures change gradually, and status codes repeat frequently. Time series databases (TSDBs) exploit this with specialized encodings, achieving 8 to 59x compression versus naive row storage and 10 to 100x versus raw text or JSON, which directly reduces storage cost and improves query performance.

Timestamps use delta-of-delta encoding. Instead of storing absolute Unix timestamps like 1609459200, 1609459201, 1609459202, you store the first value, then the delta (1 second), then the deltas of deltas (0, 0, 0). For regular-interval data this collapses to nearly zero bits after the first few bytes. Gorilla compression from Facebook extends this with variable-length encoding of the deltas of deltas, allocating fewer bits when values are predictable.

Floating-point values use Gorilla XOR compression. Consecutive temperature readings like 72.3, 72.4, 72.3 share many bits in their IEEE 754 representation, so XORing consecutive values produces words that are mostly zeros, which compress efficiently with leading- and trailing-zero suppression. Benchmarks show 10 to 20x reduction on typical sensor data.

Integers use frame-of-reference encoding: store the minimum value of a block once, then encode each value as an offset in the minimum number of bits required. If all values in a 1,000-point block fall between 1000 and 1100, you store 1000 once and spend 7 bits per offset.

String and categorical data use dictionary and run-length encoding. Status codes, region names, and device types have low cardinality, so a dictionary maps each distinct string to a small integer ID once per segment, and the column stores IDs instead of repeated strings. Run-length encoding then compresses sequences like OK OK OK OK ERROR OK OK into (OK, 4), (ERROR, 1), (OK, 2).

Modern systems apply these encodings automatically in columnar storage: InfluxDB 3.x persists data as Parquet files, and TimescaleDB converts chunks to its own compressed columnar layout inside PostgreSQL. Columnar layouts also enable SIMD-vectorized queries that process thousands of values per CPU instruction. The result is that compressed columnar data on cheap object storage often queries faster than uncompressed row storage on SSD, because far less I/O is required to scan it.
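A minimal Python sketch of delta-of-delta encoding, illustrating the idea rather than any engine's actual implementation; a real encoder like Gorilla would pack the deltas of deltas into variable-width bit fields instead of a Python list:

```python
def delta_of_delta_encode(timestamps):
    """Encode sorted timestamps as (first value, first delta, deltas of deltas)."""
    first = timestamps[0]
    first_delta = timestamps[1] - first
    dods, prev_delta = [], first_delta
    for prev, curr in zip(timestamps[1:], timestamps[2:]):
        delta = curr - prev
        dods.append(delta - prev_delta)  # 0 for perfectly regular intervals
        prev_delta = delta
    return first, first_delta, dods

# Regular 1-second samples collapse to all-zero deltas of deltas,
# which a bit-level encoder can store in a single bit each.
print(delta_of_delta_encode([1609459200, 1609459201, 1609459202, 1609459203]))
# (1609459200, 1, [0, 0])
```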
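And a sketch of the XOR step on raw IEEE 754 bits; the follow-up step from the Gorilla paper, counting leading and trailing zeros and storing only the meaningful middle bits, is omitted here:

```python
import struct

def float_bits(x: float) -> int:
    """Reinterpret a double as its 64-bit IEEE 754 pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def xor_stream(values):
    """XOR each value's bits with its predecessor's; similar readings
    produce words dominated by zero bits."""
    prev = float_bits(values[0])
    yield prev
    for v in values[1:]:
        bits = float_bits(v)
        yield bits ^ prev
        prev = bits

for word in xor_stream([72.3, 72.4, 72.3]):
    print(f"{word:064b}")  # after the first value, long runs of leading zeros
```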
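Frame-of-reference encoding fits in a few lines, shown here with the 1000 to 1100 example from the text; a real column store would bit-pack the offsets rather than keep them as Python ints:

```python
def frame_of_reference_encode(block):
    """Store the block minimum once, then each value as an offset
    encoded in the fewest bits that fit the largest offset."""
    base = min(block)
    offsets = [v - base for v in block]
    width = max(offsets).bit_length() or 1  # bits needed per offset
    return base, width, offsets

base, width, offsets = frame_of_reference_encode([1000, 1042, 1100, 1017])
print(base, width, offsets)  # 1000 7 [0, 42, 100, 17]
```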
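Dictionary and run-length encoding compose naturally; this sketch reproduces the OK/ERROR example above, and segment-level dictionaries in real engines work the same way at much larger scale:

```python
def dictionary_encode(column):
    """Map each distinct string to a small integer ID (stored once per
    segment) and rewrite the column as IDs."""
    mapping, ids = {}, []
    for value in column:
        ids.append(mapping.setdefault(value, len(mapping)))
    return mapping, ids

def run_length_encode(ids):
    """Collapse runs of repeated IDs into (id, run length) pairs."""
    runs = []
    for v in ids:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(run) for run in runs]

statuses = ["OK", "OK", "OK", "OK", "ERROR", "OK", "OK"]
mapping, ids = dictionary_encode(statuses)
print(mapping)                 # {'OK': 0, 'ERROR': 1}
print(run_length_encode(ids))  # [(0, 4), (1, 1), (0, 2)]
```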
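To see these encodings applied automatically, here is a rough experiment with pyarrow (assuming pyarrow is installed; the file name, column names, and data are illustrative, not taken from the benchmarks cited below):

```python
import os
import pyarrow as pa
import pyarrow.parquet as pq

n = 1_000_000
table = pa.table({
    "time": pa.array(range(1_609_459_200, 1_609_459_200 + n)),  # regular timestamps
    "status": pa.array(["OK"] * (n - 1) + ["ERROR"]),           # low cardinality
})

# Parquet applies dictionary, run-length, and bit-packing encodings per column.
pq.write_table(table, "metrics.parquet", compression="zstd")
print(os.path.getsize("metrics.parquet"), "bytes on disk")

# Reading back only the needed column scans a fraction of the file.
subset = pq.read_table("metrics.parquet", columns=["time"])
```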
💡 Key Takeaways
Delta-of-delta encoding compresses regular-interval timestamps to nearly zero bits after the initial value by storing the difference between consecutive deltas, with Gorilla's variable-length encoding allocating fewer bits for predictable patterns
Gorilla XOR compression for floats exploits the IEEE 754 bit representation: consecutive values such as temperature readings share many bits, so XORing them produces mostly zeros that compress well under leading- and trailing-zero suppression, achieving 10 to 20x reduction
Dictionary encoding maps low-cardinality strings like status codes or region names to small integer IDs stored once per segment, then references the IDs instead of repeating full strings
Real-world compression achieves 8 to 59x smaller storage versus naive row formats depending on cardinality: 100 devices with 1 metric compress 59x, while 4,000 devices with 10 metrics compress 8x
Columnar storage applies these encodings automatically (InfluxDB 3.x persists Parquet files; TimescaleDB uses its own compressed columnar chunk layout) and enables SIMD-vectorized queries that process thousands of values per CPU instruction, making compressed data on cheap object storage query faster than uncompressed data on SSD due to lower I/O
📌 Examples
Facebook's Gorilla in-memory TSDB compresses 2 billion data points per machine using delta-of-delta timestamps and XOR float encoding, achieving 1.37 bytes per point on average versus 16 bytes uncompressed, a roughly 12x reduction.
InfluxDB 3.x persists data as Parquet files on S3. A test with 1 million metric points totaling 20 MB raw compressed to 500 KB (40x) using combined delta, dictionary, and run-length encodings, with queries reading only the relevant columns.
TimescaleDB compression policies automatically convert hot, row-oriented chunks to a compressed columnar format after 7 days. A monitoring system saw storage drop from 10 TB to 1 TB while query performance improved because range scans required less I/O.