Compression Techniques: How Time Series Databases Achieve 10 to 100x Storage Reduction
Why Time Series Compresses Well
Time series data compresses extraordinarily well because adjacent values are highly correlated. Timestamps increase monotonically. Temperatures change gradually. Status codes repeat frequently. Specialized encodings achieve 8-59x compression versus naive row storage and 10-100x versus raw JSON.
Timestamp Compression
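Delta-of-delta encoding exploits the fact that timestamps are monotonic and usually arrive at regular intervals. Instead of storing absolute values (1609459200, 1609459201, 1609459202), store the first value, then the first delta (1 second), then the delta-of-delta for each subsequent point (0 as long as the interval stays constant). For regular-interval data, each timestamp after the first two costs close to nothing, as little as a single bit in a bit-packed stream. Variable-length encoding spends more bits only when a delta deviates from the expected interval.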
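The scheme above can be sketched in a few lines of Python. This is a minimal illustration, not any particular database's implementation: the function names are hypothetical, and it keeps the delta-of-deltas in a plain list rather than bit-packing them with a variable-length code.

```python
def delta_of_delta_encode(timestamps):
    """Return (first_timestamp, first_delta, delta_of_deltas).

    Hypothetical helper for illustration; a real encoder would
    bit-pack the delta-of-deltas instead of keeping a list.
    """
    if not timestamps:
        return None, None, []
    if len(timestamps) == 1:
        return timestamps[0], None, []
    # First-order differences between neighbouring timestamps.
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    # Second-order differences: how much each delta changed.
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return timestamps[0], deltas[0], dods


def delta_of_delta_decode(first, first_delta, dods):
    """Rebuild the original timestamps from the encoded triple."""
    if first is None:
        return []
    out = [first]
    if first_delta is None:
        return out
    delta = first_delta
    out.append(first + delta)
    for dod in dods:
        delta += dod  # recover the next delta
        out.append(out[-1] + delta)
    return out


regular = [1609459200, 1609459201, 1609459202, 1609459203]
jittered = [1609459200, 1609459201, 1609459203, 1609459204]

regular_encoded = delta_of_delta_encode(regular)    # (1609459200, 1, [0, 0])
jittered_encoded = delta_of_delta_encode(jittered)  # (1609459200, 1, [1, -1])
```

With a perfectly regular interval every delta-of-delta is zero, which is what lets a variable-length code spend almost no bits per point. A late or early sample shows up as a small nonzero pair such as 1 and -1, which is still far cheaper than a full 64-bit timestamp.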
Float Compression
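XOR compression for floating-point values exploits the IEEE 754 representation (the standard binary format for floats). Consecutive readings like 72.3, 72.4, 72.3 share their sign, exponent, and leading mantissa bits, and repeated readings are bit-for-bit identical. XORing each value with its predecessor therefore produces a word with long runs of zeros (all zeros for a repeated value), and the encoder stores only the meaningful bits between the leading and trailing zeros. This achieves 10-20x reduction on typical sensor data.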
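A small Python sketch makes the XOR step concrete. It is an illustration under simplifying assumptions: the helper names are hypothetical, and it stops at computing the XOR words and their zero runs rather than emitting the packed bit stream a real encoder would write.

```python
import struct


def float_to_bits(x):
    """Reinterpret a Python float as its 64-bit IEEE 754 pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def bits_to_float(b):
    """Inverse of float_to_bits."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def xor_encode(values):
    """Return (first_value_bits, xors) where each XOR is taken
    against the previous value. Assumes a non-empty input."""
    bits = [float_to_bits(v) for v in values]
    xors = [cur ^ prev for prev, cur in zip(bits, bits[1:])]
    return bits[0], xors


def xor_decode(first_bits, xors):
    """Rebuild the floats by re-applying each XOR in order."""
    out = [first_bits]
    for x in xors:
        out.append(out[-1] ^ x)
    return [bits_to_float(b) for b in out]


def zero_runs(x):
    """Leading and trailing zero-bit counts of a nonzero 64-bit word."""
    leading = 64 - x.bit_length()
    trailing = (x & -x).bit_length() - 1
    return leading, trailing


readings = [72.3, 72.4, 72.3, 72.3]
first_bits, xors = xor_encode(readings)

for x in xors:
    if x == 0:
        print("identical to previous value: one control bit is enough")
    else:
        leading, trailing = zero_runs(x)
        print(f"leading zeros={leading}, trailing zeros={trailing}, "
              f"meaningful bits={64 - leading - trailing}")
```

The repeated 72.3 at the end XORs to zero, which is the cheapest case. The 72.3 to 72.4 step still shares the sign, exponent, and top mantissa bits, but decimal fractions rarely end in long runs of trailing zeros, so most of the saving on real sensor data tends to come from repeated or slowly changing values.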
String and Integer Compression
Dictionary encoding maps repeated strings (status codes, region names) to small integer IDs, stores each distinct string once per segment, and then references the IDs instead of the full strings. Run-length encoding compresses sequences like OK, OK, OK, OK, ERROR, OK, OK into (OK, 4), (ERROR, 1), (OK, 2). Frame-of-reference encoding for integers stores the minimum value once, then encodes each value as an offset from that minimum using the fewest bits needed.
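The three encodings above are simple enough to sketch together in Python. The function names and the sample integer column are assumptions made for illustration; real storage engines operate on packed binary pages rather than Python lists.

```python
from itertools import groupby


def dictionary_encode(values):
    """Return (dictionary, ids): each distinct value is stored once
    and every occurrence is replaced by its small integer ID."""
    lookup = {}
    ids = []
    for v in values:
        # Assign the next free ID the first time a value is seen.
        ids.append(lookup.setdefault(v, len(lookup)))
    # Insertion order means list position equals the assigned ID.
    return list(lookup), ids


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(v, sum(1 for _ in run)) for v, run in groupby(values)]


def frame_of_reference_encode(values):
    """Return (base, bit_width, offsets) for a non-empty integer list.
    bit_width is the number of bits needed for the largest offset."""
    base = min(values)
    offsets = [v - base for v in values]
    bit_width = max(offsets).bit_length()
    return base, bit_width, offsets


statuses = ["OK", "OK", "OK", "OK", "ERROR", "OK", "OK"]
dictionary, ids = dictionary_encode(statuses)
runs = run_length_encode(statuses)

counters = [1000, 1003, 1001, 1007]  # hypothetical integer column
base, bit_width, offsets = frame_of_reference_encode(counters)
```

For the status column, the dictionary holds two strings and the IDs fit in one bit each, while run-length encoding reduces seven values to three pairs. For the integer column, offsets from the base of 1000 need only 3 bits each instead of a full 32 or 64.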
Many modern TSDBs persist data in columnar formats such as Parquet, which apply encodings like these automatically. Combined with SIMD (Single Instruction Multiple Data, CPU operations that process many values simultaneously), compressed data on cheap object storage can query faster than uncompressed data on SSD because less I/O is required.