
Time Partitioning and Storage Tiers: How Systems Scale Billions of Points

The Partitioning Strategy: Time-series systems partition data into contiguous time chunks, typically 1-hour, 6-hour, or 1-day windows. Each partition becomes a natural unit for storage, compaction, and lifecycle management. New writes go to the current hot partition, which resides on fast storage with in-memory indices. Once a partition's time window closes and most writes have finished, it is compacted and moved to cheaper storage. This pattern minimizes random writes and exploits sequential disk access, which is 50 to 100 times faster than random access on spinning disks and still 10 times faster on SSDs.

Within each partition, data is sorted first by metric identity (a hash of the measurement name plus its sorted tags), then by timestamp. This allows efficient scans of a single time series or a group of series sharing a metric and tag subset: you can read an entire day for one metric with a single sequential scan rather than hopping around an index. A sketch of this key construction follows.
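Here is a minimal sketch of that sort-key construction in Go. The names seriesID and partitionStart and the 6-hour window are illustrative choices for this example, not any particular system's API:

```go
package main

import (
	"crypto/sha1"
	"encoding/binary"
	"fmt"
	"sort"
	"time"
)

// seriesID hashes the measurement name plus its tags in sorted key
// order, so the same logical series always maps to the same identity.
func seriesID(measurement string, tags map[string]string) uint64 {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha1.New()
	h.Write([]byte(measurement))
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte(tags[k]))
	}
	// Take the first 8 bytes of the digest as a 64-bit series identity.
	return binary.BigEndian.Uint64(h.Sum(nil))
}

// partitionStart truncates a timestamp to its containing time window,
// here an assumed 6-hour partition.
func partitionStart(ts time.Time, window time.Duration) time.Time {
	return ts.Truncate(window)
}

func main() {
	id := seriesID("cpu.usage", map[string]string{"host": "web-1", "region": "us-east"})
	p := partitionStart(time.Now().UTC(), 6*time.Hour)
	// Sort key within the partition: (seriesID, timestamp), so each
	// series is physically contiguous and readable in one scan.
	fmt.Printf("partition=%s series=%016x\n", p.Format(time.RFC3339), id)
}
```

Because the partition key comes first and the series hash second, all points for one series within one window sit adjacent on disk, which is exactly what makes the single-sequential-scan read path possible.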
Hot Tier (1-3 days): SSD, in-memory index, 10-200ms queries
  ↓ compaction
Warm Tier (30-90 days): compressed, 1-minute rollups, 200ms-2s queries
  ↓ downsampling
Cold Tier (1-3 years): object storage, daily aggregates, 1-5s queries
Multi-Tier Architecture: Uber's M3 exemplifies this approach with a coordinator layer that accepts writes and queries, a short-term store keeping hours to days of high-resolution data, and a long-term store backed by distributed object storage for compressed, downsampled data. Requests hit the short-term tier first for speed, falling back to long-term storage for older ranges. This architecture lets a dashboard query over the last hour complete in 50 to 100 milliseconds, while a capacity-planning query over 6 months of daily data takes 2 to 4 seconds, which is acceptable for that use case. The routing sketch below shows the idea.
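A simplified sketch of that read-path routing follows; the 48-hour and 90-day boundaries are illustrative assumptions, not M3's actual configuration:

```go
package main

import (
	"fmt"
	"time"
)

// chooseTier routes a query to the tier covering its time range:
// recent ranges hit the hot tier, older ranges fall back to warm
// and cold storage.
func chooseTier(queryAge time.Duration) string {
	switch {
	case queryAge <= 48*time.Hour:
		return "hot" // SSD + in-memory index, ~10-200ms
	case queryAge <= 90*24*time.Hour:
		return "warm" // compressed 1-minute rollups, ~200ms-2s
	default:
		return "cold" // object storage, daily aggregates, ~1-5s
	}
}

func main() {
	fmt.Println(chooseTier(1 * time.Hour))        // hot: dashboard query
	fmt.Println(chooseTier(45 * 24 * time.Hour))  // warm
	fmt.Println(chooseTier(180 * 24 * time.Hour)) // cold: capacity planning
}
```

A query whose range straddles a tier boundary must be split, sent to both tiers, and merged, which is the scatter-gather cost discussed in the trade-off section below.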
✓ In Practice: Netflix keeps 1 second resolution metrics for 7 days, 1 minute rollups for 90 days, and daily aggregates for multiple years. This tiering reduces storage from approximately 600 terabytes of raw data to under 50 terabytes across all tiers for their fleet metrics.
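A sketch of the rollup step behind retention schemes like this, assuming min/max/sum/count aggregates per bucket; downsample and Rollup are hypothetical names for this example, not Netflix's pipeline:

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// Rollup keeps the aggregates typically retained when raw points are
// downsampled; min/max/sum/count is enough to answer most dashboard
// queries (including averages) without the raw data.
type Rollup struct {
	Min, Max, Sum float64
	Count         int64
}

// downsample buckets raw (timestamp, value) points into fixed windows,
// e.g. 1 minute -- the rollup step that runs between storage tiers.
func downsample(ts []time.Time, vals []float64, window time.Duration) map[time.Time]*Rollup {
	out := make(map[time.Time]*Rollup)
	for i, t := range ts {
		bucket := t.Truncate(window)
		r, ok := out[bucket]
		if !ok {
			r = &Rollup{Min: math.Inf(1), Max: math.Inf(-1)}
			out[bucket] = r
		}
		r.Min = math.Min(r.Min, vals[i])
		r.Max = math.Max(r.Max, vals[i])
		r.Sum += vals[i]
		r.Count++
	}
	return out
}

func main() {
	base := time.Date(2024, time.January, 1, 0, 0, 0, 0, time.UTC)
	ts := []time.Time{base, base.Add(10 * time.Second), base.Add(70 * time.Second)}
	vals := []float64{50.0, 51.0, 50.5}
	for bucket, r := range downsample(ts, vals, time.Minute) {
		fmt.Printf("%s avg=%.2f min=%.1f max=%.1f\n",
			bucket.Format("15:04"), r.Sum/float64(r.Count), r.Min, r.Max)
	}
}
```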
Compression and Encoding: Numeric values benefit enormously from delta encoding, run-length encoding, or Gorilla-style compression, all of which exploit the small differences between consecutive values and timestamps. A raw point might be 16 bytes (8-byte timestamp, 8-byte double-precision value), but smooth series compress to 1 to 3 bytes per point on average, a 5 to 20 times size reduction. Keeping more data in memory directly determines how wide a time window you can scan with low latency: with 64 gigabytes of memory and 10 times compression, you can hold roughly 40 billion raw-equivalent points in memory for instant access.

The Trade-Off: Time partitioning and tiering optimize for recent-data queries and write throughput but sacrifice flexibility for arbitrary time-range joins. You cannot efficiently join a metric from 1 month ago with live data from today if they live in different storage systems, and cross-partition queries require scatter-gather patterns that add latency. The design assumes the most valuable queries focus on recent contiguous windows, which holds true for observability dashboards and alerting but less so for analytical workloads that need random access across long spans.
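The sketch below shows the two ideas behind Gorilla-style encoding: the delta-of-delta of regularly sampled timestamps is almost always zero, and XORing consecutive double values leaves a word dominated by zero bits. The real encoder packs these into variable-length bit strings; this sketch only surfaces the structure being exploited:

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// deltaOfDelta computes the second difference of timestamps. For a
// series sampled at a steady interval the result is almost always 0,
// which a Gorilla-style encoder can store in a single bit per point.
func deltaOfDelta(ts []int64) []int64 {
	out := make([]int64, 0, len(ts)-1)
	var prevDelta int64
	for i := 1; i < len(ts); i++ {
		delta := ts[i] - ts[i-1]
		out = append(out, delta-prevDelta)
		prevDelta = delta
	}
	return out
}

// xorBits shows the value half of the scheme: consecutive doubles that
// differ slightly XOR to a word with long runs of leading and trailing
// zeros, so only the short run of meaningful bits needs storing.
func xorBits(prev, cur float64) (leadingZeros, trailingZeros int) {
	x := math.Float64bits(prev) ^ math.Float64bits(cur)
	return bits.LeadingZeros64(x), bits.TrailingZeros64(x)
}

func main() {
	// Timestamps at a steady 10s interval: delta-of-delta collapses to 0.
	fmt.Println(deltaOfDelta([]int64{1000, 1010, 1020, 1030})) // [10 0 0]
	lz, tz := xorBits(50.0, 50.5)
	fmt.Printf("XOR of 50.0 and 50.5: %d leading, %d trailing zero bits\n", lz, tz)
}
```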
💡 Key Takeaways
Partitioning by time windows (1 hour to 1 day) enables append-only writes and sequential scans, which are 10 to 100 times faster than random access depending on the storage medium.
The hot tier on SSD with in-memory indices serves the most recent 1 to 3 days at 10 to 200 millisecond query latencies, handling dashboard and alerting use cases.
Warm and cold tiers use compression and downsampling (1-minute to daily rollups), trading query speed for storage efficiency and reducing roughly 600 terabytes to under 50 terabytes at Netflix scale.
Gorilla-style compression achieves 5 to 20 times size reduction by delta encoding timestamps and XOR encoding values, fitting roughly 40 billion raw-equivalent points in 64 gigabytes of memory at 10 times compression.
Multi-tier design assumes most queries target recent contiguous windows; cross-partition joins and arbitrary time-range queries incur scatter-gather latency penalties.
📌 Examples
Uber's M3 stores the last 48 hours on SSD with an in-memory index, 30 days in a compressed warm tier on local disks, and years of daily aggregates in Amazon S3.
A dashboard query for the last hour of CPU by service scans 3600 seconds of data from the hot tier in 80 milliseconds; a 6-month capacity-trend query reads daily aggregates from the cold tier in 3 seconds.
With Gorilla compression, a series with values oscillating between 50.0 and 51.0 every second compresses from 16 bytes per point to under 2 bytes per point.