Hot-Warm-Cold Tiering: Balancing Query Speed and Storage Cost
Value Decay Over Time
The value of time series data decays with age. Metrics from the last hour power real-time dashboards and alerts that need millisecond latency. Data from the last month serves historical analysis that tolerates seconds of latency. Data older than a year is rarely queried but must be retained for compliance. Hot-warm-cold tiering exploits this pattern to optimize both performance and cost.
Tier Characteristics
Hot tier: Recent data (hours to days) in memory or on fast SSD in row-oriented formats. Sub-10ms queries, rapid updates. Memory makes this the most expensive tier per GB.
Warm tier: Weeks to months in columnar format on local SSD. Query latency 10-100ms. Balances speed and cost.
Cold tier: Older data in object storage (distributed storage accessed via HTTP, like S3-compatible systems) in compressed Parquet files (columnar format optimized for analytics). Queries take seconds but storage costs drop to $0.01-0.02/GB/month.
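The mapping from data age to tier can be sketched as a small function. This is a minimal illustration, not any particular database's behavior: the cutoffs `HOT_MAX_AGE` and `WARM_MAX_AGE` are hypothetical values picked from within the "hours to days" and "weeks to months" ranges above, and a real deployment would tune them to its own query patterns.

```python
from datetime import timedelta

# Hypothetical cutoffs: the tier descriptions only give rough ranges
# ("hours to days", "weeks to months"), so these values are assumptions.
HOT_MAX_AGE = timedelta(days=2)
WARM_MAX_AGE = timedelta(days=90)


def tier_for_age(age: timedelta) -> str:
    """Return the tier expected to hold data of the given age."""
    if age < timedelta(0):
        raise ValueError("age must be non-negative")
    if age <= HOT_MAX_AGE:
        return "hot"
    if age <= WARM_MAX_AGE:
        return "warm"
    return "cold"
```

With these assumed cutoffs, a one-hour-old metric maps to the hot tier, a 30-day-old one to warm, and anything older than 90 days to cold.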
Lifecycle Policies
Lifecycle policies automate the transitions between tiers. Compression policies convert hot, row-oriented data to columnar format once it reaches a configured age, achieving roughly 10x compression while queries transparently span both formats. Retention policies delete data that exceeds a maximum age or migrate it to a cheaper tier.
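One way to picture how such policies are evaluated is a function that looks at a chunk's age and current state and returns the next step. The sketch below is illustrative only: `LifecyclePolicy`, `next_action`, the field names, and the one-step-at-a-time ordering are all assumptions, not the API of any specific system.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class LifecyclePolicy:
    """Hypothetical policy: all thresholds are measured as data age."""
    compress_after: timedelta               # row-oriented -> columnar
    move_to_cold_after: timedelta           # local SSD -> object storage
    drop_after: Optional[timedelta] = None  # None keeps data indefinitely


def next_action(policy: LifecyclePolicy, age: timedelta,
                is_columnar: bool, in_cold: bool) -> str:
    """Return the next lifecycle step for a chunk, one step at a time."""
    # Retention wins: data past the maximum age is deleted outright.
    if policy.drop_after is not None and age >= policy.drop_after:
        return "drop"
    # Compression: convert row-oriented chunks once they are old enough.
    if not is_columnar and age >= policy.compress_after:
        return "compress"
    # Migration: only columnar chunks move to the cold tier.
    if is_columnar and not in_cold and age >= policy.move_to_cold_after:
        return "migrate_to_cold"
    return "keep"
```

A background job could call `next_action` for each chunk on a schedule. Because each call returns a single step, a chunk that is both uncompressed and past the cold threshold is compressed on one pass and migrated on the next.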
Query Federation
Queries span tiers transparently. A 30-day query hits hot memory for the last day, warm SSD for the rest of the past week, and cold object storage for the remainder. Query planners push filters and partial aggregations down to each tier to minimize data movement. Continuous aggregates (pre-computed rollups) optimize further: a query for hourly averages over 6 months reads compact aggregated data rather than billions of raw points.
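The first step of such a federated query, splitting the requested time range into per-tier sub-ranges, might look like the sketch below. The boundaries in `TIER_BOUNDARIES` mirror the 30-day example above (one day hot, one week warm) and are assumptions, as is the function name `split_by_tier`. A real planner would also push filters and partial aggregations down to each sub-query, which is omitted here.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

# Hypothetical boundaries as maximum data age, newest tier first.
# None means the tier has no upper age limit.
TIER_BOUNDARIES: List[Tuple[str, Optional[timedelta]]] = [
    ("hot", timedelta(days=1)),
    ("warm", timedelta(days=7)),
    ("cold", None),
]


def split_by_tier(start: datetime, end: datetime,
                  now: datetime) -> List[Tuple[str, datetime, datetime]]:
    """Split the range [start, end) into per-tier sub-ranges, newest first."""
    plan: List[Tuple[str, datetime, datetime]] = []
    if start >= end:
        return plan
    upper = now  # newest timestamp the current tier can hold
    for name, max_age in TIER_BOUNDARIES:
        lower = None if max_age is None else now - max_age
        seg_end = min(end, upper)
        seg_start = start if lower is None else max(start, lower)
        if seg_start < seg_end:  # skip tiers the query does not touch
            plan.append((name, seg_start, seg_end))
        if lower is None:
            break
        upper = lower  # the next tier holds data older than this one
    return plan


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    for tier, seg_start, seg_end in split_by_tier(now - timedelta(days=30), now, now):
        print(tier, seg_start.isoformat(), seg_end.isoformat())
```

Under these assumed boundaries, a 30-day query yields three sub-ranges (hot, warm, cold), while a query over only the last 12 hours touches the hot tier alone.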