TSDB Query Optimization: Rollups, Pushdowns, and Time Operators

Time Series Database (TSDB) query engines are specialized for time centric operations and must handle massive scan volumes efficiently. The key techniques are time based partition pruning, vectorized columnar execution, precomputed rollups, and specialized time operators that traditional SQL engines lack.

Partition pruning exploits time based data layout to eliminate entire segments before scanning. If data is partitioned by day and a query requests the last 2 hours, the engine skips hundreds of day partitions and scans only recent segments. Combined with tag indexes (inverted or bitmap structures), queries filter to relevant series before value scanning. Vectorized execution processes columnar data in batches (typically 1024 rows), exploiting CPU caches and Single Instruction Multiple Data (SIMD) instructions. This is why production systems achieve sub second response times for queries over millions of data points when properly indexed.

Rollups or continuous aggregates precompute coarser time resolutions to cap query costs on long ranges. Raw data arrives at 10 second granularity but background jobs continuously aggregate to 1 minute, 5 minute, and 1 hour resolutions with different retention policies (raw kept 7 days, 1 minute kept 30 days, 1 hour kept 1 year). A query for the last 90 days grouped by hour reads the 1 hour rollup (2160 points) instead of raw data (777,600 points at 10 second intervals), reducing scan volume by 360x and query latency from minutes to milliseconds.

Time operators are domain specific functions: bucketing (fixed interval aggregation), sliding windows (moving averages over N periods), ASOF (as of) joins to align series with different timestamps, rate and derivative calculations, and interpolation for missing points. A query computing requests per second requires rate(http_requests[1m]) which calculates the per second change over a 1 minute window, handling counter resets when services restart. These operators pushdown to storage to reduce data movement and enable vectorized computation close to the data.

💡 Key Takeaways

•Partition pruning eliminates entire time based segments before scanning: query for last 2 hours skips hundreds of day partitions reducing scan volume by orders of magnitude

•Vectorized columnar execution processes data in batches of 1024 rows exploiting CPU caches and SIMD instructions for sub second response over millions of points

•Rollups precompute coarser resolutions: raw 10 second data for 7 days, 1 minute for 30 days, 1 hour for 1 year enabling 360x scan reduction on long range queries

•Time operators include bucketing (fixed intervals), sliding windows (moving averages), ASOF joins (align mismatched timestamps), rate (per second change with counter reset handling), derivative, and interpolation

•Tag indexes use inverted or bitmap structures to filter to relevant series before value scanning reducing cardinality early in query plan

•Pushdown optimization: time filters and aggregations execute close to storage reducing data movement and enabling vectorized computation on compressed columnar formats

📌 Examples

90 day query optimization: raw 10 second data would scan 777,600 points taking minutes but 1 hour rollup scans only 2,160 points returning in milliseconds, 360x reduction

Rate calculation: rate(http_requests[1m]) computes per second change over 1 minute window handling counter resets when service restarts by detecting decreases and adjusting

ASOF join example: joining CPU metrics sampled every 10 seconds with memory metrics sampled every 15 seconds using timestamp tolerance to align nearest values without requiring exact matches

Netflix Atlas: millions of time series updates per second with dashboard queries returning in tens of milliseconds for recent data windows using in memory storage and fast aggregations on high cardinality tags

← Back to Time-Series Databases Overview