Understanding Rollup Patterns
What Rollup Means:
Rollup is the pattern of organizing pre-aggregated data along hierarchies and time granularities. Instead of maintaining metrics at just one level, you store summaries at multiple levels simultaneously. For example, you keep metrics computed at the hour, day, month, and year levels, or at the city, state, and country levels. When a query arrives, the system automatically selects the coarsest granularity that satisfies the request.
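As a minimal sketch of the hierarchy idea, the snippet below rolls city-level totals up to state and country levels. The data, the key layout, and the `roll_up` helper are all hypothetical, and the approach assumes an additive metric such as a count or a sum; averages and distinct counts cannot be combined this way.

```python
from collections import defaultdict

# Hypothetical city-level view counts keyed by (country, state, city).
city_views = {
    ("US", "CA", "San Francisco"): 120,
    ("US", "CA", "Los Angeles"): 300,
    ("US", "NY", "New York"): 450,
    ("CA", "ON", "Toronto"): 200,
}


def roll_up(metrics, depth):
    """Sum an additive metric over the first `depth` components of each key."""
    totals = defaultdict(int)
    for key, value in metrics.items():
        totals[key[:depth]] += value
    return dict(totals)


# Each coarser level is built from the level just below it.
state_views = roll_up(city_views, 2)     # keyed by (country, state)
country_views = roll_up(state_views, 1)  # keyed by (country,)
```

A country-level query can then read `country_views` directly instead of summing every city row.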
Why This Matters:
Imagine Netflix needs to analyze viewing metrics. Operational alerts need minute-level data to detect service issues within minutes. Daily analysis by the growth team needs hour-level aggregates. Quarterly business reviews need day-level summaries. Without rollups, every query would recompute from the finest granularity available, which is wasteful.
Instead, in this scenario Netflix maintains separate aggregation layers: 1-minute aggregates kept for 2 days in a hot store, 1-hour aggregates kept for 90 days, and 1-day aggregates kept for several years. When someone queries "show me viewing hours for the last quarter," the system routes to the day-level rollup, scanning perhaps 90 rows instead of the roughly 130,000 rows (90 days × 1,440 minutes per day) that the same range would occupy at minute granularity. This can cut query time from several seconds to under 100 milliseconds.
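The row counts above can be checked with a few lines of arithmetic. This sketch assumes one row per time bucket for a single metric series, with no dimension breakdown; the helper name is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440
HOURS_PER_DAY = 24


def rows_scanned(days, buckets_per_day):
    """Rows read for a range of `days`, assuming one row per time bucket."""
    return days * buckets_per_day


minute_rows = rows_scanned(90, MINUTES_PER_DAY)  # 129,600, roughly 130,000
hour_rows = rows_scanned(90, HOURS_PER_DAY)      # 2,160
day_rows = rows_scanned(90, 1)                   # 90

reduction = minute_rows / day_rows               # 1,440x fewer rows
```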
The Routing Logic:
Query routers or semantic layers automatically pick the appropriate rollup level. If you ask for data spanning 3 months with daily resolution, the system uses the day-level rollup. If you ask for the last 6 hours with minute resolution, it uses the minute-level rollup. This minimizes data scanned while still providing the requested granularity.
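One way such a router might work is sketched below. The `RollupLevel` class and `pick_level` function are hypothetical names, not any particular product's API. The bucket sizes and retention windows follow the tiers described in this section, and "several years" is assumed to be 3 years for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RollupLevel:
    name: str
    bucket: timedelta      # width of one aggregate row
    retention: timedelta   # how far back this level is kept


LEVELS = [
    RollupLevel("minute", timedelta(minutes=1), timedelta(days=2)),
    RollupLevel("hour", timedelta(hours=1), timedelta(days=90)),
    # "Several years" is assumed to mean 3 years here.
    RollupLevel("day", timedelta(days=1), timedelta(days=365 * 3)),
]


def pick_level(lookback, resolution):
    """Return the coarsest level that is fine enough and retained long enough."""
    candidates = [
        lvl for lvl in LEVELS
        if lvl.bucket <= resolution and lvl.retention >= lookback
    ]
    if not candidates:
        raise ValueError("no rollup level covers this range at this resolution")
    return max(candidates, key=lambda lvl: lvl.bucket)


quarterly = pick_level(timedelta(days=90), timedelta(days=1))    # day level
recent = pick_level(timedelta(hours=6), timedelta(minutes=1))    # minute level
```

A request such as 30 days at minute resolution raises an error in this sketch, because the minute level is only retained for 2 days. A real router might instead fall back to a coarser level and report the resolution change.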
1 Minute (Hot: 2 days) → 1 Hour (Warm: 90 days) → 1 Day (Cold: Years)
✓ In Practice: At scale, choosing the right rollup level can reduce data scanned by more than 1000x. Reading 90 day-level rows instead of roughly 130,000 minute-level rows can turn a 5 second query into a 100 millisecond query.
Storage Implications:
Maintaining multiple rollup levels increases storage overhead. Minute, hour, and day aggregates together might consume 20 to 40 percent more space than just keeping the finest granularity. But this is usually worthwhile because it eliminates the need to repeatedly compute coarser summaries from finer data at query time.
💡 Key Takeaways
✓ Rollup maintains pre-aggregated data at multiple granularities simultaneously, such as minute, hour, day, month levels
✓ Query routers automatically select the coarsest rollup level that satisfies the requested time range and resolution
✓ Reduces data scanned by 100x to 1000x for queries spanning long time ranges, cutting latency from seconds to milliseconds
✓ Storage overhead is typically 20 to 40 percent for multiple rollup levels, but saves massive compute cost at query time
📌 Examples
1. Netflix maintains 1 minute aggregates for 2 days (operational monitoring), 1 hour aggregates for 90 days (analysis), and 1 day aggregates for years (reporting). A quarterly report query scans 90 rows instead of 130,000.
2. A finance system rolls up transaction metrics from hourly to daily to monthly. End of month reports query the monthly rollup directly, avoiding aggregation of 720 hourly buckets.
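The finance example can be sketched as a cascade in which each level is built from the one below it. The data is synthetic (a constant amount per hour across a 30-day month), and the `roll_up_time` helper is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Synthetic data: one transaction total per hour for a 30-day month (720 buckets).
start = datetime(2024, 4, 1)
hourly = {start + timedelta(hours=h): 10 for h in range(30 * 24)}


def roll_up_time(buckets, truncate):
    """Sum amounts after truncating each timestamp to a coarser bucket."""
    totals = defaultdict(int)
    for ts, amount in buckets.items():
        totals[truncate(ts)] += amount
    return dict(totals)


daily = roll_up_time(hourly, lambda ts: ts.replace(hour=0))   # 30 buckets
monthly = roll_up_time(daily, lambda ts: ts.replace(day=1))   # 1 bucket

# The end-of-month report reads one precomputed row.
april_total = monthly[datetime(2024, 4, 1)]
```

Building the monthly level from the daily level, rather than from hourly data again, means each step only touches the smaller table produced by the previous one.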