Time Bucketing: Efficient Sliding Window Implementation
The naive approach to sliding windows stores every individual event within the window period, which becomes prohibitively expensive at scale. A 5-minute window receiving 1000 events per second would store 300,000 events per key, consuming gigabytes of memory across high-cardinality dimensions. Time bucketing solves this by dividing the window into fixed-size buckets and storing only pre-aggregated values per bucket.
Here's how it works. Instead of storing every event in a 5-minute window, divide it into 60 buckets of 5 seconds each. When an event arrives, determine which bucket it belongs to based on its timestamp, then update that bucket's aggregate (count, sum, or sketch). To compute the window aggregate, sum across all active buckets. When a bucket ages out of the window, drop it and create a new one for incoming events. This reduces memory from O(events in window) to O(number of buckets), typically a 100x to 1000x reduction.
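To make this concrete, here is a minimal sketch of a bucketed sliding-window counter in Python. The class and method names are illustrative rather than taken from any particular library, and it tracks only a count per bucket; a sum or a sketch would slot in the same way.

```python
import time
from collections import defaultdict

class BucketedSlidingCounter:
    """Sliding-window count that stores one aggregate per fixed-size time
    bucket instead of every raw event. A 5-minute window with 5-second
    buckets holds at most 60 integers per key."""

    def __init__(self, window_seconds=300, bucket_seconds=5):
        self.window = window_seconds
        self.bucket = bucket_seconds
        self.counts = defaultdict(int)  # bucket start timestamp -> count

    def _bucket_start(self, ts):
        # Align the event timestamp down to the start of its bucket.
        return int(ts // self.bucket) * self.bucket

    def add(self, ts=None, value=1):
        ts = time.time() if ts is None else ts
        self.counts[self._bucket_start(ts)] += value

    def total(self, now=None):
        now = time.time() if now is None else now
        # Drop buckets that have aged entirely out of the window...
        expired = [s for s in self.counts if s + self.bucket <= now - self.window]
        for s in expired:
            del self.counts[s]
        # ...then sum what remains. A bucket straddling the window boundary is
        # kept whole, which is the bucket-granularity approximation.
        return sum(self.counts.values())
```

Calling add() on each event and total() on each query keeps per-key state at O(number of buckets) regardless of event rate.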
The tradeoff is granularity. With 5-second buckets, you can only slide the window in 5-second increments, not per event. For many use cases like rate limiting or trend detection, this approximation is acceptable. The bucket size becomes a tuning parameter: smaller buckets give finer granularity but use more memory, while larger buckets are more efficient but less precise. A common pattern is 1-second buckets for windows under 5 minutes, and 5- to 10-second buckets for windows up to 1 hour.
Production implementations add further optimizations. Use a circular buffer to avoid shifting bucket arrays as the window advances. Store each bucket's start timestamp to handle irregular arrivals and clock skew. For algebraic aggregates like sum and count, each bucket needs just a few bytes. For percentiles or distinct counts, use mergeable sketches like t-digest or HyperLogLog, which keep per-bucket state to a few hundred bytes or so even for complex statistics.
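As a sketch of the circular-buffer variant (again with illustrative names, and plain counts standing in for sketches), storing each slot's bucket-start timestamp lets stale slots be detected and reset in place, with no array shifting:

```python
class CircularBucketWindow:
    """Fixed array of buckets indexed by (bucket_start / bucket_seconds) mod
    num_buckets. Each slot remembers which bucket start it currently holds,
    so a slot left over from an earlier pass of the ring is reset on reuse."""

    def __init__(self, window_seconds=300, bucket_seconds=5):
        self.bucket = bucket_seconds
        self.num_buckets = window_seconds // bucket_seconds
        self.counts = [0] * self.num_buckets
        self.starts = [-1] * self.num_buckets  # bucket start stored per slot

    def _slot(self, ts):
        start = int(ts // self.bucket) * self.bucket
        return start, (start // self.bucket) % self.num_buckets

    def add(self, ts, value=1):
        start, i = self._slot(ts)
        if self.starts[i] != start:
            # The slot still holds an older bucket; reset it before reuse.
            self.starts[i] = start
            self.counts[i] = 0
        self.counts[i] += value

    def total(self, now):
        window_start = now - self.num_buckets * self.bucket
        # Sum only slots whose recorded bucket start is still inside the window.
        return sum(c for c, s in zip(self.counts, self.starts) if s >= window_start)
```

The ring layout keeps add() at O(1) with no allocation; the staleness check in add() is also what absorbs irregular or late arrivals, as long as they are not older than one full window.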
💡 Key Takeaways
•Time bucketing reduces memory from O(window size in events) to O(number of buckets), typically 60 to 120 buckets for windows from 5 minutes to 1 hour, achieving 100x to 1000x memory savings
•Each bucket stores pre-aggregated values like count (8 bytes), sum (8 bytes), and metadata (8 bytes), totaling roughly 24 bytes per bucket, so a 60-bucket window uses about 1.4 KB per key
•Bucket granularity creates a tradeoff: 1-second buckets for sub-minute precision versus 5- to 10-second buckets for hour-long windows with acceptable approximation error
•Algebraic aggregates (sum, count, min, max) merge trivially across buckets, while percentiles require mergeable sketches like t-digest (roughly 200 bytes) and distinct counts use HyperLogLog (roughly 1 KB at a few percent error)
•For 10 million active keys with 60 buckets each, total state is approximately 14 GB, fitting in memory on a single large machine or sharded across 10 to 20 nodes at roughly 1 GB per node (see the sizing sketch below)
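A quick back-of-the-envelope check of those numbers, assuming the 24-byte bucket layout stated above:

```python
BYTES_PER_BUCKET = 24          # count (8) + sum (8) + metadata (8)
BUCKETS_PER_KEY = 60           # e.g. a 5-minute window with 5-second buckets
ACTIVE_KEYS = 10_000_000

per_key_bytes = BUCKETS_PER_KEY * BYTES_PER_BUCKET   # 1,440 B ~= 1.4 KB per key
total_gb = per_key_bytes * ACTIVE_KEYS / 1e9         # ~14.4 GB of window state
per_node_gb = total_gb / 15                          # 10-20 nodes -> ~0.7-1.4 GB each

print(f"{per_key_bytes} B/key, {total_gb:.1f} GB total, ~{per_node_gb:.1f} GB per node")
```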
📌 Examples
Uber driver availability: 5-minute sliding window with 5-second buckets (60 buckets). Each bucket stores (online_count, total_requests). Memory per city: 1000 active drivers × 1.4 KB = 1.4 MB.
Airbnb search popularity: 1-hour hopping window with 1-minute hops for listing views. 60 buckets per listing, each storing (view_count, unique_users_sketch). At 2M listings, state is 60 buckets × 1 KB × 2M = 120 GB, sharded across 40 nodes.
Google Ads click-through rate: 15-minute sliding window with 15-second buckets (60 buckets per ad). Buckets store (clicks, impressions). Per-ad overhead is 60 × 16 bytes = 960 bytes. At 10M ads, total is 9.6 GB.