Tumbling vs Hopping Windows in Stream Processing
Windows in Stream Processing
When processing continuous data streams (log analysis, real-time analytics), you need to group events into windows for aggregation. The concepts are the same as in rate limiting, but the application differs: instead of counting requests to decide which to reject, you count events to compute metrics like "page views in the last 5 minutes" or "average order value per hour." The window type you choose affects result accuracy and resource usage.
Tumbling Windows (Non-Overlapping)
Each event belongs to exactly one window. With 5 minute tumbling windows (times in mm:ss), the windows are 00:00 to 05:00, 05:00 to 10:00, 10:00 to 15:00. Boundaries are typically half-open, so an event at exactly 05:00 lands in the second window only. No overlap means each event is processed once. Simple to implement and understand. Use when you want distinct time buckets (hourly reports, daily summaries). Downside: an event at 04:59 and one at 05:01 land in different windows even though they are only 2 seconds apart.
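As a minimal sketch of tumbling assignment, assuming timestamps are integer seconds and windows are half-open [start, start + size), each event can be mapped to its window by rounding the timestamp down to a multiple of the window size. The function names here are illustrative and not taken from any particular stream processing library.

```python
from collections import Counter


def tumbling_window_start(ts: int, size: int) -> int:
    """Return the start of the single tumbling window that contains ts."""
    return ts - (ts % size)


def count_per_tumbling_window(timestamps, size):
    """Count events per window, keyed by window start (seconds)."""
    counts = Counter(tumbling_window_start(ts, size) for ts in timestamps)
    return dict(sorted(counts.items()))


# Events at 04:59 (299 s) and 05:01 (301 s) fall into different 5 minute windows.
print(count_per_tumbling_window([10, 299, 301, 599, 600], 300))
# prints {0: 2, 300: 2, 600: 1}
```

Because every timestamp maps to exactly one key, the aggregation state is a single counter per window.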
Hopping Windows (Overlapping)
Windows overlap based on a hop interval. With 5 minute windows hopping every 1 minute, the windows are 00:00 to 05:00, 01:00 to 06:00, 02:00 to 07:00, and so on. Each event belongs to window_size / hop_interval windows (here, 5 windows). The output is smoother and can catch trends that straddle a tumbling boundary and would otherwise be split across two buckets. Cost: in a straightforward implementation, roughly 5x the computation (each event is processed 5 times) and 5x the memory (5 windows are active simultaneously).
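The window_size / hop_interval relationship can be checked with a small sketch. It assumes integer second timestamps, half-open windows [start, start + size), and window starts aligned to multiples of the hop; the function names are hypothetical. It is the naive approach described above, where every event is added to every window that contains it.

```python
from collections import defaultdict


def hopping_window_starts(ts: int, size: int, hop: int) -> list:
    """Return the starts of all windows [start, start + size) containing ts."""
    starts = []
    start = ts - (ts % hop)   # latest window start that is <= ts
    while start > ts - size:  # window still covers ts: start + size > ts
        starts.append(start)
        start -= hop
    return sorted(starts)


def count_per_hopping_window(timestamps, size, hop):
    """Naive aggregation: each event updates every window it belongs to."""
    counts = defaultdict(int)
    for ts in timestamps:
        for start in hopping_window_starts(ts, size, hop):
            counts[start] += 1
    return dict(sorted(counts.items()))


# An event at 05:30 (330 s) with 5 minute windows hopping every minute
# belongs to the windows starting at 01:00, 02:00, 03:00, 04:00, and 05:00.
print(hopping_window_starts(330, 300, 60))
# prints [60, 120, 180, 240, 300]
```

The inner loop is where the extra cost comes from: one event triggers size / hop state updates instead of one.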
Session Windows (Event-Driven)
Windows are defined by gaps in activity rather than by fixed time. A session window groups events until a gap of N minutes occurs. Example: a user browses a site with page views at 10:00, 10:02, 10:05, and then 10:25 (times in hh:mm). With a 10 minute session gap, the first three views form one session, and the view at 10:25 starts a new one because it arrives 20 minutes after the previous view. This suits user behavior analysis, where fixed time boundaries do not align with actual activity patterns. Session windows are more complex to implement because window boundaries depend on the data and are not known in advance.
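The page view example can be reproduced with a short batch sketch. It assumes all timestamps for one user are available up front and expressed as integer minutes, and it treats a gap of exactly N minutes as the start of a new session. A real streaming implementation would also need to handle out-of-order events and decide when a session can be closed, which this sketch does not attempt. The function name sessionize is illustrative.

```python
def sessionize(timestamps, gap):
    """Group timestamps into sessions separated by gaps of at least `gap`."""
    sessions = []
    for ts in sorted(timestamps):
        # Extend the current session if this event follows the last one closely enough.
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


# Page views at 10:00, 10:02, 10:05, and 10:25, as minutes since midnight.
views = [600, 602, 605, 625]
print(sessionize(views, gap=10))
# prints [[600, 602, 605], [625]]
```

Unlike the fixed window cases, the number and length of sessions cannot be computed from the clock alone; both depend on the events themselves.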
Choosing Window Type
Tumbling: fixed reports, billing periods, simple aggregations. Hopping: real-time dashboards, anomaly detection, smoother trend lines. Session: user analytics, workflow tracking, activity-based grouping. The choice depends on whether your analysis needs discrete buckets (tumbling), continuous monitoring (hopping), or behavior-based grouping (session).