Definition
Windowing is a technique that creates temporal boundaries on an infinite stream of events so you can compute meaningful aggregations like "transactions per user in the last 5 minutes" or "average latency per service in the past hour."
The Core Problem: In batch processing, boundaries are natural. You process yesterday's file or this hour's data dump. But in streaming, events keep arriving forever. How do you compute something like "number of logins per minute" when there is no start or end?
Without windowing, you face two bad choices. Either you keep accumulating state forever (memory grows without bound, and a running total says nothing about recent activity), or you cut aggregates at arbitrary points (results cannot be compared or interpreted). Neither works.
How Windowing Solves This: Windowing divides the infinite stream into finite chunks based on time or event count. The most common types are:
First, tumbling windows are fixed, non-overlapping intervals. A 5-minute tumbling window creates buckets such as 12:00 to 12:05 and 12:05 to 12:10, which never overlap. Each event belongs to exactly one window, which keeps this type simple and memory-efficient.
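A minimal sketch of tumbling-window assignment, assuming event timestamps are integer seconds since midnight and windows are half-open intervals [start, start + size). The helper names (`tumbling_window_start`, `count_per_tumbling_window`, `hhmm`) are hypothetical and used only for illustration.

```python
from collections import Counter

WINDOW_SIZE = 5 * 60  # 5-minute tumbling windows, timestamps in seconds


def tumbling_window_start(ts: int, size: int = WINDOW_SIZE) -> int:
    """Start of the single [start, start + size) window that contains ts."""
    return ts - ts % size


def count_per_tumbling_window(timestamps, size: int = WINDOW_SIZE) -> Counter:
    """Count events per window; every event lands in exactly one bucket."""
    return Counter(tumbling_window_start(ts, size) for ts in timestamps)


def hhmm(seconds: int) -> str:
    """Format seconds since midnight as HH:MM."""
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}"


if __name__ == "__main__":
    noon = 12 * 3600
    # Events at 12:00:30, 12:04:00, 12:05:00 and 12:09:59.
    events = [noon + 30, noon + 240, noon + 300, noon + 599]
    for start, n in sorted(count_per_tumbling_window(events).items()):
        print(f"{hhmm(start)} to {hhmm(start + WINDOW_SIZE)}: {n} events")
```

Running it prints `12:00 to 12:05: 2 events` and `12:05 to 12:10: 2 events`. The event at exactly 12:05:00 falls in the second window because the intervals are half-open; the only state kept is one counter per window.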
Second, sliding windows overlap. A 5-minute window that slides every 1 minute produces windows covering 12:00 to 12:05, 12:01 to 12:06, 12:02 to 12:07, and so on. Each event belongs to multiple windows (five in this case, since 5 minutes / 1 minute = 5). This captures trends more smoothly but uses more compute and storage.
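The sketch below shows how one event maps to several overlapping windows. It assumes integer-second timestamps, half-open windows, and window starts aligned to multiples of the slide; `sliding_windows_for` and `count_per_sliding_window` are hypothetical names, not a library API.

```python
from collections import Counter

WINDOW_SIZE = 5 * 60  # each window spans 5 minutes (seconds)
SLIDE = 60            # a new window starts every minute


def sliding_windows_for(ts: int, size: int = WINDOW_SIZE, slide: int = SLIDE):
    """All [start, end) windows containing ts, earliest first.

    Assumes window starts are aligned to multiples of `slide`.
    """
    start = ts - ts % slide  # latest window start at or before ts
    windows = []
    while start > ts - size:  # this window still covers ts
        windows.append((start, start + size))
        start -= slide
    return windows[::-1]


def count_per_sliding_window(timestamps, size: int = WINDOW_SIZE,
                             slide: int = SLIDE) -> Counter:
    """Each event is counted once in every window that contains it."""
    counts = Counter()
    for ts in timestamps:
        for window in sliding_windows_for(ts, size, slide):
            counts[window] += 1
    return counts


if __name__ == "__main__":
    noon = 12 * 3600
    # An event at 12:02:30 falls into size / slide = 5 overlapping windows.
    print(len(sliding_windows_for(noon + 150)))  # 5
```

Because every event updates `size / slide` counters instead of one, the extra compute and storage mentioned above grows with the ratio of window size to slide.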
Third, session windows are dynamic and keyed per user or entity. A session starts with the first event and closes after a period of inactivity, say 10 minutes; every event inside that gap extends the session. If a user generates events at 12:00, 12:05, and 12:08, the session stays open until 12:18 (10 minutes after the last event). This makes session windows a natural fit for user behavior analysis.
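A minimal batch sketch of per-key session windows, assuming all `(key, timestamp)` events are already available as a list (a real stream processor merges sessions incrementally). The function name and the choice that an event arriving exactly at the close time starts a new session are assumptions; engines may treat that boundary differently.

```python
from collections import defaultdict

GAP = 10 * 60  # inactivity gap in seconds


def session_windows(events, gap: int = GAP):
    """Group (key, ts) events into per-key sessions.

    Returns {key: [(start, close), ...]}, where close is the last event's
    timestamp plus the gap. An event at or after `close` starts a new session.
    """
    by_key = defaultdict(list)
    for key, ts in events:
        by_key[key].append(ts)

    sessions = {}
    for key, stamps in by_key.items():
        stamps.sort()  # batch sketch: sort instead of handling late events
        result = []
        start = last = stamps[0]
        for ts in stamps[1:]:
            if ts >= last + gap:  # gap exceeded: close the current session
                result.append((start, last + gap))
                start = ts
            last = ts  # an event inside the gap extends the session
        result.append((start, last + gap))
        sessions[key] = result
    return sessions


if __name__ == "__main__":
    noon, minute = 12 * 3600, 60
    events = [("alice", noon), ("alice", noon + 5 * minute),
              ("alice", noon + 8 * minute), ("alice", noon + 30 * minute),
              ("bob", noon + 1 * minute)]
    print(session_windows(events))
```

For `alice`, the events at 12:00, 12:05, and 12:08 form one session closing at 12:18 (44280 seconds), and the event at 12:30 opens a second session; `bob` gets his own session because windows are keyed.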
Real-World Context: At a payment processor handling 100,000 transactions per second, you might use a 5-minute sliding window to compute "transactions per card" for fraud detection. At Netflix, engineers use windowing to calculate error rates per region over 1- to 5-minute windows, triggering alerts when quality degrades.
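The fraud-detection idea can be sketched as keyed sliding-window counts with a threshold. This assumes transactions arrive as `(card_id, timestamp_in_seconds)` pairs and uses a hypothetical limit of 3 transactions per 5-minute window, chosen only for illustration; a production system would process events incrementally and tune the threshold.

```python
from collections import Counter

WINDOW_SIZE = 5 * 60     # 5-minute windows, timestamps in seconds
SLIDE = 60               # sliding every minute
MAX_TXNS_PER_WINDOW = 3  # hypothetical threshold, for illustration only


def window_starts(ts: int, size: int = WINDOW_SIZE, slide: int = SLIDE):
    """Yield the start of every window [start, start + size) containing ts."""
    start = ts - ts % slide
    while start > ts - size:
        yield start
        start -= slide


def flag_suspicious_cards(transactions, threshold: int = MAX_TXNS_PER_WINDOW):
    """transactions: iterable of (card_id, ts).

    Returns {card_id: [window_start, ...]} for windows whose transaction
    count exceeds the threshold.
    """
    counts = Counter()
    for card, ts in transactions:
        for start in window_starts(ts):
            counts[(card, start)] += 1  # keyed by (card, window)
    flagged = {}
    for (card, start), n in sorted(counts.items()):
        if n > threshold:
            flagged.setdefault(card, []).append(start)
    return flagged


if __name__ == "__main__":
    noon = 12 * 3600
    txns = [("card-1", noon), ("card-1", noon + 20), ("card-1", noon + 40),
            ("card-1", noon + 70), ("card-2", noon), ("card-2", noon + 200)]
    print(flag_suspicious_cards(txns))
```

Here `card-1` makes four purchases within 70 seconds, so every window covering all four (starting 11:57 through 12:00) is flagged, while `card-2` never exceeds two per window.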
✓Windowing creates finite temporal boundaries on infinite event streams, enabling aggregations like counts, sums, and averages over defined periods
✓Tumbling windows are non-overlapping fixed intervals where each event belongs to exactly one window, offering simplicity and low memory overhead
✓Sliding windows overlap because the step size is smaller than the window size, so each event belongs to multiple windows, which enables smoother trend detection
✓Session windows are per-key dynamic windows that close after a period of inactivity, well suited to analyzing user behavior patterns and engagement sessions
1. A fraud detection system uses 5-minute sliding windows (sliding every 1 minute) to track transaction count per card, catching spikes that indicate possible fraud
2. Netflix computes error rates per region using 1- to 5-minute tumbling windows, triggering alerts when quality metrics degrade beyond thresholds
3. An e-commerce site uses per-user session windows with a 30-minute inactivity gap to analyze shopping patterns, so a session closes once a user has been idle for 30 minutes
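The second example can be sketched as a per-region error rate over 1-minute tumbling windows. This is not Netflix's implementation; the `(region, timestamp, is_error)` record shape, the function names, and the 5% alert threshold are hypothetical.

```python
from collections import defaultdict

WINDOW_SIZE = 60         # 1-minute tumbling windows, timestamps in seconds
ERROR_RATE_ALERT = 0.05  # hypothetical alert threshold (5%)


def error_rates(requests, size: int = WINDOW_SIZE):
    """requests: iterable of (region, ts, is_error).

    Returns {(region, window_start): error_rate}.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for region, ts, is_error in requests:
        key = (region, ts - ts % size)  # one tumbling window per request
        totals[key] += 1
        errors[key] += int(is_error)
    return {key: errors[key] / totals[key] for key in totals}


def alerts(requests, threshold: float = ERROR_RATE_ALERT, size: int = WINDOW_SIZE):
    """Sorted (region, window_start, rate) tuples whose rate exceeds threshold."""
    return sorted(
        (region, start, rate)
        for (region, start), rate in error_rates(requests, size).items()
        if rate > threshold
    )


if __name__ == "__main__":
    noon = 12 * 3600
    reqs = [("us-east", noon + i, i == 0) for i in range(10)]   # 1 error in 10
    reqs += [("eu-west", noon + i, i == 0) for i in range(20)]  # 1 error in 20
    print(alerts(reqs))
```

Only `us-east` alerts (a 10% error rate); `eu-west` sits exactly at 5% and does not exceed the threshold. Keeping separate error and total counters per window, rather than a stored list of requests, keeps memory proportional to the number of active (region, window) pairs.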