
Windows, Event Time, and Watermarks

The Infinite Stream Problem: Streams are unbounded. They never end. But most computations need finite scopes: "transactions in the last 5 minutes" or "clicks in this user session." You cannot wait for the stream to finish because it never will. Windows solve this by dividing the infinite stream into finite, processable chunks.

Window Types: Tumbling windows split time into fixed, non-overlapping intervals. A 5 minute tumbling window starting at 10:00:00 captures events from 10:00:00 to 10:04:59, then a new window starts at 10:05:00. Each event belongs to exactly one window. These work well for regular metrics like "requests per minute."

Sliding windows overlap. A 10 minute window sliding every 1 minute means that at 10:05:00 you have a window covering 9:55:00 to 10:04:59. At 10:06:00, a new window covers 9:56:00 to 10:05:59. Each event appears in multiple windows (10 of them in this configuration). This is useful for moving averages or detecting patterns over rolling time periods.

Session windows are defined by gaps in activity rather than by fixed durations. With a 30 minute gap, a session closes after 30 minutes of inactivity. If a user clicks at 10:00, 10:02, and 10:35, that creates two sessions: one containing the 10:00 and 10:02 clicks, which closes because the next click comes more than 30 minutes later, and a new one starting at 10:35. E-commerce sites use session windows to track shopping behavior.

Event Time vs Processing Time: Processing time is when your system sees the event. Event time is when the event actually happened, stamped by the producer. A mobile app might generate a click at 10:00:00 but, due to network lag, your stream processor receives it at 10:00:15. Which timestamp matters? Event time is usually the right choice for business logic. You want to count "clicks that happened between 10:00 and 10:05," not "clicks we processed between 10:00 and 10:05." But event time creates a challenge: events arrive out of order. An event timestamped 10:00:10 might arrive after an event timestamped 10:00:20.
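As a rough illustration of the tumbling and sliding window definitions above, the following Python sketch maps an event timestamp to the window or windows that contain it. The function names, the alignment of windows to the Unix epoch, and the calendar date are assumptions made for this example; they do not come from any particular stream processor.

```python
from datetime import datetime, timedelta

# Assumption: windows are aligned to the Unix epoch, so 5 minute windows
# start at :00, :05, :10, and so on.
EPOCH = datetime(1970, 1, 1)


def tumbling_window(ts, size):
    """Return the single [start, end) tumbling window that contains ts."""
    start = ts - (ts - EPOCH) % size
    return start, start + size


def sliding_windows(ts, size, slide):
    """Return every [start, end) sliding window that contains ts, earliest first."""
    start = ts - (ts - EPOCH) % slide  # latest window start at or before ts
    windows = []
    while start > ts - size:  # a window contains ts only if it ends after ts
        windows.append((start, start + size))
        start -= slide
    return list(reversed(windows))


# Hypothetical event at 10:03:00 (the date is arbitrary).
event = datetime(2024, 1, 1, 10, 3, 0)

print(tumbling_window(event, timedelta(minutes=5)))
# (10:00:00, 10:05:00): exactly one window

overlapping = sliding_windows(event, timedelta(minutes=10), timedelta(minutes=1))
print(len(overlapping))  # 10 windows, from [09:54, 10:04) to [10:03, 10:13)
```

The window end is treated as exclusive here, which matches the "10:00:00 to 10:04:59" phrasing: an event stamped exactly 10:05:00 falls into the next tumbling window.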
Figure: Typical out-of-order arrival. An event with event time 10:00:10 arrives at 10:00:25.
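To make the difference between the two clocks concrete, here is a small sketch that counts the same events once by event time and once by arrival (processing) time, using 5 minute tumbling windows. The event list is hypothetical: the first two entries mirror the out-of-order pair above, and the third is an invented event that crosses a window boundary while in transit.

```python
from collections import Counter

WINDOW_SECONDS = 300  # 5 minute tumbling windows


def hhmmss(h, m, s):
    """Seconds since midnight for a clock time."""
    return h * 3600 + m * 60 + s


def window_start(seconds):
    """Start of the tumbling window (seconds since midnight) containing the time."""
    return seconds - seconds % WINDOW_SECONDS


# (event_time, arrival_time) pairs, listed in arrival order. Hypothetical data.
events = [
    (hhmmss(10, 0, 20), hhmmss(10, 0, 21)),
    (hhmmss(10, 0, 10), hhmmss(10, 0, 25)),  # arrives after a later event
    (hhmmss(10, 4, 50), hhmmss(10, 5, 5)),   # happened before 10:05, seen after
]

by_event_time = Counter(window_start(event) for event, _ in events)
by_processing_time = Counter(window_start(arrival) for _, arrival in events)

print(dict(by_event_time))       # {36000: 3}  all three happened in the 10:00 window
print(dict(by_processing_time))  # {36000: 2, 36300: 1}  one leaks into the 10:05 window
```

Counting by processing time shifts the third click into the wrong window, which is the kind of error that event-time processing avoids.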
Watermarks: Watermarks are the system's way of saying "we have probably seen all events up to time T." When the watermark reaches 10:05:00, the system assumes no more events with a timestamp before 10:05:00 will arrive, so it closes the 10:00 to 10:05 window and emits results. Watermarks are estimates, not guarantees, so you configure how much lateness to tolerate. Allowing 10 minutes of lateness means keeping windows open 10 minutes past their end time, which can increase state size by 2 to 3 times (a 5 minute window is then held for 15 minutes instead of 5) but catches more late events. Aggressive watermarks (1 minute tolerance) close windows faster but drop more late data. Conservative watermarks (10 minute tolerance) keep more state and delay results.
❗ Remember: If watermarks are too aggressive, late events get dropped or go to a separate late data output. If too conservative, memory grows and latency increases because windows stay open longer. You must tune this based on observed lateness in your data.
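The trade-off can be sketched with a toy tumbling-window counter. It assumes a simple heuristic in which the watermark trails the largest event time seen so far by a fixed tolerance; the class name `TumblingCounter`, its fields, and the event times are all invented for this illustration. Real engines typically expose the watermark delay and the allowed lateness as separate settings, which this sketch folds into one `tolerance` value.

```python
from collections import defaultdict


class TumblingCounter:
    """Counts events per tumbling window and closes windows as the watermark advances."""

    def __init__(self, window_size, tolerance):
        self.window_size = window_size
        self.tolerance = tolerance
        self.max_event_time = float("-inf")
        self.open_windows = defaultdict(int)  # window start -> running count
        self.results = []  # (window start, final count) for closed windows
        self.late = []     # events whose window had already closed

    @property
    def watermark(self):
        # Assumption: watermark = largest event time seen minus the tolerance.
        return self.max_event_time - self.tolerance

    def process(self, event_time):
        start = event_time - event_time % self.window_size
        if start + self.window_size <= self.watermark:
            self.late.append(event_time)  # side output instead of silently dropping
            return
        self.open_windows[start] += 1
        self.max_event_time = max(self.max_event_time, event_time)
        # Emit every window whose end the watermark has now passed.
        for s in sorted(self.open_windows):
            if s + self.window_size <= self.watermark:
                self.results.append((s, self.open_windows.pop(s)))


def run(tolerance):
    counter = TumblingCounter(window_size=300, tolerance=tolerance)
    # Event times in seconds after 10:00:00, in arrival order; 250 arrives late.
    for t in [10, 20, 290, 370, 250]:
        counter.process(t)
    return counter.results, counter.late, dict(counter.open_windows)


print(run(60))   # ([(0, 3)], [250], {300: 1})  window closed early, late event diverted
print(run(600))  # ([], [], {0: 4, 300: 1})     late event counted, but nothing emitted yet
```

With 1 minute of tolerance the first window is emitted as soon as an event stamped 10:06:10 arrives, and the straggler goes to the late output. With 10 minutes of tolerance the straggler is counted, but both windows are still held in state and no result has been produced.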
💡 Key Takeaways
Windows divide infinite streams into finite chunks: tumbling (fixed, non overlapping), sliding (overlapping), or session (gap based)
Event time (when event happened) differs from processing time (when system sees it), and events often arrive out of order due to network delays
Watermarks estimate when all events up to time T have arrived, allowing the system to close windows and emit results while bounding state retention
Lateness tolerance is a trade-off: a 10 minute tolerance can increase state size by 2 to 3 times but catches more late events; a 1 minute tolerance uses less memory and emits results sooner but drops more late data
Session windows close after inactivity gaps (for example, 30 minutes with no clicks), making them ideal for user behavior tracking where natural breaks define boundaries
📌 Examples
1. Tumbling window: compute requests per minute by creating non-overlapping 60 second windows
2. Sliding window: detect spikes by comparing the current 10 minute average against the previous 10 minute average, with windows sliding every 1 minute
3. Session window: track shopping cart activity where a session ends after 30 minutes of no clicks, allowing per-session metrics like items viewed or time spent
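The session window example can be sketched as a small batch function that groups click times by inactivity gap. This is a simplified illustration that sorts all clicks up front, whereas a stream processor would build and merge sessions incrementally as events arrive. The function name `sessionize` is hypothetical, and treating a gap of exactly 30 minutes as the start of a new session is an assumption.

```python
def sessionize(click_times, gap):
    """Group timestamps into sessions separated by at least `gap` of inactivity."""
    sessions = []
    for t in sorted(click_times):
        if sessions and t - sessions[-1][-1] < gap:
            sessions[-1].append(t)  # still within the current session
        else:
            sessions.append([t])    # inactivity gap reached: start a new session
    return sessions


# Clicks at 10:00, 10:02, and 10:35, expressed as minutes since midnight.
clicks = [600, 602, 635]
print(sessionize(clicks, gap=30))  # [[600, 602], [635]]
```

Per-session metrics such as click count or session duration can then be computed from each inner list, for example `len(s)` and `s[-1] - s[0]`.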