Event Time Windows
Windowed aggregations are the foundation of temporal features. Tumbling windows divide time into fixed, non-overlapping intervals (e.g., hourly buckets). Sliding windows overlap by a slide interval (a 30-minute window sliding every 5 minutes gives fresher updates). Session windows group events by activity gaps (a user session ends after 30 minutes of inactivity). The window type determines freshness: tumbling windows update once per interval, while sliding windows update on every slide.
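The window semantics above can be sketched as pure functions that map an event timestamp to its window(s). This is an illustrative sketch, not a streaming engine; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def tumbling_window(ts: datetime, size: timedelta) -> tuple:
    """Each event belongs to exactly one fixed, non-overlapping bucket."""
    epoch = ts.timestamp()
    start = epoch - (epoch % size.total_seconds())
    return (start, start + size.total_seconds())

def sliding_windows(ts: datetime, size: timedelta, slide: timedelta) -> list:
    """Each event belongs to roughly size/slide overlapping windows."""
    epoch = ts.timestamp()
    size_s, slide_s = size.total_seconds(), slide.total_seconds()
    # Earliest window start that can still contain this event.
    first = epoch - (epoch % slide_s) - size_s + slide_s
    return [(s, s + size_s)
            for s in (first + i * slide_s for i in range(int(size_s // slide_s)))
            if s <= epoch < s + size_s]

ts = datetime(2024, 1, 1, 10, 17, tzinfo=timezone.utc)
print(tumbling_window(ts, timedelta(hours=1)))  # the single 10:00-11:00 bucket
# A 30-minute window sliding every 5 minutes: the event lands in 6 windows.
print(len(sliding_windows(ts, timedelta(minutes=30), timedelta(minutes=5))))
```

Session windows are omitted here because they depend on neighboring events (the inactivity gap), not on the timestamp alone.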
Late Event Handling
Real-world events arrive out of order due to network delays, mobile offline sync, and clock drift across distributed systems. Watermarks define how long to wait for late events before closing windows: a watermark delay of 10 minutes means a window waits 10 minutes past its end (in event time) before finalizing. Events arriving after the watermark are either dropped (losing accuracy) or trigger recomputation (adding complexity). Flink's allowed lateness continues updating already-closed windows for a grace period.
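A common watermark strategy tracks the maximum observed event time and trails it by a fixed delay, mirroring Flink's bounded-out-of-orderness strategy. The sketch below is a minimal pure-Python illustration; the class and method names are assumptions, not a real Flink API.

```python
class BoundedWatermark:
    """Watermark trails the max observed event time by a fixed delay."""

    def __init__(self, max_delay_s: float):
        self.max_delay_s = max_delay_s
        self.max_event_time = float("-inf")

    def observe(self, event_time: float) -> float:
        # Late events never move the watermark backwards.
        self.max_event_time = max(self.max_event_time, event_time)
        return self.watermark()

    def watermark(self) -> float:
        # Windows ending at or before this value can be finalized.
        return self.max_event_time - self.max_delay_s

wm = BoundedWatermark(max_delay_s=600)  # wait 10 minutes for stragglers
wm.observe(1000.0)
wm.observe(900.0)   # a late event: the watermark does not regress
print(wm.watermark())  # 400.0: windows ending at or before t=400 may close
```

An event with timestamp 300 arriving now would be late; under allowed lateness it could still update its (already emitted) window for a grace period.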
State Management
Streaming aggregations maintain state per entity per window. For 10 million users with 5-minute sliding windows and a 1-hour lookback, you maintain 12 window states per user, or 120 million state entries in total. State size grows with entity count, window granularity, and feature complexity. Flink checkpoints state to durable storage every few seconds for fault tolerance, with the checkpoint interval balancing recovery speed against checkpoint overhead.
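The arithmetic above can be checked with a quick back-of-envelope calculation; the 100-byte entry size is an assumption for illustration.

```python
def live_windows(lookback_s: int, slide_s: int) -> int:
    """Number of concurrently open sliding windows per entity."""
    return lookback_s // slide_s

users = 10_000_000
windows_per_user = live_windows(lookback_s=3600, slide_s=300)  # 1 h / 5 min
state_entries = users * windows_per_user
print(windows_per_user, state_entries)  # 12 windows -> 120,000,000 entries

# At an assumed ~100 bytes per entry (key + running aggregate), that is
# ~12 GB of state the backend must hold and checkpoint.
bytes_per_entry = 100
print(state_entries * bytes_per_entry / 1e9)  # ~12.0 GB
```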
Backpressure Handling
When the upstream event rate exceeds processing capacity, pipelines must slow down or shed events gracefully. Flink propagates backpressure upstream through the operator graph, ultimately slowing its Kafka consumers. Without proper handling, buffer overflow causes out-of-memory errors or uncontrolled event loss.
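The core mechanism can be illustrated with a bounded buffer: when the consumer falls behind, the blocking `put()` stalls the producer instead of letting memory grow without bound. This is a minimal single-process sketch of the idea, not how Flink implements credit-based backpressure internally.

```python
import queue
import threading

# Bounded buffer: put() blocks when full, which is the backpressure signal.
buf: "queue.Queue" = queue.Queue(maxsize=100)

def producer(n: int) -> None:
    for i in range(n):
        buf.put(i)    # blocks when the buffer is full -> producer slows down
    buf.put(None)     # sentinel: end of stream

def consumer(results: list) -> None:
    while (item := buf.get()) is not None:
        results.append(item)  # stand-in for slow aggregation work

out: list = []
t1 = threading.Thread(target=producer, args=(1000,))
t2 = threading.Thread(target=consumer, args=(out,))
t1.start(); t2.start()
t1.join(); t2.join()
print(len(out))  # 1000: nothing dropped, no unbounded buffering
```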
✓ Training-serving skew causes 10 to 30% accuracy drops when the offline batch features used for training differ from the online streaming features served at inference, due to separate code paths or different time semantics
✓ Feature stores consolidate logic into shared declarative specs compiled to both Spark batch (for training) and Flink streaming (for serving), ensuring identical transformation logic
✓ Replay validation replays historical events through the streaming pipeline and compares against batch-computed features over the same time range, catching skew from watermark bugs or null-handling differences
✓ Continuous monitoring compares streaming aggregates against batch recomputes, alerting when a 1% divergence threshold is exceeded, detecting drift from code changes or configuration mismatches
✓ Reference data versioning uses point-in-time lookups at event time to match training's historical join semantics, preventing skew from profile or catalog updates between training and serving
✓ Event-time vs. processing-time inconsistencies are a top skew source: training on processing-time windows but serving with event-time windows shifts feature boundaries by minutes to hours
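The replay-validation and monitoring patterns above reduce to one comparison: streaming aggregates versus a batch recompute over the same event-time range, flagged past a relative tolerance. A minimal sketch, with hypothetical feature keys and a 1% default tolerance as in the text:

```python
def compare_features(streaming: dict, batch: dict, rel_tol: float = 0.01) -> list:
    """Return (key, streaming_value, batch_value) for features whose
    relative gap versus the batch recompute exceeds rel_tol."""
    mismatches = []
    for key, batch_val in batch.items():
        stream_val = streaming.get(key, 0.0)   # missing key counts as skew
        denom = max(abs(batch_val), 1e-9)      # avoid division by zero
        if abs(stream_val - batch_val) / denom > rel_tol:
            mismatches.append((key, stream_val, batch_val))
    return mismatches

batch_feats  = {"user_1:clicks_1h": 100.0, "user_2:clicks_1h": 50.0}
stream_feats = {"user_1:clicks_1h": 100.5, "user_2:clicks_1h": 40.0}
print(compare_features(stream_feats, batch_feats))
# user_1 is within 1%; user_2 (20% off) is flagged for investigation
```

In practice the two dicts would come from replaying historical events through the streaming job and from a Spark batch job over the same window, keyed by entity and feature name.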
1. Airbnb search ranking: Unified feature definitions in Python deployed to both Spark for training dataset generation and Flink for real-time serving. Replay tests run nightly, comparing 24-hour batches. Caught skew where offline used UTC timestamps but online used local time zones, causing a 15% accuracy drop.
2. Uber trip ETA prediction: Per-driver features such as completed trips in the last 7 days computed identically offline and online. Point-in-time joins fetch the driver profile (vehicle type, rating) as of trip event time. Continuous validation monitors hourly aggregates with a 0.5% tolerance, alerting on schema-evolution bugs.
3. Netflix recommendation model: The feature store compiles feature definitions to Spark SQL for backfills and Flink SQL for streaming. Automated integration tests replay 1 week of production traffic monthly, comparing outputs. Prevented serving-time skew when the streaming pipeline had different null handling for missing user profiles.