
Late Data Policies and Production Scale Handling

The Late Data Decision: When an event arrives with a timestamp older than the current watermark, the streaming engine must make an immediate decision. This event is "late" because the system already finalized windows that should have included it. You have three fundamental options, each with different operational characteristics.
1. Drop and Count: Discard the event and increment a counter. Simplest approach, zero additional cost, but permanently loses data. Requires monitoring the drop rate to detect issues.
2. Side Output Stream: Route late events to a separate topic or table. Preserves data for offline reconciliation, audit trails, or ML feature extraction. Adds storage cost and downstream complexity.
3. Retract and Update: Reopen the historical window, update the aggregate, and emit a correction. Provides accuracy but requires consumers to handle updates and state stores to retain some historical data (a minimal dispatch sketch of all three policies follows this list).
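As a concrete illustration of the three policies, here is a minimal, framework-agnostic Python sketch of the decision an operator makes when an event's timestamp falls behind the watermark. The LatePolicy enum, LateDataHandler class, and the in-memory list and dict standing in for a side-output topic and a window state store are illustrative names, not any particular engine's API.

```python
from dataclasses import dataclass, field
from enum import Enum


class LatePolicy(Enum):
    DROP_AND_COUNT = "drop_and_count"
    SIDE_OUTPUT = "side_output"
    RETRACT_AND_UPDATE = "retract_and_update"


@dataclass
class Event:
    key: str
    value: int
    event_time: float  # event timestamp, epoch seconds


@dataclass
class LateDataHandler:
    policy: LatePolicy
    window_size: float = 60.0
    dropped: int = 0                                   # counter for DROP_AND_COUNT
    side_output: list = field(default_factory=list)    # stand-in for a late-events topic
    window_state: dict = field(default_factory=dict)   # aggregates keyed by window start

    def on_event(self, event: Event, watermark: float):
        window_start = event.event_time - (event.event_time % self.window_size)

        if event.event_time >= watermark:
            # On time: update the still-open window as usual.
            self.window_state[window_start] = self.window_state.get(window_start, 0) + event.value
            return None

        # Late: the watermark has already passed this event's timestamp.
        if self.policy is LatePolicy.DROP_AND_COUNT:
            self.dropped += 1                           # alert when this rate climbs
            return None
        if self.policy is LatePolicy.SIDE_OUTPUT:
            self.side_output.append(event)              # route to a separate topic or table
            return None

        # RETRACT_AND_UPDATE: reopen the historical window and emit a correction.
        self.window_state[window_start] = self.window_state.get(window_start, 0) + event.value
        return ("correction", window_start, self.window_state[window_start])
```

A real engine would also evict entries from window_state once the retention needed for corrections has passed; the point here is only the three-way branch on policy.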
Scale Considerations: At 50,000 to 500,000 events per second, late data handling choices have concrete infrastructure impact. If your delay distribution shows 2% of events arrive beyond your 5 minute watermark, that is 1,000 to 10,000 late events per second. Dropping them costs nothing but means your hourly dashboard counts are systematically 2% low. Routing to a side topic means writing 1,000+ messages per second to additional storage, roughly 100 to 500 MB per hour depending on event size. This adds up across dozens of streaming jobs: one large organization reported 8 TB per day of late data side outputs across their platform.
Late Data Volume at Scale (summary figures): late rate 2%, late events 10K/sec, side output 500 MB/hr.
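To make the sizing arithmetic explicit, here is a short back-of-envelope calculation under assumed figures (50,000 events per second for one job, 2% late, 100-byte events); substitute your own rates and event sizes.

```python
# Back-of-envelope sizing for late-data side outputs (all figures are assumptions).
ingest_per_sec = 50_000      # one job's ingest rate, events per second
late_rate = 0.02             # 2% of events arrive beyond the 5 minute watermark
avg_event_bytes = 100        # assumed average serialized event size

late_per_sec = ingest_per_sec * late_rate                        # 1,000 late events/sec
side_output_mb_per_hour = late_per_sec * avg_event_bytes * 3600 / 1e6

print(f"late events/sec:     {late_per_sec:,.0f}")               # 1,000
print(f"side output MB/hour: {side_output_mb_per_hour:,.0f}")    # ~360 MB/hour

# The same 2% at 500,000 events/sec is 10,000 late events/sec, and summed
# across dozens of jobs the daily side-output volume can reach terabytes.
```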
Advanced Pattern with Allowed Lateness: Some frameworks support "allowed lateness after watermark." Even after finalizing a window and emitting results, the system keeps limited state for an additional grace period, and late events within that grace period trigger updates. For example, finalize and emit a 1 minute window when the watermark passes, but keep its state for another 3 minutes; events arriving in that 3 minute grace window produce updated results. This requires downstream systems to be idempotent and handle retractions or upserts, but it balances timeliness with accuracy. Google Cloud Dataflow uses this pattern, with allowed lateness defaulting to 0 but configurable up to days. A team might emit speculative results immediately when the watermark passes for dashboard freshness, then issue corrections for the next hour as late mobile data trickles in, accepting that their serving layer must merge updates.

Monitoring is Critical: You must track late data metrics per job and per partition. Key signals include the percentage of events dropped as late per hour, watermark lag behind wall clock time (target under 10 seconds for real time systems), and the state size trend. A sudden increase in late data percentage often indicates upstream client issues, network problems, or misconfigured watermarks rather than normal variance.
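To tie the grace-period pattern and the monitoring signals together, here is a simplified Python model of tumbling event-time windows with allowed lateness. The window size, grace period, emit callback, and metric names are illustrative assumptions, not the API of Dataflow or any specific engine.

```python
import time
from dataclasses import dataclass, field


@dataclass
class WindowState:
    count: int = 0
    finalized: bool = False  # a preliminary result has already been emitted


@dataclass
class GracePeriodWindows:
    """Tumbling event-time windows with allowed lateness after the watermark."""
    window_size: float = 60.0        # 1 minute windows
    allowed_lateness: float = 180.0  # keep state for 3 more minutes after finalizing
    watermark: float = 0.0
    late_dropped: int = 0            # metric: events beyond even the grace period
    windows: dict = field(default_factory=dict)

    def advance_watermark(self, new_watermark: float, emit) -> None:
        self.watermark = max(self.watermark, new_watermark)
        for start, state in list(self.windows.items()):
            end = start + self.window_size
            if not state.finalized and self.watermark >= end:
                state.finalized = True
                emit("preliminary", start, state.count)   # fast, speculative result
            if self.watermark >= end + self.allowed_lateness:
                del self.windows[start]                   # grace period over, free the state

    def on_event(self, event_time: float, emit) -> None:
        start = event_time - (event_time % self.window_size)
        if self.watermark >= start + self.window_size + self.allowed_lateness:
            self.late_dropped += 1                        # too late even for the grace period
            return
        state = self.windows.setdefault(start, WindowState())
        state.count += 1
        if state.finalized:
            emit("correction", start, state.count)        # downstream must upsert or merge

    def metrics(self) -> dict:
        # Signals worth alerting on: drop counter, watermark lag, retained state size.
        return {
            "late_dropped_total": self.late_dropped,
            "watermark_lag_sec": time.time() - self.watermark,
            "open_windows": len(self.windows),
        }


# Minimal usage: one on-time event, a watermark advance, then a late update.
wm = GracePeriodWindows()
emit = lambda kind, start, count: print(kind, start, count)
wm.on_event(10.0, emit)              # lands in window [0, 60)
wm.advance_watermark(65.0, emit)     # emits ("preliminary", 0.0, 1)
wm.on_event(30.0, emit)              # late but within grace: emits ("correction", 0.0, 2)
```

Downstream consumers must treat the "correction" emissions as upserts keyed by window start, which is what makes serving the fast preliminary results safe.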
💡 Key Takeaways
Late data options are drop with counting, route to side output for offline handling, or retract and update historical results, each with different cost and complexity trade-offs
At 500,000 events per second with 2% late rate, side output routing generates 10,000 extra writes per second and up to 8 TB per day across a large platform
Allowed lateness after watermark lets systems emit fast results then issue corrections within a grace period, requiring idempotent downstream consumers but improving both latency and accuracy
Monitoring late data percentage, watermark lag, and state size is essential because sudden changes usually indicate upstream production issues rather than expected variance
📌 Examples
1. A user facing analytics dashboard uses a 1 minute watermark with a drop policy, accepting a 0.5% undercount in exchange for sub 10 second latency. A nightly batch job reconciles from the complete dataset for compliance reporting.
2. Netflix routes late events to S3 for their recommendation models, which retrain daily. Real time aggregations drop late data to maintain strict latency SLAs, while ML pipelines process complete historical data including late arrivals.
3. A payments system uses 10 minute allowed lateness with retraction support. Windows emit preliminary fraud scores immediately, then update for 30 minutes as delayed mobile payment confirmations arrive, with the fraud service handling score updates.