Out-of-Order Data and Late Arrivals: Handling Time Series Reality
Why Data Arrives Out of Order
Real-world time series data rarely arrives in perfect chronological order. Network delays, distributed clock skew, batch processing, and offline mobile sync all cause data points to arrive seconds, minutes, or hours after their timestamps. Systems that reject out-of-order writes push the problem onto producers, which must drop or re-route valid data; systems that accept them pay in storage-engine complexity.
The Storage Challenge
Time-partitioned storage assumes append-only writes within each segment. When a late data point arrives for an already-persisted segment, the system must do one of three things: (1) reject it, (2) buffer it for later batch re-ingestion, or (3) rewrite the segment to incorporate the new point, preserving sort order and removing duplicates. A high volume of late writes triggers repeated compaction passes that merge overlapping segments, which spikes CPU and I/O usage and degrades query latency.
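Option (3) can be illustrated with a minimal in-memory sketch. It assumes a segment is simply a sorted list of (timestamp, value) tuples and that a late point with an existing timestamp overwrites the stored one (last-write-wins). The function name merge_late_point and that policy are illustrative assumptions, not the behavior of any particular TSDB.

```python
from bisect import bisect_left

def merge_late_point(segment, point):
    """Return a new sorted segment that includes `point`.

    `segment` is a list of (timestamp, value) tuples sorted by timestamp.
    Assumed policy: if the timestamp already exists, the late value
    replaces the stored one (last-write-wins).
    """
    ts, _ = point
    timestamps = [t for t, _ in segment]
    i = bisect_left(timestamps, ts)   # insertion index that keeps sort order
    rewritten = list(segment)         # copy: the persisted segment is left untouched
    if i < len(rewritten) and rewritten[i][0] == ts:
        rewritten[i] = point          # duplicate timestamp: overwrite
    else:
        rewritten.insert(i, point)    # new timestamp: insert in order
    return rewritten

segment = [(100, 1.0), (200, 2.0), (400, 4.0)]
segment_v2 = merge_late_point(segment, (300, 3.0))     # late point fills a gap
segment_v3 = merge_late_point(segment_v2, (200, 2.5))  # late duplicate overwrites
```

Every late point in this sketch produces a full copy of the segment, which is a small-scale picture of why frequent late writes are expensive.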
Configurable Late Windows
Production systems typically expose a configurable late-arrival window. Data within the tolerance (e.g., up to 1 hour late) is accepted and checked against a deduplication index that tracks the timestamps already seen for each series. Data outside the window is rejected or routed to a separate repair path. Relational TSDBs allow inserts into any partition and reindex automatically, but performance degrades if late writes constantly split and merge chunks.
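The window-plus-index logic might look roughly like the following sketch. The class name LateWindowRouter, the max_lateness_s parameter, and the three return labels are hypothetical; the deduplication index here is a plain in-memory set per series, which a real system would need to bound or evict.

```python
from collections import defaultdict

class LateWindowRouter:
    """Classify incoming points as 'accept', 'duplicate', or 'repair'."""

    def __init__(self, max_lateness_s=3600):
        self.max_lateness_s = max_lateness_s  # assumed tolerance: 1 hour
        self.seen = defaultdict(set)          # series_id -> timestamps already accepted

    def route(self, series_id, ts, now):
        # Points older than the window go to the separate repair path.
        if now - ts > self.max_lateness_s:
            return "repair"
        # Inside the window: consult the per-series deduplication index.
        if ts in self.seen[series_id]:
            return "duplicate"
        self.seen[series_id].add(ts)
        return "accept"

router = LateWindowRouter(max_lateness_s=3600)
now = 10_000
results = [
    router.route("cpu", 9_990, now),  # 10 s late, first sighting
    router.route("cpu", 9_990, now),  # same series and timestamp again
    router.route("cpu", 6_400, now),  # exactly 3600 s late, still inside the window
    router.route("cpu", 6_399, now),  # 3601 s late, outside the window
]
```

Since the index only needs to cover timestamps inside the window, entries older than now - max_lateness_s could be dropped safely. That eviction step is left out here for brevity.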
Client-Side Strategies
Mobile and IoT devices with intermittent connectivity buffer data locally with the original timestamps and upload it once they reconnect. The server deduplicates on a unique event ID (device_id + timestamp). Monitoring the late-arrival rate and the out-of-order percentage helps detect upstream issues such as clock drift, network problems, or misconfigured batch jobs before they cascade into TSDB instability.
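As a rough sketch of the server side, the following tracker deduplicates on (device_id, timestamp) and counts late and out-of-order events. The class IngestStats, the 60-second late threshold, and the metric definitions are assumptions chosen for illustration: a point counts as late if it is received more than the threshold after its timestamp, and as out of order if its timestamp is older than the newest one already seen for that device.

```python
from dataclasses import dataclass, field

@dataclass
class IngestStats:
    total: int = 0
    late: int = 0
    out_of_order: int = 0
    duplicates: int = 0
    seen_ids: set = field(default_factory=set)   # (device_id, ts) event IDs
    last_ts: dict = field(default_factory=dict)  # device_id -> newest ts seen

    def observe(self, device_id, ts, received_at, late_threshold_s=60):
        """Record one upload; return False if it is a duplicate."""
        event_id = (device_id, ts)
        if event_id in self.seen_ids:
            self.duplicates += 1
            return False
        self.seen_ids.add(event_id)
        self.total += 1
        if received_at - ts > late_threshold_s:
            self.late += 1
        if ts < self.last_ts.get(device_id, ts):
            self.out_of_order += 1
        else:
            self.last_ts[device_id] = ts
        return True

    def late_rate(self):
        return self.late / self.total if self.total else 0.0

    def out_of_order_pct(self):
        return 100.0 * self.out_of_order / self.total if self.total else 0.0

stats = IngestStats()
stats.observe("dev-1", 100, received_at=105)  # on time, in order
stats.observe("dev-1", 200, received_at=205)  # on time, in order
stats.observe("dev-1", 150, received_at=400)  # buffered upload: late and out of order
stats.observe("dev-1", 150, received_at=410)  # retry of the same event: duplicate
```

A sustained rise in late_rate() or out_of_order_pct() for one device or region is the kind of signal that can point to clock drift or a misbehaving batch job upstream.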