Out-of-Order Data and Late Arrivals: Handling Time Series Reality

Why Data Arrives Out of Order
Real-world time series data rarely arrives in perfect chronological order. Network delays, distributed clock skew, batch processing, and offline mobile sync all cause data points to arrive seconds, minutes, or hours after their timestamp. Systems that reject out-of-order writes lose valid data and push retry logic onto clients; systems that accept them pay in storage complexity.
The Storage Challenge
Time-partitioned storage assumes append-only writes within each segment. When a late data point arrives for an already-persisted segment, the system must either: (1) reject it, (2) buffer for batch re-ingestion, or (3) rewrite the segment incorporating the new point while maintaining sort order and deduplicating. Excessive late writes trigger repeated compaction passes merging overlapping segments, spiking CPU and I/O and degrading query latency.
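Option (3) can be sketched in a few lines. This is a minimal illustration, not how any particular TSDB implements it: the segment is modeled as an in-memory list of (timestamp, value) pairs, the name rewrite_segment is hypothetical, and resolving duplicate timestamps with last-write-wins is an assumption.

```python
from typing import List, Tuple

Point = Tuple[int, float]  # (timestamp in epoch seconds, value)


def rewrite_segment(segment: List[Point], late_points: List[Point]) -> List[Point]:
    """Return a new sorted segment with the late points merged in.

    Assumption: on a duplicate timestamp the late point wins (last-write-wins).
    """
    merged = dict(segment)         # timestamp -> value
    merged.update(late_points)     # late points overwrite duplicates
    return sorted(merged.items())  # restore timestamp order


segment = [(100, 1.0), (110, 1.1), (130, 1.3)]
late = [(120, 1.2), (110, 9.9)]  # one new point, one duplicate timestamp
new_segment = rewrite_segment(segment, late)
print(new_segment)
# [(100, 1.0), (110, 9.9), (120, 1.2), (130, 1.3)]
```

Even in this toy form the whole segment is read and rewritten to add a single point, which is why a steady stream of late writes turns into CPU and I/O pressure.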
Configurable Late Windows
Production systems use configurable late-arrival windows. Data within the tolerance (e.g., up to 1 hour late) is accepted and checked against a deduplication index, which tracks the timestamps already seen per series so that duplicates can be dropped. Data outside the window is rejected or routed to a separate repair path. Relational TSDBs allow inserting into any partition with automatic reindexing, but performance degrades if late writes constantly split and merge chunks.
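The accept, deduplicate, or reroute decision can be sketched as a small classifier. This is an illustrative sketch under stated assumptions: the Ingestor class, its ingest method, and the three result labels are hypothetical names, the window is fixed at 1 hour, and the deduplication index is a plain in-memory set per series.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

LATE_WINDOW_SECONDS = 3600  # assumed tolerance: 1 hour


@dataclass
class Ingestor:
    window: int = LATE_WINDOW_SECONDS
    # Deduplication index: series name -> timestamps already accepted.
    seen: Dict[str, Set[int]] = field(default_factory=dict)

    def ingest(self, series: str, ts: int, now: int) -> str:
        """Classify a write as 'accepted', 'duplicate', or 'repair'."""
        if now - ts > self.window:
            return "repair"  # too late: route to the separate repair path
        timestamps = self.seen.setdefault(series, set())
        if ts in timestamps:
            return "duplicate"
        timestamps.add(ts)
        return "accepted"


ing = Ingestor()
now = 10_000
results = [
    ing.ingest("cpu", 9_990, now),  # 10 s late
    ing.ingest("cpu", 9_990, now),  # same point again
    ing.ingest("cpu", 7_000, now),  # 3000 s late, inside the window
    ing.ingest("cpu", 6_000, now),  # 4000 s late, outside the window
]
print(results)
# ['accepted', 'duplicate', 'accepted', 'repair']
```

A real index would also need to evict timestamps older than the window; otherwise it grows without bound. That eviction is left out here to keep the sketch short.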
Client-Side Strategies
Mobile and IoT devices with intermittent connectivity buffer locally with original timestamps and upload when connected. The server uses unique event IDs (device_id + timestamp) for deduplication. Monitoring late arrival rate and out-of-order percentage helps detect upstream issues like clock drift, network problems, or misconfigured batch jobs before they cascade into TSDB instability.
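A rough sketch of the server side of this flow follows. It assumes readings are (device_id, timestamp, value) tuples, that a point counts as late when it arrives more than 5 minutes after its timestamp, and that a point is out of order when its timestamp is older than the newest one already stored for that device. The Server class and its counters are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Tuple

Reading = Tuple[str, int, float]  # (device_id, timestamp, value)


class Server:
    def __init__(self, late_threshold: int = 300) -> None:
        self.store: Dict[Tuple[str, int], float] = {}  # keyed by event ID
        self.newest: Dict[str, int] = {}  # newest stored timestamp per device
        self.late_threshold = late_threshold
        self.received = 0
        self.duplicates = 0
        self.late = 0
        self.out_of_order = 0

    def upload(self, batch: List[Reading], now: int) -> None:
        for device_id, ts, value in batch:
            self.received += 1
            key = (device_id, ts)  # event ID: device_id + timestamp
            if key in self.store:
                self.duplicates += 1  # retried upload, already stored
                continue
            self.store[key] = value
            if now - ts > self.late_threshold:
                self.late += 1
            newest = self.newest.get(device_id)
            if newest is not None and ts < newest:
                self.out_of_order += 1
            else:
                self.newest[device_id] = ts


# The device buffered three readings offline, keeping original timestamps.
buffered = [("d1", 100, 1.0), ("d1", 160, 1.1), ("d1", 130, 1.2)]
server = Server()
server.upload(buffered, now=1000)
server.upload(buffered, now=1010)  # retry after a lost acknowledgement

print(len(server.store), server.duplicates, server.late, server.out_of_order)
# 3 3 3 1
```

Dividing late and out_of_order by the number of stored points gives the late arrival rate and out-of-order percentage; a sudden rise in either is the signal to look upstream.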
Key Trade-off: Wider windows reduce data loss but increase compaction overhead. Tighter windows simplify storage but reject valid data, which then requires sophisticated client-side buffering and retry logic.

💡 Key Takeaways

✓ Out-of-order writes force rewriting persisted segments to insert late points in sort order, triggering expensive compaction passes

✓ Configurable late windows accept data within tolerance (e.g., 1 hour) using a deduplication index; data outside the window is rejected or rerouted

✓ Relational TSDBs allow inserting into any partition with auto-reindexing but degrade when late writes constantly split chunks

✓ Deduplication uses unique event IDs (device_id + timestamp) to handle same data point arriving multiple times

✓ Monitor late arrival rate and out-of-order percentage to detect upstream issues (clock drift, network delays, batch misconfiguration)

✓ Wider windows: less data loss, more compaction overhead. Tighter windows: simpler storage, need sophisticated client buffering

📌 Interview Tips

1. Mobile offline scenario: app buffers 1000 sensor readings locally, uploads 6 hours later. Server accepts within 24-hour window, deduplicates by device_id + timestamp.

2. Debug compaction spike: CPU at 80% from old-timestamp batch job. Solution: separate write paths for real-time (1-hour tolerance) and backfill (direct to cold storage).

3. Calculate late tolerance: 99% of data arrives within 5 minutes. Set window to 1 hour to capture 99.99% while limiting compaction overhead from extreme outliers.
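The window calculation in the last tip comes down to measuring what fraction of points each candidate window would accept. The sketch below uses made-up lateness samples, so its percentages are illustrative only; in practice the samples would come from measured arrival time minus event timestamp. The coverage function is a hypothetical helper.

```python
from typing import List


def coverage(lateness: List[float], window: float) -> float:
    """Fraction of points whose lateness (seconds) fits inside the window."""
    return sum(1 for d in lateness if d <= window) / len(lateness)


# Hypothetical samples: most points are 30 s late, a few are 30 minutes
# late, and one extreme outlier is a full day late.
samples = [30.0] * 990 + [1800.0] * 9 + [86400.0]

for window in (300, 3600, 86400):
    print(window, coverage(samples, window))
# 300 0.99
# 3600 0.999
# 86400 1.0
```

With these samples, widening the window from 5 minutes to 1 hour recovers most of the remaining points, while covering the last outlier would mean keeping segments open to rewrites for a full day.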
