
Failure Modes and Edge Cases

When Approximation Breaks Down: AQP has specific failure modes that can produce systematically wrong results, not just noisy estimates. Understanding these is critical for system design interviews and production deployments.

Rare Events and High Selectivity: The most common failure is querying for rare events on sampled data. If you maintain a 1% sample of 1 billion rows, an event that occurs 10,000 times in the full data appears only about 100 times in the sample, and your confidence intervals explode. Worse, something occurring 500 times globally might be completely absent from your sample, giving you zero instead of 500. This is not just statistical noise; it is systematic undercounting. Imagine debugging a production issue where a specific error code appeared 200 times in the last hour across billions of requests. Your 0.1% sample likely has zero instances, so the approximate system tells you the problem does not exist when it actually does.
❗ Remember: With 1% sampling, events occurring fewer than 10,000 times in a billion-row table will have confidence intervals exceeding ±30%, making estimates unreliable for decision-making.
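You can see this failure mode directly with a short simulation. The snippet below is an illustrative Python sketch: the 0.1% rate, 200 true occurrences, and trial count are assumptions matching the debugging story above, not measurements from any real system. It repeatedly draws a uniform sample over the rows containing a rare event and scales the hit count back up.

```python
import random
import statistics

SAMPLE_RATE = 0.001   # hypothetical 0.1% uniform row sample
TRUE_COUNT = 200      # the error code really occurred 200 times in the full data
TRIALS = 10_000       # independent simulated sampling runs

def sampled_estimate(true_count: int, rate: float) -> float:
    """Scale-up estimator: count matching rows in the sample, divide by the rate.

    Each of the true_count matching rows lands in the sample independently with
    probability `rate`, so the number of hits is Binomial(true_count, rate).
    """
    hits = sum(1 for _ in range(true_count) if random.random() < rate)
    return hits / rate

estimates = [sampled_estimate(TRUE_COUNT, SAMPLE_RATE) for _ in range(TRIALS)]
zero_runs = sum(1 for e in estimates if e == 0)

print(f"true count          : {TRUE_COUNT}")
print(f"mean estimate       : {statistics.mean(estimates):.0f}")
print(f"stdev of estimate   : {statistics.stdev(estimates):.0f}")  # larger than the true count
print(f"runs reporting zero : {zero_runs / TRIALS:.1%}")           # 'the problem does not exist'
```

Under these made-up numbers the scale-up estimator is roughly unbiased on average, but its spread exceeds the true count itself and most runs report zero hits, which is exactly the "the problem does not exist" answer described above.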
Skew and Distribution Bias: Uniform sampling works best when the quantity of interest is spread evenly across rows, but real-world data is heavy-tailed. A handful of very large customers might generate 50% of traffic. If your sampling is purely random, you might undersample these critical accounts in one draw and oversample them in another, leading to high variance in metrics like "average revenue per customer." Stratified sampling helps by oversampling important segments, but it requires knowing the stratification keys in advance. If a query filters on a dimension you did not stratify by, the bias remains: you stratified by country, but the query filters by device type, and your sample's distribution across device types does not match the population's.

Sketch Merging and Multi-Stage Aggregation: While sketches are mathematically mergeable, errors can compound in complex queries. If you merge sketches across 1,000 partitions, each with its own ±1% error, the combined error is not simply ±1%; it depends on overlap patterns and can grow to ±3% or more in pathological cases. Multi-stage aggregations amplify this. Imagine computing distinct users per country, then rolling up to distinct users per continent. The continent-level sketch merges country-level sketches, each already approximate. If a user appears in multiple countries (travelers, for example), the sketch might double count or undercount depending on merge order and hash collisions.
1. Hour 1 sketch: 1M users, ±1% error
2. Merge 24 hours: overlapping users cause ±2.5% error
3. Roll up regions: error grows to ±4% at the global level
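The roll-up pattern is easy to experiment with using any mergeable distinct-count sketch. The code below is a minimal, self-contained K-Minimum-Values (KMV) sketch, one of the simplest mergeable distinct-count structures; the country sizes and the 20,000 shared "travelers" are invented for the demo. It builds one sketch per country, merges upward to continents and then globally, and compares each level's estimate with the exact distinct count.

```python
import hashlib

def uniform_hash(item: str) -> float:
    """Deterministically hash an item to a uniform value in [0, 1)."""
    digest = hashlib.sha1(item.encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2.0**64

class KMVSketch:
    """K-Minimum-Values sketch: a simple mergeable approximate distinct counter.

    It keeps only the k smallest hash values seen; the k-th smallest value shows
    how densely distinct items fill [0, 1), giving the estimate (k - 1) / kth.
    Relative error is on the order of 1 / sqrt(k).
    """

    def __init__(self, k: int = 1024):
        self.k = k
        self.values: set[float] = set()  # the k smallest distinct hash values so far
        self._kth = 0.0                  # cached max of self.values

    def add(self, item: str) -> None:
        v = uniform_hash(item)
        if len(self.values) < self.k:
            self.values.add(v)
            self._kth = max(self._kth, v)
        elif v < self._kth and v not in self.values:
            self.values.discard(self._kth)
            self.values.add(v)
            self._kth = max(self.values)  # O(k), but this branch fires rarely

    def merge(self, other: "KMVSketch") -> "KMVSketch":
        merged = KMVSketch(self.k)
        kept = sorted(self.values | other.values)[: self.k]
        merged.values = set(kept)
        merged._kth = kept[-1] if kept else 0.0
        return merged

    def estimate(self) -> float:
        if len(self.values) < self.k:
            return float(len(self.values))  # fewer than k distinct items seen: exact
        return (self.k - 1) / self._kth

# Invented example data: per-country user sets, plus "travelers" shared across countries.
countries = {"US": 400_000, "CA": 120_000, "DE": 250_000, "FR": 180_000}
continents = {"NA": ["US", "CA"], "EU": ["DE", "FR"]}
travelers = [f"traveler-{i}" for i in range(20_000)]  # the same users appear in every country

exact: dict[str, set] = {}
sketches: dict[str, KMVSketch] = {}
for country, n in countries.items():
    users = [f"{country}-user-{i}" for i in range(n)] + travelers
    exact[country] = set(users)
    sk = KMVSketch()
    for u in users:
        sk.add(u)
    sketches[country] = sk

def report(label: str, sketch: KMVSketch, truth: set) -> None:
    est = sketch.estimate()
    print(f"{label:6s} exact={len(truth):>9,d}  est={est:>11,.0f}  "
          f"err={(est - len(truth)) / len(truth):+.2%}")

for country in countries:
    report(country, sketches[country], exact[country])

global_sketch, global_exact = None, set()
for continent, members in continents.items():
    cont_sketch, cont_exact = None, set()
    for c in members:
        cont_sketch = sketches[c] if cont_sketch is None else cont_sketch.merge(sketches[c])
        cont_exact |= exact[c]
    report(continent, cont_sketch, cont_exact)
    global_sketch = cont_sketch if global_sketch is None else global_sketch.merge(cont_sketch)
    global_exact |= cont_exact

report("GLOBAL", global_sketch, global_exact)
```

KMV's merge happens to be lossless (the merged sketch equals one built directly on the union), so what this demo shows is that the few-percent error floor persists at every roll-up level instead of averaging out. Sketches with lossier merges, mismatched parameters, or mixed hash functions can degrade further, which is the compounding pattern in the steps above.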
Staleness and Drift: If samples or sketches refresh only nightly but the data distribution shifts during the day, your error bounds become invalid. Imagine a flash sale that triples traffic from a specific country: yesterday's sketch dramatically underestimates today's distinct users from that country because the sketch was sized for normal traffic. In streaming systems, delayed or out-of-order events create temporal inconsistencies. If an event arrives 2 hours late but your sketch has already rolled over to a new time window, that event is either lost or counted in the wrong bucket. At high throughput (1 million events per second), even 0.1% late arrivals mean 1,000 miscounted events per second.

Human and Operational Failures: The hardest failures are not mathematical but human. If users do not understand that "10.2M ± 1.5%" is a probabilistic statement, they might compare it directly against an exact metric from another system and report the discrepancy as a bug. If your User Interface (UI) does not clearly label results as approximate, people make decisions assuming perfect accuracy. Monitoring is critical: track what fraction of queries use AQP, actual observed error rates against periodic ground truth, and divergence over time. Implement circuit breakers: if error estimates exceed a threshold (for example, ±10%), automatically fall back to exact queries even if they are slower. Without this, you risk making business decisions on systematically biased data.
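A circuit breaker for that fallback can be quite small. The sketch below is illustrative only: run_approx and run_exact are assumed callables into your query engine (not a real API), and the thresholds and window sizes are placeholders you would tune from your own error monitoring.

```python
import time
from collections import deque

class AQPCircuitBreaker:
    """Fall back from approximate to exact execution when observed error is too high.

    `run_approx` and `run_exact` are assumed callables: each takes a query and
    returns (result, error_bound), where error_bound is the relative half-width
    of the confidence interval (e.g. 0.015 for ±1.5%).
    """

    def __init__(self, run_approx, run_exact,
                 error_threshold=0.05,    # single-query error bound considered "bad"
                 bad_fraction=0.15,       # trip when >=15% of recent queries are bad
                 window=200,              # how many recent queries to remember
                 cooldown_seconds=600):   # stay on exact queries for 10 minutes
        self.run_approx = run_approx
        self.run_exact = run_exact
        self.error_threshold = error_threshold
        self.bad_fraction = bad_fraction
        self.recent = deque(maxlen=window)   # True for each recent query whose bound was bad
        self.cooldown_seconds = cooldown_seconds
        self.open_until = 0.0                # breaker is "open" (exact only) until this time

    def _tripped(self) -> bool:
        if time.monotonic() < self.open_until:
            return True
        if len(self.recent) == self.recent.maxlen:
            if sum(self.recent) / len(self.recent) >= self.bad_fraction:
                self.open_until = time.monotonic() + self.cooldown_seconds
                self.recent.clear()          # start fresh after the cooldown
                return True
        return False

    def query(self, q):
        if self._tripped():
            result, _ = self.run_exact(q)    # slower, but no error bound to worry about
            return result, 0.0
        result, error_bound = self.run_approx(q)
        self.recent.append(error_bound > self.error_threshold)
        return result, error_bound
```

In use, you would wrap the dashboard's query path in AQPCircuitBreaker(run_approx, run_exact) and alert whenever the breaker trips, since a trip usually means the underlying data has drifted or become skewed.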
💡 Key Takeaways
Rare events with fewer than 10,000 occurrences in a billion-row table have ±30% confidence intervals in 1% samples, making them unreliable
Heavy-tailed distributions cause bias: if the top 10% of customers generate 50% of traffic, uniform sampling gives misleading per-customer averages
Sketch merging across 1,000 partitions can compound ±1% individual errors to ±3% or more in the final result
Staleness matters: sketches refreshed only daily become invalid during flash sales or sudden traffic shifts that change the distribution
Late-arriving events in streaming systems (even 0.1% at 1M events per second means 1,000 miscounts per second) violate error bounds
Users who compare approximate results against exact metrics from other systems without understanding confidence intervals report false bugs
📌 Examples
1. Failure case: debugging an API error that occurred 200 times in 10 billion requests. A 0.1% sample has zero instances, telling engineers the error does not exist when it actually needs investigation.
2. Skew example: the top 5 enterprise customers generate 40% of revenue. A uniform 1% sample sometimes includes 2 of them, sometimes 4, causing "average customer revenue" to swing wildly between queries.
3. Compounding error: distinct users per city (±1% error per city) rolled up to state level (±1.5%), then national (±2.5%), then merged across 3 data centers (±4% final error due to overlaps).
4. Circuit-breaker scenario: monitoring detects that 15% of queries in the last hour have error bounds exceeding ±5%. The system automatically falls back to exact computation for the next 10 minutes while engineers investigate data skew.
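Example 2 can be reproduced with a small simulation. The code below is illustrative only (the customer counts, revenue distribution, and 1% rate are invented): it compares repeated uniform samples against a stratified scheme that always keeps the enterprise accounts, and reports how much the estimated average revenue per customer swings in each case.

```python
import random
import statistics

random.seed(42)

# Invented heavy-tailed population: 5 enterprise accounts dominate total revenue.
enterprise = [2_000_000.0] * 5                                   # a handful of huge customers
long_tail = [random.expovariate(1 / 500.0) for _ in range(20_000)]
population = enterprise + long_tail
true_mean = statistics.mean(population)

SAMPLE_RATE = 0.01    # 1% Bernoulli sample
TRIALS = 200          # repeated draws to measure how much estimates swing

def uniform_estimate() -> float:
    """Plain uniform 1% sample; the enterprise accounts are usually missed entirely."""
    sample = [r for r in population if random.random() < SAMPLE_RATE]
    return statistics.mean(sample) if sample else 0.0

def stratified_estimate() -> float:
    """Keep every enterprise account exactly, sample only the long tail at 1%,
    then recombine the strata using their population weights."""
    tail_sample = [r for r in long_tail if random.random() < SAMPLE_RATE]
    tail_mean = statistics.mean(tail_sample) if tail_sample else 0.0
    return (sum(enterprise) + tail_mean * len(long_tail)) / len(population)

for name, estimator in [("uniform", uniform_estimate), ("stratified", stratified_estimate)]:
    estimates = [estimator() for _ in range(TRIALS)]
    spread = statistics.stdev(estimates) / true_mean
    print(f"{name:10s} mean estimate={statistics.mean(estimates):>10.2f}  "
          f"true mean={true_mean:>8.2f}  relative stdev={spread:.1%}")
```

The uniform estimate swings depending on whether an enterprise account happens to land in the sample, while the stratified estimate pins those accounts down exactly and only samples the predictable long tail, which is the remedy (and its limitation, when queries filter on unstratified dimensions) discussed under Skew and Distribution Bias above.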