Production Architecture and Scale
The Full Pipeline:
At companies operating at tens of billions of events per day, AQP is not a standalone feature. It is integrated into the entire analytics stack, from ingestion through query execution to user interface.
Ingestion and Maintenance:
Raw events land in a data lake or warehouse, often in Parquet or ORC columnar format. Nightly batch jobs create or refresh multiple sample tables at 0.1%, 1%, and 10% rates. These samples are partitioned and distributed identically to the base table to preserve join keys and locality. For example, if the main table is partitioned by date and sharded by user_id, the samples maintain the same partitioning and sharding scheme.
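A minimal sketch of what one such nightly refresh job could look like, assuming a PySpark batch job, a base table partitioned by a `ds` date column and keyed by `user_id`, and illustrative storage paths (all of these names are hypothetical):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("nightly-sample-refresh").getOrCreate()

# Base table in columnar format (path is illustrative)
events = spark.read.parquet("s3://warehouse/events")

# 1% uniform row sample; a fixed seed keeps the sample stable across reruns
sample_1pct = events.sample(fraction=0.01, seed=42)

# Write the sample with the same layout as the base table so partition
# pruning and join locality still apply to queries rewritten against it.
(sample_1pct
    .repartition("user_id")      # preserve user_id co-location for joins
    .write
    .mode("overwrite")
    .partitionBy("ds")           # same date partition column as the base table
    .parquet("s3://warehouse/events_sample_1pct"))
```

The same job would typically be run at each sample rate (0.1%, 1%, 10%), producing a family of tables the query router can choose between.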
In parallel, streaming ingestion jobs maintain sketch structures in a real-time Online Analytical Processing (OLAP) store. As each event arrives, the system updates approximate counters: HyperLogLog (HLL) sketches for distinct users per feature flag, quantile sketches for latency percentiles per Application Programming Interface (API), and Count-Min sketches for top queries. These sketches are stored as time series, typically per hour or per day, in columnar stores optimized for fast retrieval.
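As an illustration of this per-hour sketch maintenance, here is a minimal sketch of a streaming updater, assuming the Apache DataSketches Python bindings (`pip install datasketches`) and a hypothetical event shape with `ts`, `feature_flag`, and `user_id` fields; a production system would run this inside a stream processor and flush the sketches to the OLAP store:

```python
from collections import defaultdict
from datetime import datetime, timezone

from datasketches import hll_sketch  # assumed dependency: datasketches

LG_K = 12  # 2^12 registers: roughly 1-2% relative error, a few KB per sketch

# One HLL per (hour bucket, feature flag): approximate distinct users per flag per hour
sketches = defaultdict(lambda: hll_sketch(LG_K))

def handle_event(event: dict) -> None:
    """Update the distinct-user counter for this event's flag and hour bucket."""
    hour = datetime.fromtimestamp(event["ts"], tz=timezone.utc).strftime("%Y-%m-%d-%H")
    sketches[(hour, event["feature_flag"])].update(event["user_id"])

# A couple of synthetic events for demonstration
handle_event({"ts": 1700000000, "feature_flag": "new_checkout", "user_id": "u1"})
handle_event({"ts": 1700000000, "feature_flag": "new_checkout", "user_id": "u2"})

for (hour, flag), sk in sketches.items():
    print(hour, flag, round(sk.get_estimate()))
```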
Online Aggregation:
Some advanced systems implement progressive refinement. The engine starts scanning and produces a running estimate whose confidence interval shrinks over time. The user interface might show an initial answer after 200ms with ±20% error, refine it to ±5% after 1 second, and to ±2% after 3 seconds. Users can stop early once they see the metric clearly exceeds or falls below a threshold. This is particularly valuable during incidents, when engineers need quick directional answers.
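A minimal sketch of the progressive-refinement idea, using a simple running mean with a normal-approximation confidence interval over synthetic data (the function name, thresholds, and data are illustrative assumptions; real engines refine estimates over blocks of a sampled or shuffled table scan):

```python
import math
import random

def online_average(stream, z=1.96, rel_tol=0.02, min_rows=1000):
    """Scan rows, maintaining a running mean and a CLT-based confidence
    interval; stop early once the interval is within ±rel_tol of the mean."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in stream:
        n += 1
        total += x
        total_sq += x * x
        if n >= min_rows and n % 1000 == 0:
            mean = total / n
            var = max(total_sq / n - mean * mean, 0.0)
            half_width = z * math.sqrt(var / n)   # interval shrinks as n grows
            if mean != 0 and half_width / abs(mean) <= rel_tol:
                return mean, half_width, n        # early stop: estimate is tight enough
    mean = total / n
    var = max(total_sq / n - mean * mean, 0.0)
    return mean, z * math.sqrt(var / n), n

# Demo on synthetic "latency" values with a true mean of ~120 ms
random.seed(7)
stream = (random.expovariate(1 / 120.0) for _ in range(10_000_000))
mean, half_width, rows = online_average(stream)
print(f"~{mean:.1f} ms ±{half_width:.1f} ms after {rows:,} rows scanned")
```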
Operational Reality:
At Meta scale, with 50 to 100 billion events per day retained for 1 to 2 years, you have multi-petabyte warehouses. Thousands of analysts and data scientists run exploratory queries every hour, expecting p50 latency under 1 second and p95 under 3 seconds. Without AQP, you would need to massively overprovision clusters or accept a degraded user experience.
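For a sense of where "multi-petabyte" comes from, a quick back-of-envelope check (the bytes-per-event figure is an assumed illustrative value, not from the source):

```python
# Rough storage estimate for the retention figures quoted above
events_per_day = 100e9          # upper end: 100 billion events/day
retention_days = 2 * 365        # retained for up to 2 years
bytes_per_event = 100           # assumed average compressed size in columnar storage

total_bytes = events_per_day * retention_days * bytes_per_event
print(f"{total_bytes / 1e15:.1f} PB")   # roughly 7 PB at these assumptions
```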
Query Routing and Optimization:
When an analyst issues a query through a Business Intelligence (BI) tool, a query router classifies it. The rules might be: if the query is exploratory (not feeding a regulated report), is dominated by large scans with aggregate functions, and does not have highly selective filters, then rewrite it to use samples or sketches.
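A toy router that encodes these rules, with hypothetical feature names and thresholds (the scan-size cutoff is an added assumption, not from the source):

```python
from dataclasses import dataclass

@dataclass
class QueryProfile:
    """Illustrative features a router might extract from a parsed query."""
    is_regulated_report: bool     # feeds a compliance or finance report
    has_aggregates: bool          # COUNT/SUM/AVG/percentiles over large scans
    estimated_scan_rows: int      # from the optimizer's cardinality estimate
    filter_selectivity: float     # fraction of rows surviving filters (0..1)

def route(q: QueryProfile) -> str:
    """Toy version of the routing rules described above."""
    if q.is_regulated_report:
        return "exact"            # never approximate regulated output
    if not q.has_aggregates or q.estimated_scan_rows < 10_000_000:
        return "exact"            # small or non-aggregate queries are cheap anyway
    if q.filter_selectivity < 0.001:
        return "exact"            # highly selective filters hurt sampling accuracy
    return "approximate"          # rewrite to samples or sketches

print(route(QueryProfile(False, True, 5_000_000_000, 0.2)))   # -> approximate
```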
For example, "count distinct users by country over last 90 days" can be served from per day, per country HyperLogLog sketches. The engine reads 90 days × 200 countries = 18,000 tiny sketches (perhaps 36 MB total), merges them using HLL union operations, and returns results in 100 to 500ms at p95 latency, even at 10,000 queries per second (QPS) globally.
Query throughput at scale: 10K QPS globally, 500ms p95 latency, 36 MB data read per query.
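To make the merge step concrete, here is a minimal sketch of the union over stored HLL sketches, again assuming the Apache DataSketches Python bindings and a few fabricated stand-in sketches in place of the ~18,000 per-day, per-country sketches read from the store:

```python
from datasketches import hll_sketch, hll_union  # assumed dependency: datasketches

LG_K = 12  # every stored sketch uses the same configuration

# Stand-ins for sketches loaded from the OLAP store, one per (day, country).
stored = []
for day in range(3):
    sk = hll_sketch(LG_K)
    for uid in range(day * 50_000, day * 50_000 + 80_000):   # overlapping user IDs
        sk.update(uid)
    stored.append(sk)

# Merging is a cheap union over sketch registers, not a rescan of raw events.
union = hll_union(LG_K)
for sk in stored:
    union.update(sk)
merged = union.get_result()

print(f"approx distinct users: {merged.get_estimate():,.0f}")   # ~180,000 here
# Size check from the text: 18,000 sketches at ~2 KB each
print(f"data read per query: ~{18_000 * 2 // 1000} MB of sketches, not terabytes of raw events")
```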
✓ In Practice: Netflix uses Apache Druid with sketch-based aggregations to power dashboards showing concurrent viewers and unique viewer counts per title. Uber maintains quantile sketches in Apache Pinot for real-time API latency monitoring across thousands of services.
💡 Key Takeaways
✓ Production AQP integrates across ingestion, storage, query routing, and user interface, not just query execution
✓ Nightly batch jobs maintain 0.1%, 1%, and 10% samples while streaming jobs update sketch structures per hour or per day
✓ Query routers classify queries as approximate-safe based on rules: exploratory, aggregate-heavy, not highly selective, not regulatory
✓ Sketch queries read megabytes instead of terabytes: 90 days × 200 countries = 18,000 HLL sketches at 2 KB each = 36 MB for a distinct-count query
✓ Online aggregation provides progressive refinement: an initial answer at 200ms with ±20% error, refined to ±2% after 3 seconds
✓ At Meta scale of 50 to 100 billion events per day, AQP enables thousands of concurrent analysts with p95 latency under 3 seconds
📌 Examples
1. BigQuery approximate aggregation: APPROX_COUNT_DISTINCT uses HLL sketches internally, changing I/O from terabytes of raw data to megabytes of sketch metadata for high-cardinality columns (see the sketch after this list).
2. Netflix Druid setup: ingests 1 million events per second, maintains theta sketches per title per hour, and serves dashboard queries at 100 to 500ms p95 across a multi-datacenter deployment.
3. Uber incident response: on-call engineers query quantile sketches in Pinot showing API p99 latency over the last 5 minutes, getting results in under 1 second to diagnose ongoing issues.
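A hedged illustration of the BigQuery example above, assuming the `google-cloud-bigquery` Python client with default credentials and a hypothetical `my_project.analytics.events` table with `user_id`, `country`, and `event_date` columns:

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()

sql = """
SELECT
  country,
  APPROX_COUNT_DISTINCT(user_id) AS approx_users  -- HLL-backed approximate distinct count
FROM `my_project.analytics.events`                -- hypothetical table
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
GROUP BY country
ORDER BY approx_users DESC
"""

for row in client.query(sql).result():
    print(row.country, row.approx_users)
```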