The Complete Real-Time OLAP Pipeline
The Full Journey:
Understanding real-time OLAP means tracing an event from its creation to its appearance in a dashboard query result. At a large consumer company, this pipeline handles 5 to 20 million events per second globally; each event is a few hundred bytes, adding up to tens of terabytes daily.
The Freshness Service Level Objective (SLO):
A typical target is 95% of events visible in queries within 10 seconds, 99% within 1 minute. This means from the moment a user clicks a button to when that click appears in aggregated metrics, the latency is measured in seconds, not hours.
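An SLO like this can be checked directly from per-event lag measurements. A minimal sketch, assuming lag samples (creation-to-visibility, in seconds) are collected elsewhere:

```python
import math

def freshness_slo_met(lag_seconds, p95_target=10.0, p99_target=60.0):
    """Check a freshness SLO: 95% of events visible within 10 s, 99% within 60 s.

    lag_seconds: per-event lag from creation to query visibility, in seconds.
    """
    lags = sorted(lag_seconds)

    def percentile(p):
        # Nearest-rank percentile over the sorted lag samples.
        return lags[math.ceil(p * len(lags)) - 1]

    return percentile(0.95) <= p95_target and percentile(0.99) <= p99_target
```

A batch of lag samples is healthy only if both tail percentiles meet their targets; a fast median with a long straggler tail still fails.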
Why the Dual Pipeline:
The architecture maintains both real-time and batch paths because they optimize for different things. Real-time segments prioritize freshness and are built quickly with lighter compression. Historical segments prioritize query efficiency and storage cost, using heavy compression and sophisticated indexes. Queries transparently span both: recent data from real-time segments, older data from historical segments, merged seamlessly by the broker.
1. Event Production: User actions, service logs, and business events are generated by microservices and clients. A page view, a purchase, a search query all become structured events with timestamps and metadata.
2. Durable Log: Events are first written to an append-only log system, providing durability, backpressure handling, and a clear source of truth. Producers write once, multiple consumers read at their own pace. Retention is typically days to weeks.
3. Real-Time Ingestion: Ingestion tasks consume from the log, parse events, apply transformations, handle late or out-of-order events, and buffer data into in-memory real-time segments. These segments are periodically sealed and persisted to disk or object storage, often within minutes.
4. Batch Pipeline: In parallel, a separate pipeline writes the same events into a data lake. From there, larger, optimized offline segments are built hourly or daily. These segments are heavily compressed and pre-indexed for efficient historical querying at petabyte scale.
5. Query Serving: A query broker receives client requests, plans which segments and servers to hit, and routes sub-queries to many server nodes. Each node scans its subset of segments (both real-time and historical), performs local aggregations, and returns partial results that the broker merges.
6. Consumers: Dashboards for near real-time Key Performance Indicators (KPIs), alerting systems that trigger on aggregates (error rate exceeds 2% in the last 5 minutes), and product features like top trending items in the last 15 minutes.
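Step 2's write-once, many-readers contract can be sketched with independent per-consumer offsets. This is a toy stand-in for a real log system such as Kafka; all names are illustrative:

```python
from collections import defaultdict

class AppendOnlyLog:
    """Toy append-only log: producers append once; each consumer group
    reads at its own pace via an independent offset."""

    def __init__(self):
        self._entries = []                # the log itself, never mutated in place
        self._offsets = defaultdict(int)  # consumer group -> next offset to read

    def append(self, event):
        self._entries.append(event)
        return len(self._entries) - 1     # the event's offset in the log

    def poll(self, group, max_records=100):
        start = self._offsets[group]
        batch = self._entries[start:start + max_records]
        self._offsets[group] += len(batch)  # each group advances independently
        return batch
```

The real-time ingestion tasks and the batch pipeline would poll the same log under different group names, each seeing every event exactly once from its own position.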
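Step 3's buffer-then-seal behavior can be sketched as follows; the row and age thresholds are illustrative assumptions, not values from the source:

```python
import time

class RealtimeSegmentBuffer:
    """Buffers rows in memory and seals a segment when it grows too large
    or too old; `persist` stands in for a write to disk or object storage."""

    def __init__(self, persist, max_rows=100_000, max_age_s=300):
        self.persist = persist
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self._rows = []
        self._created = time.monotonic()

    def add(self, row):
        self._rows.append(row)
        too_big = len(self._rows) >= self.max_rows
        too_old = time.monotonic() - self._created >= self.max_age_s
        if too_big or too_old:
            self.persist(self._rows)      # seal the in-memory segment
            self._rows = []
            self._created = time.monotonic()
```

Sealing on either size or age is what keeps the freshness SLO honest during quiet periods: a half-full buffer still gets flushed once it ages out.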
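Step 4's heavy compression and pre-indexing can be illustrated with two standard building blocks, dictionary encoding and an inverted index. This is a simplified sketch, not any particular system's segment format:

```python
def build_offline_segment(rows, dim_key, metric_key):
    """Dictionary-encode one dimension column and build an inverted index
    over it, so historical queries can skip straight to matching rows."""
    dictionary = sorted({r[dim_key] for r in rows})     # distinct dim values
    code_of = {v: i for i, v in enumerate(dictionary)}  # value -> small int
    dim_col = [code_of[r[dim_key]] for r in rows]       # encoded dim column
    metric_col = [r[metric_key] for r in rows]
    inverted = {}                                       # code -> row positions
    for pos, code in enumerate(dim_col):
        inverted.setdefault(code, []).append(pos)
    return {"dict": dictionary, "dim": dim_col,
            "metric": metric_col, "index": inverted}

def segment_sum(segment, dim_value):
    """Sum the metric for one dimension value via the inverted index."""
    code = segment["dict"].index(dim_value)
    return sum(segment["metric"][pos] for pos in segment["index"][code])
```

Because the offline build happens once per hour or day, it can afford passes like these that would be too slow on the real-time path.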
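Step 5's scatter-gather pattern, with local aggregation on each node and a final merge at the broker, can be sketched as:

```python
from collections import Counter

def node_aggregate(rows, group_key, value_key):
    """Local aggregation on one server node over its segments."""
    partial = Counter()
    for row in rows:
        partial[row[group_key]] += row[value_key]
    return partial

def broker_query(nodes, group_key, value_key):
    """Broker: fan the sub-query out to nodes, then merge partial results."""
    merged = Counter()
    for node_rows in nodes:  # in production this is parallel RPC, not a loop
        merged.update(node_aggregate(node_rows, group_key, value_key))
    return dict(merged)
```

Because sums merge associatively, the broker never needs raw rows, only each node's partial aggregates; this is what lets the same query span real-time and historical segments transparently.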
Production Scale at Large Companies
P95 latency: 200-400 ms
P99 latency: 1-2 s
Query throughput: 100s-1000s QPS
✓ In Practice: This architecture sits between low-level metrics systems (like Prometheus) and slower data warehouse reporting (like nightly batch jobs), feeding both operational decisions and product experiences.
💡 Key Takeaways
✓Pipeline handles 5 to 20 million events per second at large companies, reaching tens of terabytes daily
✓Real-time segments, built within minutes, prioritize freshness with lighter compression; historical segments, built hourly or daily, optimize for query efficiency
✓Freshness SLO typically targets 95% of events visible within 10 seconds, 99% within 1 minute
✓Query broker transparently merges results from both real-time segments (last few hours) and historical segments (older data) to serve unified queries
✓Production SLOs at scale: p95 latency 200 to 400 milliseconds, p99 latency 1 to 2 seconds, throughput of hundreds to thousands of queries per second
📌 Examples
1. User clicks the "buy now" button at 2:00:05 PM. The event reaches the durable log at 2:00:05 PM, an ingestion task processes it by 2:00:12 PM, and it appears in dashboard query results by 2:00:15 PM, within the 10-second SLO
2. A dashboard query for "revenue by region in the last 24 hours" hits the broker, which routes sub-queries to 50 server nodes: 10 nodes scan real-time segments for the last 3 hours, 40 scan historical segments for the previous 21 hours. The broker merges all partial results and returns in 280 milliseconds
3. An alerting system runs a query every minute: "error rate in the last 5 minutes by service." The query scans only real-time segments (the last 5 minutes of data), aggregates across 20 nodes, and triggers an alert if any service exceeds a 2% error rate
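The alerting query in example 3 boils down to a windowed group-by. A minimal sketch, assuming each event carries a timestamp, a service name, and an error flag (field names are illustrative):

```python
def error_rate_alerts(events, now, window_s=300, threshold=0.02):
    """Return services whose error rate over the last `window_s` seconds
    exceeds `threshold` (2% over 5 minutes by default)."""
    totals, errors = {}, {}
    for e in events:
        if now - e["ts"] > window_s:
            continue                      # outside the 5-minute window
        svc = e["service"]
        totals[svc] = totals.get(svc, 0) + 1
        if e["is_error"]:
            errors[svc] = errors.get(svc, 0) + 1
    return sorted(s for s in totals
                  if errors.get(s, 0) / totals[s] > threshold)
```

In the real system this aggregation runs on the server nodes over real-time segments, with the broker merging per-service counts before the threshold check.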