Data Processing Patterns • OLTP vs OLAP
Production Architecture: How Companies Separate OLTP from OLAP
The standard production pattern is a layered architecture: OLTP systems own the source of truth and emit change streams, while an ingestion layer lands those changes into analytical storage optimized for scans. Amazon retail is a canonical example: carts, session state, and high-cardinality key-value operations run on a low-latency, horizontally scalable store targeting single-digit-millisecond reads and writes at p99. Orders and payments use relational OLTP with strict consistency and multi-Availability Zone (AZ) durability, aiming for p99 latencies under 50 to 100 ms per transaction. These systems never serve analytical queries directly.
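One common way OLTP systems "emit change streams" without losing events is the transactional-outbox pattern: the business write and its change event commit in the same transaction, and a relay tails the outbox (or the write-ahead log itself) into the stream. The sketch below is a minimal, hypothetical illustration using SQLite from the Python standard library; the table names and event schema are assumptions, not details from the systems described above.

```python
# Sketch of a transactional outbox: the order row and its change event
# commit atomically, so downstream CDC never misses or double-counts a write.
# Table names and payload schema are illustrative assumptions.
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, total_cents INTEGER)")
conn.execute("CREATE TABLE outbox (event_id TEXT PRIMARY KEY, payload TEXT)")

def place_order(order_id: str, total_cents: int) -> None:
    # Single transaction: both inserts commit together or not at all.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, total_cents) VALUES (?, ?)",
            (order_id, total_cents),
        )
        conn.execute(
            "INSERT INTO outbox (event_id, payload) VALUES (?, ?)",
            (str(uuid.uuid4()),
             json.dumps({"type": "order_placed", "order_id": order_id,
                         "total_cents": total_cents})),
        )

place_order("o-123", 4999)
# A CDC relay would tail the outbox (or the write-ahead log directly)
# and publish each event to the stream consumed by the ingestion layer.
print(conn.execute("SELECT payload FROM outbox").fetchall())
```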
Instead, analytics is decoupled via Change Data Capture (CDC) from write-ahead logs and event streams into a data lake and warehouse stack. Large fact tables for orders, clicks, and shipments reach tens of billions to trillions of rows. Interactive dashboards target 1-to-10-second latencies on aggregated slices, while deep-dive queries can run for minutes. The critical metric is data freshness: Uber's near-real-time analytics pipeline moves data from OLTP to OLAP in seconds to low tens of seconds for supply-demand heatmaps, while heavy model training operates on daily or hourly batch snapshots with acceptable staleness.
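To make the ingestion layer concrete, here is a hedged sketch of the landing step: drain a change stream in micro-batches and write them into date-partitioned files that an analytical engine can scan. The in-memory event source, JSONL files, and `/tmp/lake` layout are assumptions for illustration; production pipelines typically use Kafka or Kinesis feeding Parquet on object storage.

```python
# Land a micro-batch of CDC events into date-partitioned files so that
# analytical queries can prune scans by date. Paths and file format are
# illustrative, not a real lake implementation.
import json
import pathlib
import time

def land_batch(events: list[dict], lake_root: str = "/tmp/lake") -> None:
    # Group events by event date, which becomes the partition key.
    by_day: dict[str, list[dict]] = {}
    for ev in events:
        day = time.strftime("%Y-%m-%d", time.gmtime(ev["ts"]))
        by_day.setdefault(day, []).append(ev)
    for day, rows in by_day.items():
        part = pathlib.Path(lake_root, f"dt={day}")
        part.mkdir(parents=True, exist_ok=True)
        # One file per micro-batch; real systems would write columnar Parquet.
        fname = part / f"batch-{int(time.time() * 1000)}.jsonl"
        fname.write_text("\n".join(json.dumps(r) for r in rows))

# Example micro-batch, as if just read from the CDC stream.
land_batch([{"ts": time.time(), "type": "order_placed",
             "order_id": "o-123", "total_cents": 4999}])
```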
Google's architecture demonstrates global-scale separation. Ads serving and billing require external consistency across regions, using a globally distributed OLTP system where regional reads commit in single-digit milliseconds and cross-region writes add tens of milliseconds for consensus. This enables transactional invariants like budget enforcement under heavy concurrency. Analytical workloads run on a separate columnar engine with separation of storage and compute, routinely scanning terabyte-scale partitions and returning interactive aggregates in seconds, with batch jobs processing petabytes using elastic parallelism.
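The reason columnar, partitioned storage can return interactive aggregates over huge fact tables is that the engine reads only the partitions matched by the query's predicate instead of the whole table. The toy sketch below makes the idea visible over the file layout from the ingestion sketch above; it is a simplification under those same assumptions, not how a real columnar engine is built.

```python
# Toy partition pruning: aggregate only over partitions inside the date
# range, skipping everything else. Uses the dt=YYYY-MM-DD layout assumed
# in the ingestion sketch above.
import json
import pathlib

def sum_orders(lake_root: str, start_day: str, end_day: str) -> int:
    total = 0
    for part in sorted(pathlib.Path(lake_root).glob("dt=*")):
        day = part.name.split("=", 1)[1]
        if not (start_day <= day <= end_day):
            continue  # partition pruning: never read data outside the range
        for f in part.glob("*.jsonl"):
            for line in f.read_text().splitlines():
                total += json.loads(line).get("total_cents", 0)
    return total

print(sum_orders("/tmp/lake", "2025-01-01", "2025-12-31"))
```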
The tradeoff is freshness versus isolation. Directly querying OLTP for analytics gives you up-to-the-second data but risks catastrophic production impact. Exporting via CDC introduces lag, typically ranging from near real time (seconds) for streaming pipelines to batch (minutes to hours) for simpler systems, but completely isolates the workloads. Companies set freshness Service Level Indicators (SLIs) based on business needs: operational dashboards might require sub-minute freshness and alert when lag exceeds 5 minutes, while monthly financial reports tolerate daily batch updates.
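A freshness SLI of this kind usually reduces to one measurement: how far behind is the newest event landed in analytical storage? Below is a minimal sketch of such a check with the 5-minute alert threshold from the dashboard example above; the function name and alerting-by-print are illustrative assumptions, since real systems emit the lag as a metric to a monitoring stack.

```python
# Freshness SLI check: compare the newest landed event timestamp to now
# and alert when lag exceeds the agreed threshold (5 minutes here).
import time

FRESHNESS_SLO_SECONDS = 5 * 60  # alert when lag exceeds 5 minutes

def check_freshness(latest_landed_ts: float, now: float | None = None) -> bool:
    now = time.time() if now is None else now
    lag = now - latest_landed_ts
    if lag > FRESHNESS_SLO_SECONDS:
        print(f"ALERT: OLAP freshness lag {lag:.0f}s exceeds SLO")
        return False
    print(f"OK: freshness lag {lag:.0f}s within SLO")
    return True

check_freshness(time.time() - 42)   # ~42 s behind: within SLO
check_freshness(time.time() - 900)  # 15 min behind: fires the alert
```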
💡 Key Takeaways
• Standard pattern: OLTP owns the source of truth and emits change streams via CDC from write-ahead logs; an ingestion layer lands them into analytical storage with freshness ranging from seconds (streaming) to hours (batch)
• Amazon retail separates cart/order OLTP (single-digit ms) from analytics (scans over billions of rows); interactive dashboards target 1-to-10-second responses, deep dives run for minutes
• Freshness is the key tradeoff: direct OLTP queries give zero lag but risk production outages; CDC introduces seconds to hours of staleness but provides complete workload isolation
• Google uses globally distributed OLTP for ads (single-digit ms regional, tens of ms for cross-region consensus), while analytical queries run on a separate columnar engine scanning terabytes in seconds with elastic compute
• Operational analytics (Uber supply-demand heatmaps) requires sub-minute freshness and alerts when lag exceeds thresholds, while batch reporting (monthly financial closes) tolerates daily updates
📌 Examples
Uber marketplace: OLTP handles trip state and pricing at p99 under 50 ms; streaming CDC moves data to OLAP in 5 to 30 seconds for real-time heatmaps, while daily batch snapshots feed model training
Amazon data lake: CDC streams from OLTP databases land in an S3-backed data lake with tens of billions to trillions of rows in fact tables; interactive BI queries return in 1 to 10 seconds, ad hoc analysis runs for minutes
Meta: OLAP queries scan tens to hundreds of terabytes from a petabyte-scale data lake; thousands of queries per day with typical interactive latency of a few to tens of seconds on partition-pruned datasets