Big Data Systems • Real-time Analytics (OLAP Engines)
Data Modeling and Pre-Aggregation Strategies
Real-time OLAP demands aggressive denormalization and strategic pre-aggregation to hit sub-second latency SLAs. The foundational pattern is wide fact tables keyed by event time, plus an optional primary key for upserts, with all dimensions flattened inline. Instead of normalizing user attributes into a separate user dimension table, you store userId, userName, userCountry, and userTier directly in each event row. This eliminates runtime joins that would require network shuffles across nodes. LinkedIn's "Who viewed your profile" feature stores viewer industry, company, and seniority directly in each profile view event, enabling single-table scans.
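To make the flattened-row idea concrete, here is a minimal sketch of what such a wide fact row might look like; the field names (viewer_name, viewer_country, viewer_tier, and so on) are hypothetical placeholders, not LinkedIn's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical wide fact row: every dimension is flattened inline, so queries
# scan a single table instead of joining a separate user dimension at read time.
@dataclass
class ProfileViewEvent:
    event_time: datetime      # event-time key used for time-range pruning
    view_id: int              # optional primary key enabling upserts
    viewer_id: int
    viewer_name: str          # attributes that a normalized model would keep
    viewer_country: str       # in a user dimension table, duplicated here to
    viewer_tier: str          # avoid runtime joins and network shuffles
    viewed_profile_id: int
```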
Dimensions must be encoded as integers via dictionary encoding to enable compact storage and fast filtering. A dimension like "country" with 200 values gets mapped to integers 0 to 199, then indexed with bitmaps where each bit represents one row. Filtering "country equals US" becomes a fast bitmap lookup instead of per-row string comparisons. Uber's marketplace metrics encode city, vehicle type, and payment method as integers, compressing storage by 5x to 10x while enabling sub-200 ms queries on tens of millions of rows.
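A minimal sketch of dictionary encoding plus bitmap filtering in Python with NumPy; the column values and the dict_encode helper are illustrative, not any engine's actual API.

```python
import numpy as np

# Hypothetical flattened fact columns, one entry per event row.
country = np.array(["US", "IN", "US", "BR", "IN", "US"])
tier    = np.array(["free", "pro", "pro", "free", "pro", "pro"])

def dict_encode(column):
    """Map each distinct string to a small integer code and build per-value bitmaps."""
    values, codes = np.unique(column, return_inverse=True)     # dictionary + encoded column
    bitmaps = {v: codes == i for i, v in enumerate(values)}    # one bit array per code
    return codes, bitmaps

country_codes, country_bm = dict_encode(country)
tier_codes, tier_bm = dict_encode(tier)

# "country = 'US' AND tier = 'pro'" becomes a bitwise AND of two bitmaps:
# no string comparisons at scan time, only boolean vector operations.
match = country_bm["US"] & tier_bm["pro"]
print(np.flatnonzero(match))   # -> [2 5]
```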
Pre-aggregations are critical for queries spanning large time ranges or high-cardinality group-bys. Star-tree and cube-style structures precompute common aggregations at multiple granularities: 1-minute, 5-minute, and 1-hour buckets for time, crossed with key dimensions like region and product. A query for "hourly revenue by region in the last 30 days" hits the precomputed hourly cube (roughly 720 rows per region) instead of scanning billions of raw events. Airbnb precomputes experiment metrics at minute and hour granularities, keeping p95 latency under 1 to 2 seconds for hundreds of concurrent analysts. The trade-off is inflexibility: adding a new dimension to the cube requires reprocessing historical data.
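The pattern reduces to an offline rollup job plus a query that touches only the cube. A minimal pandas sketch, with made-up events and column names, might look like this:

```python
import pandas as pd

# Hypothetical raw events, one row per order.
events = pd.DataFrame({
    "event_time": pd.to_datetime([
        "2024-05-01 10:03", "2024-05-01 10:47",
        "2024-05-01 11:15", "2024-05-01 11:20",
    ]),
    "region": ["NA", "EU", "NA", "NA"],
    "revenue": [20.0, 35.0, 15.0, 40.0],
})

# Rollup job: precompute the hourly cube (time bucket x region).
# In a real pipeline this runs continuously as segments are ingested.
hourly_cube = (
    events.assign(hour=events["event_time"].dt.floor("h"))
          .groupby(["hour", "region"], as_index=False)["revenue"]
          .sum()
)

# Query time: "hourly revenue by region in the last 30 days" scans the tiny
# cube rather than the raw event table.
cutoff = hourly_cube["hour"].max() - pd.Timedelta(days=30)
print(hourly_cube[hourly_cube["hour"] >= cutoff])
```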
For high-cardinality aggregations like COUNT DISTINCT users or percentile calculations, exact computation is prohibitively expensive. Use HyperLogLog (HLL) sketches for approximate distinct counts (1 to 2% error, fixed memory) and quantile sketches for percentiles. These sketches are mergeable, so you can precompute them per segment and combine them at query time. The alternative, computing exact distinct counts by scanning billions of rows, would push p99 latency from sub-second to tens of seconds and exhaust memory.
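To make the mergeability point concrete, here is a minimal, illustrative HyperLogLog in pure Python. Production systems use library implementations such as Apache DataSketches; the class below is a simplified sketch of the algorithm, not their API.

```python
import hashlib
import math

class HLL:
    """Minimal HyperLogLog: fixed memory (2^p registers), mergeable, ~1% error at p=14."""

    def __init__(self, p=14):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash; the top p bits pick a register, the rest determine the rank.
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1   # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Register-wise max: per-segment sketches combine losslessly at query time.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)   # small-range correction
        return raw

# Per-segment sketches built at ingest time...
seg1, seg2 = HLL(), HLL()
for uid in range(0, 60_000):
    seg1.add(uid)
for uid in range(40_000, 100_000):
    seg2.add(uid)

# ...merged at query time; the true distinct count is 100,000.
seg1.merge(seg2)
print(round(seg1.estimate()))   # approximately 100,000, within a few percent
```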
💡 Key Takeaways
•Denormalize all dimensions into wide fact tables to eliminate runtime joins; LinkedIn stores viewer industry, company, and seniority directly in each profile view event for single-table scans
•Dictionary-encode dimensions as integers (country maps to 0 to 199) and index them with bitmaps, compressing storage by 5x to 10x and enabling sub-millisecond filter evaluation on millions of rows
•Precompute aggregations at multiple time granularities (1-minute, 5-minute, 1-hour) crossed with key dimensions; a query for "hourly revenue by region in the last 30 days" scans roughly 720 precomputed hourly rows per region instead of billions of raw events
•Use HyperLogLog (HLL) sketches for approximate COUNT DISTINCT (1 to 2% error, fixed memory) and quantile sketches for percentiles; exact computation would push latency from sub-second to tens of seconds
•Star-tree structures precompute combinations of dimensions (time, region, product) but require reprocessing to add new dimensions; this trades flexibility for predictable low latency on known query patterns
•Split semi-structured payloads (JSON): extract common dimensions into typed columns for filtering and group-bys, and store the raw JSON as a fallback for long-tail exploratory queries that can tolerate higher latency (see the sketch after this list)
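A minimal sketch of that split at ingest time; the field names and the split_event helper are hypothetical, chosen only to illustrate the promoted-columns-plus-raw-fallback layout.

```python
import json

# Hypothetical incoming event with a semi-structured payload.
raw = '{"userId": 42, "country": "US", "tier": "pro", "props": {"ab_test": "v2", "ref": "email"}}'

def split_event(raw_json: str) -> dict:
    """Promote hot dimensions to typed columns; keep the raw JSON for long-tail queries."""
    payload = json.loads(raw_json)
    return {
        # Extracted columns: dictionary-encoded and bitmap-indexed by the engine,
        # used for low-latency filters and group-bys.
        "user_id": payload["userId"],
        "country": payload["country"],
        "tier": payload["tier"],
        # Fallback column: parsed only by exploratory queries that can tolerate
        # higher latency.
        "raw_payload": raw_json,
    }

row = split_event(raw)
print(row["country"], row["tier"])   # -> US pro
```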
📌 Examples
Uber city metrics: Encodes cityId (1 to 500), vehicleType (1 to 10), and paymentMethod (1 to 5) as integers with bitmap indexes; precomputes 5-minute aggregations of trip counts, revenue, and HLL sketches of distinct drivers per city and vehicle type, serving sub-200 ms queries on tens of millions of trips
Airbnb experiment dashboard: Precomputes booking rate, HLL distinct bookers, and revenue quantiles per experiment variant at 1-minute and 1-hour granularities; queries for "booking rate by variant in the last 24 hours" hit 24 precomputed hourly rows per variant instead of scanning millions of raw booking events, keeping p95 under 1 second
LinkedIn profile view insights: Stores viewer dimensions (industry, seniority, company size) as dictionary-encoded integers in each view event; precomputes hourly cubes of view counts and HLL distinct viewers per dimension combination, enabling "Who viewed your profile" queries to return in sub-100 ms across 100+ use cases