Storage and Schema Design Differences

The Fundamental Divergence: OLTP and OLAP optimize storage and schemas for completely opposite workload patterns, which is why a single system struggles to handle both efficiently.

OLTP Schema Design:

Tables are highly normalized around entities: users, orders, payments. Each table has clear primary keys and foreign keys maintaining referential integrity. An order table stores order_id, user_id, total_amount, and created_at. Product details live in a separate products table to avoid duplication.

Storage is row oriented: all columns for a single row are stored together on disk. This is optimal for "get order 12345" queries that need all fields for one specific row. Indexes support common access patterns like "find all orders for user 98765" but are added selectively because each index adds 10 to 30% storage overhead and every INSERT must update all indexes.

OLAP Schema Design:

Schemas are denormalized into star or snowflake patterns with fact tables and dimension tables. A fact table like orders_fact contains billions of rows with foreign keys to dimension tables like users_dim, products_dim, dates_dim. Denormalization duplicates data intentionally to speed up joins.

Storage is column oriented: all values for a single column across all rows are stored together. When you query "sum of revenue by country", the system reads only the revenue and country columns, ignoring the other 50 columns in the table. Combined with compression, this can reduce petabytes of raw logs to tens of terabytes in the warehouse.

Row Oriented (OLTP)
Store entire row together
Fast point lookups
Normalized schema
vs
Column Oriented (OLAP)
Store column values together
Fast aggregations
Denormalized schema
Partitioning Strategy:

OLAP tables partition by time: day or hour. This lets queries prune irrelevant partitions. A query for "last 7 days" scans only 7 partitions instead of the entire multi year table. Tables can also partition by geography or customer segment for further pruning.

"Indexes that help OLTP point queries actively hurt OLAP full table scans and complex joins. The storage layout must match the workload."

💡 Key Takeaways

✓OLTP uses row oriented storage and normalized schemas optimized for point lookups returning all fields for one specific row

✓OLAP uses column oriented storage and denormalized star schemas optimized for scanning specific columns across billions of rows

✓Columnar storage with compression reduces petabytes to tens of terabytes by reading only needed columns and compressing repetitive values

✓OLAP partitions tables by time (day or hour) to prune irrelevant data, scanning only 7 partitions for "last 7 days" instead of years

📌 Interview Tips

1OLTP: Normalized order table with foreign key to users table, B tree index on user_id, retrieves single order in 5ms

2OLAP: Denormalized orders_fact table partitioned by day, query sums revenue column for 50 billion rows across 90 day partition in 12 seconds

← Back to OLAP vs OLTP Trade-offs Overview