Data Modeling & Schema Design • Dimensional Modeling (Star & Snowflake)Easy⏱️ ~2 min
What is Dimensional Modeling?
The Core Idea:
Dimensional modeling separates measurements from context. Think of it as organizing data the way business people naturally ask questions. Instead of complex normalized tables spread across dozens of joins, you have facts (the measurements) surrounded by dimensions (the context).
A fact table contains one row per business event at a chosen grain. For a streaming service, that might be one row per video play. Each row stores numeric measures like watch time in seconds, buffering events, and completion percentage, plus foreign keys pointing to dimension tables.
Dimension tables capture the who, what, where, when, and how. A Time dimension holds date attributes. A Content dimension describes the video title, genre, and rating. A User dimension tracks subscriber tier and country. When an analyst asks "What is average watch time by country and genre last week?", the query scans the fact table filtered by date, joins to two dimensions, groups, and aggregates.
Why This Matters:
This structure is optimized for analytical workloads, not transactional ones. In operational databases, third normal form reduces redundancy and handles frequent updates efficiently. But analytical queries need to scan millions or billions of rows and aggregate across multiple dimensions. Dimensional models trade some storage redundancy for predictable query patterns and fast performance.
✓ In Practice: At companies like Netflix or Airbnb, dimensional models power thousands of dashboards. A well designed fact table with proper partitioning returns results in 2 to 15 seconds even when scanning hundreds of millions of rows with dozens of concurrent users.
The key design decisions include choosing the grain of each fact table, identifying which dimensions are shared across business domains as conformed dimensions, and deciding how to handle attributes that change over time, like a customer moving from free tier to premium or a city changing regions.💡 Key Takeaways
•Facts contain numeric measures and foreign keys at a defined grain, such as one row per order or one row per ride. Each fact table represents a business process.
•Dimensions provide descriptive context like customer attributes, product details, or date hierarchies. They answer who, what, where, when, and how for each fact.
•Dimensional models optimize for read heavy analytical workloads, not transactional updates. Queries scan millions of rows but follow predictable join patterns.
•Surrogate keys decouple the warehouse from operational system key changes. The fact table stores dimension_key instead of natural identifiers like customer_id.
•Conformed dimensions are shared across multiple fact tables, ensuring consistent definitions. A Time dimension used by Sales, Inventory, and Support facts enables reliable cross domain analysis.
•Typical query performance on modern cloud warehouses is 2 to 15 seconds for queries scanning hundreds of millions of fact rows joined to 3 to 10 dimensions with proper partitioning.
📌 Examples
A ride sharing company models Trips as a fact table at grain "one row per completed trip". Measures include distance_miles, fare_amount, and duration_seconds. Foreign keys point to Driver, Rider, City, and Time dimensions.
An e-commerce warehouse has Orders fact (order_id, order_date_key, customer_key, total_amount) joined to Customer dimension (customer_key, name, segment, region) and Date dimension (date_key, full_date, month, quarter, fiscal_year).
Netflix query: "Average watch time by content genre and user country for last 7 days" scans VideoPlays fact filtered by date_key, joins to Content and User dimensions, groups by genre and country, and computes AVG(watch_time_sec).