Storage and Cost Economics of Denormalization at Scale
Normalize When Writes Dominate
Normalization shines for write-heavy workloads. Consider an e-commerce inventory system processing 10,000 stock updates/second across warehouses. Denormalizing product availability into every storefront would require updating 50 regional caches per change, turning 10K writes/second into 500K. Keeping a single normalized inventory table and joining on read is simpler and faster. Rule of thumb: when the write amplification factor exceeds 10x, normalization usually wins.
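The arithmetic behind that rule of thumb can be sketched in a few lines. This is an illustrative back-of-envelope calculation, not a real API; the function names and the 10x threshold are taken from the text above.

```python
# Hypothetical sketch: effective write load when every logical change
# fans out to N denormalized copies. Numbers mirror the inventory example.

def effective_write_qps(base_writes_per_sec: int, copies_per_change: int) -> int:
    """Each logical change must be applied to every denormalized copy."""
    return base_writes_per_sec * copies_per_change

def prefer_normalized(copies_per_change: int, threshold: int = 10) -> bool:
    """Rule of thumb from the text: >10x write amplification favors normalization."""
    return copies_per_change > threshold

# 10,000 updates/sec fanned out to 50 regional caches:
print(effective_write_qps(10_000, 50))  # 500000
print(prefer_normalized(50))            # True: 50x amplification exceeds 10x
```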
Denormalize When Reads Dominate
Denormalization shines for read-heavy endpoints with strict latency SLOs. A product listing page serving 100,000 QPS under a 50 ms p99 SLO cannot afford joining the products, categories, reviews, and inventory tables at request time; each join adds 5-10 ms. Instead, precompute a product_display row containing every field needed for rendering. Product updates (~100/sec) trigger asynchronous updates to the denormalized store. At a read:write ratio of 1000:1, the write overhead is negligible.
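A minimal sketch of that precomputed read model, assuming simple in-memory stores; the source tables and the rebuild_display helper are illustrative names, not a real schema. The point is that the join happens once per write (~100/sec), so the 100K QPS read path is a single key lookup.

```python
# Normalized source-of-truth tables (illustrative):
products = {1: {"name": "Widget", "category_id": 10}}
categories = {10: {"label": "Tools"}}
inventory = {1: {"in_stock": 42}}
review_stats = {1: {"avg_rating": 4.5, "count": 120}}

product_display = {}  # denormalized store served to the listing page

def rebuild_display(product_id: int) -> None:
    """Join at write time, so reads never join. Triggered asynchronously
    by product/inventory/review updates."""
    p = products[product_id]
    product_display[product_id] = {
        "name": p["name"],
        "category": categories[p["category_id"]]["label"],
        "in_stock": inventory[product_id]["in_stock"],
        "avg_rating": review_stats[product_id]["avg_rating"],
    }

rebuild_display(1)
print(product_display[1]["category"])  # Tools
```

In production the denormalized store would typically be a key-value cache or a wide-row table, and rebuild_display would run from a change-data-capture or message-queue consumer rather than inline.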
Normalize for Data Integrity
Financial systems, booking platforms, and any domain where correctness matters more than latency should normalize. A hotel reservation system must ensure a room is booked exactly once. Denormalizing availability to multiple caches risks double-booking during the sync window. A normalized reservations table with a unique constraint on (room_id, date) guarantees correctness. Accept 50-100 ms latency for strong consistency. The same applies to bank transfers, seat assignments, and inventory reservations.
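The unique-constraint approach can be demonstrated end to end with SQLite; table and column names here are illustrative. The database rejects the second insert atomically, so no sync window exists in which two guests can hold the same room.

```python
import sqlite3

# In-memory database standing in for the normalized reservations store.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE reservations (
        room_id INTEGER NOT NULL,
        date    TEXT    NOT NULL,
        guest   TEXT    NOT NULL,
        UNIQUE (room_id, date)  -- each room bookable exactly once per date
    )
""")

def book(room_id: int, date: str, guest: str) -> bool:
    """Attempt a booking; the constraint, not application logic, enforces
    exactly-once semantics."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO reservations VALUES (?, ?, ?)",
                         (room_id, date, guest))
        return True
    except sqlite3.IntegrityError:
        return False  # room already booked for that date

print(book(101, "2025-07-04", "alice"))  # True
print(book(101, "2025-07-04", "bob"))    # False: double-booking rejected
print(book(101, "2025-07-05", "bob"))    # True: different date is fine
```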
Denormalize for Complex Aggregations
When queries require aggregating across millions of rows, denormalize into precomputed results. A dashboard showing total revenue by product category across 50 million orders would take 30+ seconds with joins. Instead, maintain a category_revenue table updated incrementally on each order; the dashboard then loads in under 100 ms. Accept eventual consistency (data may lag by seconds to minutes) in exchange for a dramatic query speedup. Analytics and reporting workloads almost always benefit from denormalization.
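Incremental maintenance of such a rollup can be sketched as follows; category_revenue and record_order are illustrative names, and amounts are kept in integer cents to avoid floating-point rounding. Each order applies an O(1) delta at write time, replacing an O(n) scan over 50 million rows at read time.

```python
from collections import defaultdict

# Denormalized, eventually consistent rollup: category -> revenue in cents.
category_revenue = defaultdict(int)

def record_order(category: str, amount_cents: int) -> None:
    """Apply one order's delta to the precomputed aggregate.
    In production this would run from an order-event stream, so the
    rollup lags the source of truth by seconds to minutes."""
    category_revenue[category] += amount_cents

record_order("electronics", 29_999)  # $299.99
record_order("electronics", 4_999)   # $49.99
record_order("books", 1_500)         # $15.00

print(category_revenue["electronics"])  # 34998
print(category_revenue["books"])        # 1500
```

Refunds and cancellations fit the same model as negative deltas, which is why incremental rollups like this usually also store an order count for auditing drift against the source table.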