
Database Selection Framework: Core Decision Dimensions

Selecting a database starts with mapping three fundamental dimensions: data model fit, access patterns, and consistency requirements.

Data model fit comes first. Relational models excel at normalized schemas with complex joins, key-value stores optimize single-record lookups, document stores handle semi-structured data, wide-column databases serve sparse attribute sets, graph databases accelerate multi-hop traversals, and time-series engines compress temporal data.

Access patterns define performance characteristics. Read-heavy systems benefit from replicas and aggressive caching, while write-heavy systems need append-optimized storage such as Log-Structured Merge (LSM) trees. Point lookups require different indexing than range scans or aggregations. An application doing 80% reads at 200K queries per second (QPS) and 20% writes at 50K QPS has fundamentally different needs than one running analytical scans over time ranges.

Consistency sits on a spectrum from strong (linearizable reads, external consistency across regions) to eventual (temporary staleness, conflict resolution). Strong consistency simplifies application logic but adds latency: Google Spanner adds 5 to 10 milliseconds within a region for commit wait, and 50 to 100+ milliseconds cross-region. Eventually consistent systems like Amazon Dynamo serve requests in single-digit milliseconds but require handling conflicts with vector clocks or last-writer-wins strategies.

The PACELC theorem captures the tradeoff: during network Partitions choose Availability or Consistency; Else, under normal operation, trade Latency against Consistency. A shopping cart tolerates temporary divergence in exchange for low-latency availability; a financial ledger demands strong consistency despite the higher latency. This framework prevents mismatches such as choosing a strongly consistent global database for a high-throughput logging workload that only needs read-your-writes consistency.
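As a rough illustration of how these dimensions can be made explicit before committing to an engine, the sketch below encodes the three questions as a small lookup. The categories, thresholds, and recommended families are illustrative assumptions for this sketch, not a definitive mapping.

```python
from dataclasses import dataclass

# Illustrative workload description; field names and category strings are
# assumptions made for this sketch, not any product's vocabulary.
@dataclass
class Workload:
    data_model: str        # "relational", "key_value", "document", "wide_column", "graph", "time_series"
    reads_per_sec: int
    writes_per_sec: int
    consistency: str       # "strong" or "eventual"

def suggest_family(w: Workload) -> str:
    """Map the three decision dimensions to a candidate database family."""
    # Dimension 1: data model fit dominates when the shape of the data is clear.
    by_model = {
        "graph": "graph database (multi-hop traversals)",
        "time_series": "time-series engine (temporal compression)",
        "wide_column": "wide-column store (sparse attributes, high write rates)",
        "document": "document store (semi-structured data)",
    }
    if w.data_model in by_model:
        return by_model[w.data_model]

    # Dimensions 2 and 3: for relational or key-value shapes, access pattern
    # and consistency requirements decide between engine families.
    write_heavy = w.writes_per_sec >= 50_000          # threshold is an assumption
    if w.consistency == "strong":
        return "relational / NewSQL (linearizable transactions, higher latency)"
    if write_heavy:
        return "LSM-backed key-value store (append-optimized writes, eventual consistency)"
    return "key-value store with read replicas and caching (read-heavy, eventual consistency)"

# Example: 80% reads at 200K QPS, 20% writes at 50K QPS, tolerates staleness.
print(suggest_family(Workload("key_value", 200_000, 50_000, "eventual")))
```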
💡 Key Takeaways
Data model alignment is foundational: relational for normalized ACID transactions, wide-column for sparse high-write workloads, graph for relationship traversals, time-series for temporal compression
Access patterns drive engine choice: read-heavy workloads (80%+ reads) benefit from read replicas and caching, while write-heavy workloads (50K+ writes per second) favor LSM-tree storage that batches random writes into sequential appends
Strong consistency costs latency: Google Spanner adds 5 to 10 milliseconds intra-region and 50 to 100+ milliseconds cross-region for linearizability, while eventual systems like Dynamo serve reads in single-digit milliseconds (a per-request illustration follows this list)
PACELC captures operational tradeoffs: during partitions sacrifice availability for consistency (financial systems) or consistency for availability (shopping carts); under normal operation trade latency for consistency guarantees
Mismatch consequences are severe: using a strongly consistent global database for high-throughput logging can inflate costs 10x and add unnecessary latency, while using eventual consistency for inventory can cause overselling
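To make the latency-versus-consistency knob concrete, here is a minimal sketch of per-request read consistency in DynamoDB (the managed service descended from Dynamo) using boto3. The table and key names are hypothetical; the point is that the same store can serve either guarantee per call, at different latency and capacity cost.

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("shopping_carts")   # hypothetical table name

key = {"cart_id": "cart-123"}              # hypothetical key

# Eventually consistent read (the default): lowest latency,
# but may briefly return stale data after a recent write.
fast_read = table.get_item(Key=key)

# Strongly consistent read: reflects all prior successful writes,
# at higher latency and roughly double the read-capacity cost.
fresh_read = table.get_item(Key=key, ConsistentRead=True)
```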
📌 Examples
Amazon's shopping cart uses Dynamo with eventual consistency: it accepts 50 to 100 millisecond write latency during peak traffic, tolerates temporary item-count divergence, and resolves conflicts with vector clocks (sketched below) to remain available during network partitions
Google's ad platform uses Spanner for financial transactions: it requires strong consistency across regions for billing accuracy, accepts 50 to 100 millisecond cross-region commit latency, and prevents double charging with externally consistent transactions
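The Dynamo example relies on vector clocks to detect conflicting writes. As a rough sketch of the underlying comparison (not Dynamo's actual data format), each replica's update count is tracked per node, and two versions where neither dominates the other are flagged as concurrent and handed to the application to merge:

```python
# Vector clock: mapping from node id to that node's update counter.
# Representation and node ids here are illustrative assumptions.
VectorClock = dict[str, int]

def compare(a: VectorClock, b: VectorClock) -> str:
    """Return 'before', 'after', 'equal', or 'concurrent' for clocks a vs b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"       # a happened before b; b supersedes a
    if b_le_a:
        return "after"        # b happened before a; a supersedes b
    return "concurrent"       # true conflict: application must merge (e.g. union of cart items)

# Two replicas accepted writes independently during a partition:
print(compare({"node_a": 2, "node_b": 1}, {"node_a": 1, "node_b": 2}))  # "concurrent"
```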