
Database Selection Framework: Core Decision Dimensions

Selecting a database starts with mapping three fundamental dimensions: data model fit, access patterns, and consistency requirements.

Data model fit comes first. Relational models excel at normalized schemas with complex joins, key-value stores optimize single-record lookups, document stores handle semi-structured data, wide-column databases serve sparse attribute sets, graph databases accelerate multi-hop traversals, and time-series engines compress temporal data.

Access patterns define performance characteristics. Read-heavy systems benefit from replicas and aggressive caching, while write-heavy systems need append-optimized storage such as Log-Structured Merge (LSM) trees. Point lookups require different indexing than range scans or aggregations. An application doing 80% reads at 200K queries per second (QPS) and 20% writes at 50K QPS has fundamentally different needs than one running analytical scans over time ranges.

Consistency sits on a spectrum from strong (linearizable reads, external consistency across regions) to eventual (temporary staleness, conflict resolution). Strong consistency simplifies application logic but adds latency: Google Spanner adds 5 to 10 milliseconds within a region for commit wait, and 50 to 100+ milliseconds cross-region. Eventually consistent systems like Amazon Dynamo serve requests in single-digit milliseconds but require handling conflicts with vector clocks or last-writer-wins strategies.

The PACELC theorem captures the tradeoff: during network Partitions choose Availability or Consistency; Else, under normal operation, trade Latency against Consistency. A shopping cart tolerates temporary divergence in exchange for low-latency availability; a financial ledger demands strong consistency despite the higher latency. This framework prevents mismatches such as choosing a strongly consistent global database for a high-throughput logging workload that only needs read-your-writes consistency.
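As a rough illustration of how these dimensions can be made explicit before committing to an engine, the sketch below encodes the three questions as a small lookup. The categories, thresholds, and recommended families are illustrative assumptions for this sketch, not a definitive mapping.

```python
from dataclasses import dataclass

# Illustrative workload description; field names and category strings are
# assumptions made for this sketch, not any product's vocabulary.
@dataclass
class Workload:
    data_model: str        # "relational", "key_value", "document", "wide_column", "graph", "time_series"
    reads_per_sec: int
    writes_per_sec: int
    consistency: str       # "strong" or "eventual"

def suggest_family(w: Workload) -> str:
    """Map the three decision dimensions to a candidate database family."""
    # Dimension 1: data model fit dominates when the shape of the data is clear.
    by_model = {
        "graph": "graph database (multi-hop traversals)",
        "time_series": "time-series engine (temporal compression)",
        "wide_column": "wide-column store (sparse attributes, high write rates)",
        "document": "document store (semi-structured data)",
    }
    if w.data_model in by_model:
        return by_model[w.data_model]

    # Dimensions 2 and 3: for relational or key-value shapes, access pattern
    # and consistency requirements decide between engine families.
    write_heavy = w.writes_per_sec >= 50_000          # threshold is an assumption
    if w.consistency == "strong":
        return "relational / NewSQL (linearizable transactions, higher latency)"
    if write_heavy:
        return "LSM-backed key-value store (append-optimized writes, eventual consistency)"
    return "key-value store with read replicas and caching (read-heavy, eventual consistency)"

# Example: 80% reads at 200K QPS, 20% writes at 50K QPS, tolerates staleness.
print(suggest_family(Workload("key_value", 200_000, 50_000, "eventual")))
```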
💡 Key Takeaways
Data model alignment is foundational: relational for normalized ACID transactions, wide-column for sparse high-write workloads, graph for relationship traversals, time-series for temporal compression
Access patterns drive engine choice: read-heavy workloads (80%+ reads) benefit from read replicas and caching, while write-heavy workloads (50K+ writes per second) favor LSM-tree storage that batches random writes into sequential appends
Strong consistency costs latency: Google Spanner adds 5 to 10 milliseconds intra-region and 50 to 100+ milliseconds cross-region for linearizability, while eventual systems like Dynamo serve reads in single-digit milliseconds (a per-request illustration follows this list)
PACELC captures operational tradeoffs: during partitions sacrifice availability for consistency (financial systems) or consistency for availability (shopping carts); under normal operation trade latency for consistency guarantees
Mismatch consequences are severe: using a strongly consistent global database for high-throughput logging can inflate costs 10x and add unnecessary latency, while using eventual consistency for inventory can cause overselling
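To make the latency-versus-consistency knob concrete, here is a minimal sketch of per-request read consistency in DynamoDB (the managed service descended from Dynamo) using boto3. The table and key names are hypothetical; the point is that the same store can serve either guarantee per call, at different latency and capacity cost.

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("shopping_carts")   # hypothetical table name

key = {"cart_id": "cart-123"}              # hypothetical key

# Eventually consistent read (the default): lowest latency,
# but may briefly return stale data after a recent write.
fast_read = table.get_item(Key=key)

# Strongly consistent read: reflects all prior successful writes,
# at higher latency and roughly double the read-capacity cost.
fresh_read = table.get_item(Key=key, ConsistentRead=True)
```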
📌 Examples
Amazon's shopping cart uses Dynamo with eventual consistency: it accepts 50 to 100 millisecond write latency during peak traffic, tolerates temporary item-count divergence, and resolves conflicts with vector clocks (sketched below) to remain available during network partitions
Google's ad platform uses Spanner for financial transactions: it requires strong consistency across regions for billing accuracy, accepts 50 to 100 millisecond cross-region commit latency, and prevents double charging with externally consistent transactions
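The Dynamo example relies on vector clocks to detect conflicting writes. As a rough sketch of the underlying comparison (not Dynamo's actual data format), each replica's update count is tracked per node, and two versions where neither dominates the other are flagged as concurrent and handed to the application to merge:

```python
# Vector clock: mapping from node id to that node's update counter.
# Representation and node ids here are illustrative assumptions.
VectorClock = dict[str, int]

def compare(a: VectorClock, b: VectorClock) -> str:
    """Return 'before', 'after', 'equal', or 'concurrent' for clocks a vs b."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"       # a happened before b; b supersedes a
    if b_le_a:
        return "after"        # b happened before a; a supersedes b
    return "concurrent"       # true conflict: application must merge (e.g. union of cart items)

# Two replicas accepted writes independently during a partition:
print(compare({"node_a": 2, "node_b": 1}, {"node_a": 1, "node_b": 2}))  # "concurrent"
```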