Database DesignTransaction Isolation LevelsMedium⏱️ ~3 min

Choosing Isolation Levels: Correctness Requirements vs Performance Trade-offs

When Serializable is Required

Use Serializable isolation for operations where correctness bugs have severe consequences. Monetary transfers, inventory reservations, seat bookings, unique resource allocation, and regulatory compliance operations are non-negotiable. The cost of a double-spend, double-booking, or audit failure far exceeds any performance penalty. Design these critical paths narrow: single-row updates, minimal read sets, short transaction scope. On abort, retry with exponential backoff (wait 100 ms, then 200 ms, then 400 ms, etc.) to avoid retry storms.

When Weaker Isolation Suffices

Read-heavy analytics, dashboards, ranking feeds, and approximate aggregates can tolerate slightly stale or non-atomic reads. Read Committed or Snapshot Isolation delivers 2-5x higher throughput under contention. For read-only queries, stale reads (accepting data up to N seconds old) can serve from local replicas with sub-millisecond latency, bypassing coordination entirely. Pattern: run critical writes at Serializable, read paths at Read Committed or on read replicas, analytics on dedicated clusters.

Contention-Based Selection

Consider your contention profile. If write conflicts are rare (distinct users, sharded keys), SSI provides safety with minimal aborts (< 1%). If contention is high (hot counters, leaderboard updates), 2PL serializes access predictably but can cause queue buildup and timeouts. Alternative: redesign to avoid conflicts entirely. Use atomic increment (SET x = x + 1) instead of read-modify-write. Use append-only event logs where each write appends a new row rather than updating existing data, then materialize current state asynchronously.

Per-Transaction Isolation

Most databases allow setting isolation level per transaction. Use this for heterogeneous workloads. Example: SET TRANSACTION ISOLATION LEVEL SERIALIZABLE before critical writes, READ COMMITTED for general queries. Document which transactions use which level. Test under realistic concurrency before production.

Decision Framework

Start with defaults. Most databases default to Read Committed or Repeatable Read, which handles typical transactional workloads. Measure before tuning. Profile lock waits, abort rates, and version store pressure under production load. Upgrade isolation for bugs. When anomalies occur (lost updates, double allocations), increase isolation for that specific transaction class. Downgrade for performance. If contention causes latency spikes, narrow transaction scope or relax isolation on non-critical paths.

💡 Key Takeaways
Use Serializable for critical invariants (money transfers, unique allocation, compliance) where correctness bugs are unacceptable despite 2 to 5 times lower throughput
Read Committed or snapshot isolation on read replicas for analytics, dashboards, and approximate aggregates tolerates stale reads and delivers 2 to 5 times higher throughput
Google Spanner: strongly consistent reads pay TrueTime commit wait (under 7 milliseconds), stale reads bypass coordination and serve from local replicas with sub-millisecond latency
Target abort/retry rates under 1 to 2 percent for SSI; rates over 5 to 10 percent indicate hotspot contention requiring sharding or commutativity redesign
Redesign for commutativity to avoid conflicts: use database increment operations instead of read-modify-write, or append only event logs with async materialization
Monitor lock wait times (2PL), version store/undo growth (MVCC), and tail latencies; tune isolation per transaction based on measured production behavior and failure modes
📌 Interview Tips
1Banking transfer: Use Serializable SSI, accept 3 to 5 millisecond median latency and occasional aborts, implement idempotent retry logic to handle 1 to 2 percent serialization failures
2E-commerce product feed: Use Read Committed on read replica, tolerate 1 to 2 second replication lag for 10 times higher read throughput, strong consistency only for checkout/payment
3Social media leaderboard: Hot counter at 10,000 updates per second causes 15 percent SSI aborts, redesign as append only event stream with periodic materialization to eliminate write conflicts
4Oracle reporting: 20 minute fraud detection query runs on dedicated replica with 60 minute undo retention at Read Committed, production OLTP uses 15 minute retention at Serializable for writes
← Back to Transaction Isolation Levels Overview