MVCC and Snapshot Isolation: High Concurrency with Trade-offs
Multi-Version Concurrency Control (MVCC) is the dominant concurrency strategy in modern databases because it decouples readers from writers. Instead of overwriting a row in place and making readers wait on its lock, a writer creates a new version tagged with its commit timestamp, while each reader picks a snapshot timestamp and sees the latest version committed at or before that time. Writers still conflict with other writers on the same row. Oracle was an early commercial adopter of this model; PostgreSQL, SQL Server (with RCSI/SI enabled), and most distributed databases now use variants.
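The version-selection rule can be sketched in a few lines. This is a toy in-memory model, not any engine's storage format; the names `Version`, `VersionedRow`, `commit_ts`, and `snapshot_ts` are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    value: object
    commit_ts: int  # timestamp at which the writing transaction committed


class VersionedRow:
    """A single row kept as an append-only list of committed versions."""

    def __init__(self):
        self.versions = []  # Version objects, appended in commit order

    def write(self, value, commit_ts):
        # A writer never overwrites in place; it appends a new version.
        self.versions.append(Version(value, commit_ts))

    def read(self, snapshot_ts) -> Optional[object]:
        # A reader sees the newest version committed at or before its snapshot.
        for version in reversed(self.versions):
            if version.commit_ts <= snapshot_ts:
                return version.value
        return None  # the row did not exist as of this snapshot


row = VersionedRow()
row.write("balance=100", commit_ts=10)
row.write("balance=80", commit_ts=20)

old_reader = row.read(snapshot_ts=15)  # snapshot taken before the second commit
new_reader = row.read(snapshot_ts=25)  # snapshot taken after it
too_early = row.read(snapshot_ts=5)    # snapshot taken before the row existed
```

The reader with the older snapshot keeps seeing the first version even after the second commit lands, which is why it never needs to wait for the writer.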
The performance benefit is dramatic for read-heavy workloads. Readers never block writers and writers never block readers, eliminating a major source of contention. In Microsoft SQL Server, enabling Read Committed Snapshot Isolation (RCSI) on systems handling 10,000 to 100,000 transactions per second typically eliminates read/write blocking and reduces reader/writer deadlocks to near zero; writers can still block and deadlock with other writers. Long-running analytical queries can read consistent snapshots without interfering with Online Transaction Processing (OLTP) writes. This is why Oracle databases handle mixed workloads well: reporting queries run for 10 to 30 minutes against stable snapshots while transactional writes proceed unimpeded.
The trade-off is write skew and version storage overhead. Snapshot isolation prevents dirty reads, non-repeatable reads, and (in most implementations) phantoms, but it does not prevent write skew: two concurrent transactions each read a consistent snapshot and each pass the business rule check, but because they write different rows no write conflict is detected, and their combined writes violate an invariant. The classic example is a hospital on-call system with two doctors on call. Each doctor's transaction checks that at least one other doctor is on call (each sees the other), then takes its own doctor off call, leaving zero on call. Preventing this requires application-level checks such as explicit row locks, database constraints, or upgrading to Serializable isolation.
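The on-call anomaly can be reproduced with a toy simulation. The sketch below assumes a simplified snapshot-isolation model in which a commit is rejected only when write sets overlap; the doctor names and the `go_off_call` helper are hypothetical.

```python
# Committed state: which doctors are currently on call.
committed = {"alice": True, "bob": True}


def go_off_call(doctor, snapshot):
    """Check the business rule against a snapshot and return the write set."""
    others_on_call = sum(
        1 for name, on_call in snapshot.items() if on_call and name != doctor
    )
    if others_on_call < 1:
        raise RuntimeError(f"{doctor} cannot go off call")
    return {doctor: False}


# Both transactions start concurrently and take the same snapshot.
snapshot_t1 = dict(committed)
snapshot_t2 = dict(committed)

# Each check passes, because each snapshot still shows the other doctor on call.
writes_t1 = go_off_call("alice", snapshot_t1)
writes_t2 = go_off_call("bob", snapshot_t2)

# Simplified snapshot isolation: only overlapping write sets count as a conflict.
write_conflict = bool(writes_t1.keys() & writes_t2.keys())

if not write_conflict:
    committed.update(writes_t1)
    committed.update(writes_t2)

on_call_count = sum(committed.values())  # the invariant required at least 1
```

Because the two transactions touch different rows, the write-set check finds nothing to reject and both commit. A serializable scheduler would instead have to notice that each transaction read a row the other one wrote.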
Version storage is the operational pain point. PostgreSQL stores old row versions in the main table, causing bloat; long-lived transactions prevent vacuum from reclaiming space, and multi-hour transactions can inflate storage by gigabytes and degrade write throughput. SQL Server uses a separate version store in tempdb; under heavy write load the version store can grow rapidly, causing memory and input/output pressure, and cleanup lag leads to latency spikes. Oracle uses undo segments with configurable retention; insufficient retention causes "snapshot too old" errors that make long queries fail, and the usual remedies are increasing undo retention or moving long queries to dedicated read replicas.
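Why one long-lived transaction holds back cleanup can be sketched with a simplified horizon rule: a version may be reclaimed only if the oldest active snapshot can no longer need it. This is an illustrative model for a single row, not the actual vacuum, tempdb cleanup, or undo logic of any engine; `reclaimable` and its parameters are hypothetical names.

```python
def reclaimable(version_commit_ts, active_snapshots):
    """Return commit timestamps of versions that no active snapshot still needs."""
    versions = sorted(version_commit_ts)
    if not versions:
        return []
    # With no active readers, only the newest version has to be kept.
    horizon = min(active_snapshots) if active_snapshots else versions[-1]
    # The newest version at or before the horizon is what the oldest reader sees.
    visible = [ts for ts in versions if ts <= horizon]
    if not visible:
        return []  # the oldest snapshot predates every version; keep them all
    keep_from = visible[-1]
    return [ts for ts in versions if ts < keep_from]


row_versions = [10, 20, 30, 40, 50]

no_readers = reclaimable(row_versions, active_snapshots=[])
recent_reader = reclaimable(row_versions, active_snapshots=[45])
long_running = reclaimable(row_versions, active_snapshots=[12, 45])
```

In this model a single snapshot taken at timestamp 12 pins every later version of the row, so nothing can be reclaimed until that transaction ends.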
💡 Key Takeaways
•MVCC creates a new row version, tagged with a commit timestamp, on every update; readers pick a snapshot time and read the latest version committed at or before that timestamp
•Readers never block writers and writers never block readers, eliminating the primary source of lock contention in read-heavy systems
•SQL Server with RCSI enabled at 10,000 to 100,000 transactions per second eliminates read/write blocking but shifts pressure to the version store in tempdb, requiring monitoring of cleanup lag
•Write skew is the critical vulnerability: concurrent transactions on consistent snapshots can each pass their checks while their combined effect violates an invariant (the classic hospital on-call problem)
•Long-running transactions prevent version cleanup, causing storage bloat; PostgreSQL multi-hour transactions can increase table size by gigabytes because vacuum cannot reclaim the old versions until they finish
•Oracle undo retention must be tuned for the longest query duration; banks running 10 to 30 minute reports often use dedicated replicas to avoid "snapshot too old" failures on production OLTP
📌 Examples
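PostgreSQL SSI: Two transactions concurrently increment a counter, both read value 100, both write 101. Under Read Committed the second write overwrites the first (lost update); under Repeatable Read (snapshot isolation) PostgreSQL aborts the second writer with a serialization failure. Serializable (SSI) additionally tracks read-write dependencies, so it also aborts one transaction in write skew cases where the two transactions write different rows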
SQL Server version store: Heavy update workload at 50,000 transactions per second grows version store by 10 to 20 gigabytes per hour, cleanup lag causes 200 to 500 millisecond latency spikes
Oracle long query: Analytical report runs for 25 minutes, undo retention set to 15 minutes, query fails with snapshot too old, must increase retention or move to read replica
Google Spanner: Uses MVCC with TrueTime for strict serializability; read-only transactions see snapshots, read-write transactions take locks (which is what prevents write skew), and commit wait (typically under 7 milliseconds) ensures that commit timestamps reflect a global ordering
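The counter example can be sketched as a toy first-committer-wins check, the kind of same-row write conflict detection that snapshot-isolation engines commonly apply. The `Store` and `Conflict` names and the per-row version counter are assumptions made for illustration; real engines track this with transaction IDs or timestamps.

```python
class Conflict(Exception):
    """Raised when the row changed after the transaction took its snapshot."""


class Store:
    def __init__(self, value):
        self.value = value
        self.version = 0  # bumped on every successful commit

    def begin(self):
        # The snapshot records the value and the version it was read at.
        return {"read_value": self.value, "read_version": self.version}

    def commit(self, txn, new_value):
        # First committer wins: reject if the row changed since the snapshot.
        if self.version != txn["read_version"]:
            raise Conflict("row changed since snapshot")
        self.value = new_value
        self.version += 1


counter = Store(100)
t1 = counter.begin()
t2 = counter.begin()  # both transactions read 100

counter.commit(t1, t1["read_value"] + 1)  # first writer commits 101

try:
    counter.commit(t2, t2["read_value"] + 1)  # would also write 101
    second_aborted = False
except Conflict:
    second_aborted = True
    # The application retries on a fresh snapshot instead of losing an update.
    t2 = counter.begin()
    counter.commit(t2, t2["read_value"] + 1)

final_value = counter.value
```

The retry is the application's responsibility: the engine only reports the conflict, and the increment is preserved because the second attempt reads 101 rather than the stale 100.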