MVCC and Snapshot Isolation: High Concurrency with Trade-offs

How MVCC Works
Multi-Version Concurrency Control (MVCC) is the dominant strategy in modern databases. Instead of locking rows, writers create new versions tagged with commit timestamps. Readers select a snapshot timestamp and see the latest version committed before that time. This decouples readers from writers: readers never block writers and writers never block readers, eliminating a major source of contention. Long-running analytical queries can read consistent snapshots without interfering with transactional writes happening simultaneously.
Performance Benefits
For read-heavy workloads, MVCC is dramatically faster than lock-based isolation. Enabling snapshot isolation on systems handling 10,000-100,000 transactions/sec typically eliminates read/write blocking and reduces deadlock rates to near zero. Reporting queries running 10-30 minutes execute against stable snapshots while transactional writes proceed unimpeded. The trade-off: version storage overhead and write skew risk.
Snapshot Isolation vs Serializable
Snapshot Isolation (SI) provides strong consistency for most workloads: each transaction sees a consistent snapshot at start time. It prevents dirty reads, non-repeatable reads, and phantoms. However, SI does not prevent write skew because each transaction reads a valid snapshot and makes locally-valid writes that together may violate invariants. Serializable Snapshot Isolation (SSI) extends SI by tracking which rows each transaction reads and writes. If the database detects that two concurrent transactions have conflicting dependencies (one wrote data the other read), it aborts one to maintain serializability.
Version Storage Overhead
Old versions must be retained until no active transaction needs them. A background process called version cleanup (or vacuum) periodically removes obsolete versions. Different databases handle storage differently: some store old versions in the main table causing bloat, others use separate version stores. Long-running transactions prevent cleanup from reclaiming versions, inflating storage by gigabytes and degrading write throughput. Monitor active transaction duration and set timeouts.
Read Committed vs Snapshot Isolation
Read Committed uses a new snapshot for each statement. Snapshot Isolation uses a single snapshot for the entire transaction. Read Committed is cheaper (no version retention across statements) but allows non-repeatable reads. Choose Read Committed for simple transactional workloads where each statement is independent. Choose Snapshot Isolation when transaction logic depends on consistent reads across multiple statements, such as calculating a total then updating based on it.

💡 Key Takeaways

✓MVCC creates new row versions on every update with commit timestamps, readers pick a snapshot time and read the latest version at or before that timestamp

✓Readers never block writers and writers never block readers, eliminating the primary source of lock contention in read heavy systems

✓SQL Server with RCSI enabled at 10,000 to 100,000 transactions per second eliminates blocking but shifts pressure to version store in tempdb, requiring monitoring of cleanup lag

✓Write skew is the critical vulnerability: concurrent transactions on consistent snapshots can each pass constraints but combined effect violates invariants (classic hospital on call problem)

✓Long running transactions prevent version cleanup causing storage bloat; PostgreSQL multi-hour transactions can increase table size by gigabytes until vacuum runs

✓Oracle undo retention must be tuned for longest query duration; banks running 10 to 30 minute reports often use dedicated replicas to avoid snapshot too old failures on production OLTP

📌 Interview Tips

1PostgreSQL SSI: Two transactions concurrently increment a counter, both read value 100, both write 101, lost update occurs unless SSI detects read-write dependency and aborts one

2SQL Server version store: Heavy update workload at 50,000 transactions per second grows version store by 10 to 20 gigabytes per hour, cleanup lag causes 200 to 500 millisecond latency spikes

3Oracle long query: Analytical report runs for 25 minutes, undo retention set to 15 minutes, query fails with snapshot too old, must increase retention or move to read replica

4Google Spanner: Uses MVCC with TrueTime for strict serializability; readers see snapshots but commit wait (typically under 7 milliseconds) ensures global ordering and prevents write skew

← Back to Transaction Isolation Levels Overview