Global Time and Timestamp Ordering in Distributed SQL
The hardest problem in distributed SQL is ensuring that transactions appear in a globally consistent order when thousands of nodes are processing operations simultaneously across different datacenters. Without synchronized time, two transactions on opposite sides of the world might both claim to have committed first, breaking serializability.
Google Spanner solves this with TrueTime, a physical time API that provides tight bounds on clock uncertainty. TrueTime returns an interval rather than a single timestamp, typically with single-digit-millisecond uncertainty. Every Spanner datacenter has atomic clocks and GPS receivers to keep this uncertainty low. When a transaction commits, Spanner performs a commit wait: it delays returning success to the client until it is certain the commit timestamp is in the past according to every possible observer. This wait is roughly equal to the TrueTime uncertainty, often 1 to 7 milliseconds, and is what gives Spanner external consistency.
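The commit-wait rule can be sketched in a few lines. This is a toy model, not Spanner's API: `tt_now` stands in for TrueTime, and the 5 ms uncertainty bound is an assumed constant.

```python
import time

UNCERTAINTY = 0.005  # assumed clock-uncertainty bound, in seconds

def tt_now():
    """Toy TrueTime: return an (earliest, latest) interval that is
    guaranteed to bracket the true time."""
    t = time.monotonic()
    return (t - UNCERTAINTY, t + UNCERTAINTY)

def commit(txn_result):
    # Pick the commit timestamp at the top of the uncertainty interval.
    commit_ts = tt_now()[1]
    # Commit wait: block until even the earliest possible current time
    # has passed commit_ts, so no observer can see the commit "in the
    # future" of its own clock.
    while tt_now()[0] <= commit_ts:
        time.sleep(0.001)
    return txn_result, commit_ts
```

The loop waits roughly twice the uncertainty, which is why shrinking TrueTime's interval with atomic clocks and GPS directly shortens commit latency.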
CockroachDB takes a different approach using Hybrid Logical Clocks (HLC). These clocks combine physical timestamps from server clocks with logical counters. When a node receives a message with a timestamp ahead of its local clock, it advances its HLC to that timestamp plus a logical increment. This creates a partial order that respects causality without requiring the tight clock synchronization that Spanner needs. However, CockroachDB still requires clock drift to stay within configured bounds, typically 500 milliseconds. If a node's clock drifts too far, it shuts itself down to protect consistency.
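A minimal hybrid logical clock can be sketched as follows. This is the textbook HLC update rule, not CockroachDB's actual implementation; the millisecond wall clock is an assumption, and the `wall_fn` hook exists only to make the example deterministic.

```python
import time

class HLC:
    """Minimal hybrid logical clock: a (physical, logical) pair."""

    def __init__(self, wall_fn=lambda: int(time.time() * 1000)):
        self.wall_fn = wall_fn  # physical clock source, in ms
        self.physical = 0       # largest physical component seen
        self.logical = 0        # tie-breaking counter

    def now(self):
        """Timestamp a local event."""
        wall = self.wall_fn()
        if wall > self.physical:
            self.physical, self.logical = wall, 0
        else:
            self.logical += 1
        return (self.physical, self.logical)

    def update(self, remote):
        """Merge a timestamp from an incoming message, advancing past
        it so the result causally follows both clocks."""
        r_phys, r_log = remote
        wall = self.wall_fn()
        if wall > self.physical and wall > r_phys:
            self.physical, self.logical = wall, 0
        elif r_phys > self.physical:
            self.physical, self.logical = r_phys, r_log + 1
        elif r_phys == self.physical:
            self.logical = max(self.logical, r_log) + 1
        else:
            self.logical += 1
        return (self.physical, self.logical)
```

This reproduces the article's scenario: a node whose clock reads 1000 that receives a message stamped 1050 jumps to (1050, 1) rather than waiting for its own clock to catch up.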
Both systems use these timestamps with MVCC. Every write gets a timestamp, and the storage layer keeps multiple versions of each key. A read transaction chooses a snapshot timestamp and sees all writes committed before that time, so readers never block writers and vice versa. The transaction layer combines timestamp ordering with two-phase commit for multi-range writes: it picks a provisional timestamp, writes to all involved ranges, then commits at a final timestamp that ensures serializability.
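The MVCC read path can be illustrated with a toy versioned store (an illustrative sketch only; real systems keep versions in an LSM tree, not a Python dict):

```python
class MVCCStore:
    """Toy MVCC store: each key maps to a timestamp-sorted list of
    (ts, value) versions."""

    def __init__(self):
        self.versions = {}

    def write(self, key, value, ts):
        # Writes never overwrite in place; they append a new version.
        self.versions.setdefault(key, []).append((ts, value))
        self.versions[key].sort(key=lambda v: v[0])

    def read(self, key, snapshot_ts):
        # A snapshot read sees the newest version at or before its
        # timestamp; it neither blocks writers nor is blocked by them.
        visible = [v for ts, v in self.versions.get(key, [])
                   if ts <= snapshot_ts]
        return visible[-1] if visible else None

store = MVCCStore()
store.write("balance", 100, ts=10)
store.write("balance", 250, ts=20)            # the deposit
print(store.read("balance", snapshot_ts=15))  # -> 100 (pre-deposit snapshot)
print(store.read("balance", snapshot_ts=25))  # -> 250
```

A transaction that picked its snapshot before the deposit committed keeps seeing the old balance, while any later snapshot sees the new one, which is exactly the timestamp-ordering guarantee described above.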
The practical impact is profound. With consistent timestamps, a banking application can read an account balance in Tokyo immediately after a deposit in London, with strong guarantees that it will never see stale data or observe time moving backwards. The cost is the commit wait in Spanner or the clock synchronization requirements in CockroachDB, both of which add latency and operational complexity that single node databases avoid entirely.
💡 Key Takeaways
•Spanner TrueTime provides timestamp intervals with 1 to 7ms uncertainty using atomic clocks and GPS, enabling external consistency through commit wait
•CockroachDB HLC combines physical time with logical counters to create causal ordering without requiring atomic clocks, but enforces 500ms maximum clock drift
•MVCC stores multiple timestamped versions per key, allowing reads at any snapshot timestamp without blocking concurrent writes
•Two-phase commit uses provisional timestamps during the write phase, then commits at a final timestamp that ensures serializable order across all ranges
•Clock synchronization is an operational requirement: loss of GPS in Spanner increases uncertainty and slows commits, clock drift in CockroachDB triggers node shutdowns
📌 Examples
Spanner commit wait: Transaction picks timestamp 100ms, TrueTime uncertainty is 5ms, commit waits until local time reaches 105ms before returning success
CockroachDB HLC: Node A at time 1000 receives message from Node B at time 1050, advances its HLC to 1050.1 (physical 1050, logical 1)
Banking scenario: Deposit in London at timestamp T1, read in Tokyo at timestamp T2 > T1 always sees the deposit due to timestamp ordering