
Optimistic Concurrency: How Multiple Writers Stay Safe

The Concurrency Challenge: In a production lakehouse, dozens of batch jobs and streaming pipelines write to the same Delta table simultaneously. How do you prevent conflicting updates without locking the table and destroying throughput? Delta Lake uses optimistic concurrency control (OCC): writers assume conflicts are rare, perform their work independently, and validate only at commit time. This contrasts with pessimistic locking (like row locks in traditional databases), which would serialize all writes and kill performance.

The Write Flow: Here is how a job commits a change:

1. Read the current table snapshot at version N and record which files you read (the read set).
2. Perform your computation and stage new Parquet files.
3. Construct a proposed commit with Add actions for the new files and Remove actions for any files you logically delete.
4. Attempt to write the next log entry (version N+1) to the transaction log. Before finalizing the commit, validate that none of the files in your read set have been removed or modified by concurrent writers, by checking whether any Remove actions touching those files appeared in log versions between N and the current latest version. If validation passes, your commit succeeds; if it fails, you have a conflict.
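To make the protocol concrete, here is a minimal sketch of that commit flow against an in-memory log. Every name in it (LogStore, Action, try_commit) is a hypothetical illustration of the mechanics above, not the actual Delta Lake API:

```python
# Minimal OCC commit-flow sketch with an in-memory log. All names are
# hypothetical illustrations, not the real Delta Lake implementation.
from dataclasses import dataclass
from threading import Lock


class ConflictError(Exception):
    """Raised when a concurrent writer removed a file in our read set."""


@dataclass(frozen=True)
class Action:
    kind: str  # "add" or "remove"
    path: str  # Parquet file path relative to the table root


class LogStore:
    """In-memory stand-in for the table's transaction log directory."""

    def __init__(self):
        self._versions: dict[int, list[Action]] = {}
        self._lock = Lock()  # plays the role of atomic rename / conditional put

    def latest_version(self) -> int:
        return max(self._versions, default=-1)

    def read(self, version: int) -> list[Action]:
        return self._versions[version]

    def put_if_absent(self, version: int, actions: list[Action]) -> bool:
        with self._lock:
            if version in self._versions:
                return False  # another writer already claimed this version
            self._versions[version] = actions
            return True


def try_commit(log: LogStore, read_version: int, read_set: set[str],
               proposed: list[Action]) -> int:
    """Validate the read set, then race to claim the next log version."""
    while True:
        latest = log.latest_version()
        # Validation: did any commit after our snapshot remove a file we read?
        for v in range(read_version + 1, latest + 1):
            removed = {a.path for a in log.read(v) if a.kind == "remove"}
            if removed & read_set:
                raise ConflictError(f"file in read set removed at version {v}")
        # Atomically claim version latest+1; only one writer can succeed.
        if log.put_if_absent(latest + 1, proposed):
            return latest + 1
        # Lost the race: loop and re-validate against the new commits.
```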
❗ Remember: Only one writer can successfully append version N+1. Atomic storage primitives, such as rename-without-overwrite or conditional puts, enforce this. A second writer attempting to write version N+1 fails, detects the conflict, and must retry.
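As an illustration of that mutual-exclusion primitive, the sketch below uses O_CREAT | O_EXCL on a local filesystem, which is atomic in the same way a rename-without-overwrite or a conditional put is. The file naming mirrors Delta's zero-padded commit files, but this is not the real log-store code:

```python
# "Only one writer wins" via atomic file creation. O_CREAT | O_EXCL
# guarantees that exactly one concurrent caller creates the file; the
# others get FileExistsError, detect the conflict, and must retry.
import os


def put_if_absent(log_dir: str, version: int, payload: bytes) -> bool:
    """Atomically create the commit file for `version`; False if taken."""
    path = os.path.join(log_dir, f"{version:020d}.json")
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # a concurrent writer already claimed this version
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    return True
```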
Conflict Resolution: When a conflict occurs, the job refreshes its snapshot to the latest version and retries. If Job A and Job B both start at version 100 and try to update different partitions, only one will win the race to write version 101. The loser refreshes to version 101, re-validates its changes against the new state, and writes version 102. In practice, partition-level isolation helps reduce conflicts: if jobs update disjoint partitions, their commits typically succeed without conflict. In high-contention scenarios (many writers updating the same partition), however, you see increased retry cycles.

Real Numbers from Production: At a large streaming deployment ingesting 200,000 events per second across 10 concurrent writers, conflict rates stay under 2% because each writer targets a different time partition. Average end-to-end latency per micro-batch is 6 to 8 seconds including the commit. When conflicts occur, a retry adds 1 to 3 seconds. For batch workloads with 50 concurrent jobs writing to overlapping partitions, conflict rates can rise to 10 to 15%. Each conflict triggers a full recomputation of the affected partition, which can take minutes. This is why production systems carefully shard workloads and use incremental processing to minimize contention.
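The loser's refresh-and-retry cycle might look like the sketch below, reusing the hypothetical LogStore, try_commit, and ConflictError from the earlier sketch. The plan callback stands in for recomputing the job against the refreshed snapshot; none of these names are the real Delta Lake API:

```python
# Refresh-and-retry cycle for a writer that lost a commit conflict.
# plan(version) -> (read_set, proposed_actions) recomputes the job
# against the snapshot at that version.
import random
import time


def commit_with_retry(log, plan, max_attempts: int = 5) -> int:
    for attempt in range(max_attempts):
        snapshot = log.latest_version()      # refresh to the latest version
        read_set, proposed = plan(snapshot)  # recompute against the new state
        try:
            return try_commit(log, snapshot, read_set, proposed)
        except ConflictError:
            # A concurrent writer invalidated our read set: back off with
            # jitter, then recompute. In production this adds roughly 1-3 s.
            time.sleep(random.uniform(0.5, 1.5) * (attempt + 1))
    raise RuntimeError("gave up after repeated write conflicts")
```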
💡 Key Takeaways
Optimistic concurrency control (OCC) assumes conflicts are rare: writers perform work independently and validate only at commit time, avoiding locks
Commit validation checks if any files in the read set were removed by concurrent writers between the start version and commit time, failing if conflicts exist
Partition-level isolation reduces conflicts: jobs updating disjoint partitions (different date ranges or customer IDs) commit without interference (see the sketch after this list)
High-contention workloads (10 to 15% conflict rates) suffer retry cycles that add 1 to 3 seconds per attempt, making workload sharding critical for performance
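To make the partition-isolation takeaway concrete, here is a hedged sketch: assuming the log records each removed file's partition values, conflict validation can compare partitions directly, so writers touching disjoint partitions never invalidate each other. The helper name is illustrative:

```python
# Partition-scoped conflict check: a concurrent Remove only conflicts
# if it touched a partition this job actually read.
def read_set_conflicts(read_partitions: set[str],
                       removed_partitions: set[str]) -> bool:
    """True if a concurrent remove touched a partition this job read."""
    return not read_partitions.isdisjoint(removed_partitions)


# Two streaming writers on different hourly partitions: no conflict.
assert not read_set_conflicts({"date=2024-06-01/hour=10"},
                              {"date=2024-06-01/hour=11"})
# Two batch jobs rewriting the same partition: conflict, retry needed.
assert read_set_conflicts({"date=2024-06-01/hour=10"},
                          {"date=2024-06-01/hour=10"})
```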
📌 Examples
1. Streaming workload with 10 writers, each targeting a different hourly partition: conflict rate under 2%, average commit latency 6 to 8 seconds including validation and log write.
2. Batch workload with 50 jobs updating overlapping partitions: conflict rate 10 to 15%, with each conflict triggering a full partition recomputation that takes minutes.