Hardware Support for Atomicity
Key Insight
Modern CPUs provide special instructions that guarantee atomicity. These are the building blocks for all synchronization primitives.
The LOCK Prefix (x86)
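On x86, the LOCK prefix makes the read-modify-write instruction that follows it atomic. Early processors did this by locking the memory bus, so no other CPU could access memory until the instruction completed. Modern processors usually lock only the affected cache line through the coherency protocol, and fall back to a bus lock in cases such as an operand that spans two cache lines. LOCK XADD atomically adds to memory and returns the previous value. LOCK CMPXCHG is compare-and-swap.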
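As a rough illustration, the two instructions named above map onto C11 `<stdatomic.h>` operations. This is a minimal sketch: the wrapper names `counter_fetch_add` and `compare_and_swap` are hypothetical, and the instruction each call compiles to depends on the compiler and target, although x86-64 compilers commonly emit LOCK XADD and LOCK CMPXCHG here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically add `delta` to *target and return the value it held before.
   Hypothetical wrapper; on x86-64 this is commonly lowered to LOCK XADD. */
static int counter_fetch_add(atomic_int *target, int delta) {
    return atomic_fetch_add(target, delta);
}

/* Compare-and-swap: store `desired` only if *target still equals `expected`.
   Returns true if the store happened. Hypothetical wrapper; on x86-64 this
   is commonly lowered to LOCK CMPXCHG. */
static bool compare_and_swap(atomic_int *target, int expected, int desired) {
    /* On failure the library writes the current value into `expected`;
       this wrapper discards it and reports only success or failure. */
    return atomic_compare_exchange_strong(target, &expected, desired);
}
```

With a counter holding 5, `counter_fetch_add(&v, 3)` returns 5 and leaves 8 behind. A following `compare_and_swap(&v, 8, 10)` succeeds, while a second attempt with the stale expected value 8 fails and leaves the counter unchanged.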
Common Atomic Instructions
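LL/SC on ARM and Other RISC CPUs
ARM has traditionally used Load-Link/Store-Conditional (LL/SC), exposed as the LDREX/STREX instructions (LDXR/STXR on AArch64). Load-Link reads a value and marks the address. Store-Conditional writes only if no other core has written to that address since the Load-Link, and it may also fail for unrelated reasons such as an interrupt. If it fails, you retry. This avoids locking the entire bus.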
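The retry pattern can be sketched in portable C11 with `atomic_compare_exchange_weak`, which is allowed to fail spuriously so that compilers can implement it with an LL/SC pair on architectures that have one. The function name `update_max` is a hypothetical example, chosen because an atomic maximum cannot be expressed as a single fetch-and-add.

```c
#include <stdatomic.h>

/* Raise *target to `candidate` if `candidate` is larger.
   Returns the value observed just before the update (or before giving up
   because the stored value was already at least `candidate`). */
static int update_max(atomic_int *target, int candidate) {
    int observed = atomic_load(target);
    while (observed < candidate &&
           !atomic_compare_exchange_weak(target, &observed, candidate)) {
        /* The exchange failed: either another thread changed *target, or
           the failure was spurious (for example a lost LL/SC reservation).
           `observed` now holds the current value, so the loop condition
           re-checks whether an update is still needed. */
    }
    return observed;
}
```

Starting from 3, `update_max(&v, 7)` stores 7 and returns 3. A later `update_max(&v, 5)` leaves 7 in place and returns 7.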
Cache Coherency
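Each core in a modern CPU has its own caches. The MESI protocol (Modified, Exclusive, Shared, Invalid) keeps them coherent: when one core writes to a cache line, the other cores invalidate their copies of it. This happens automatically in hardware, but it has a performance cost, because the next access from another core misses and must fetch the line again.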
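The invalidate-on-write rule can be made concrete with a toy model of one cache line. This is a simplified textbook sketch, not how any particular CPU implements coherency: the core count `NUM_CORES`, the type `line_model`, and the functions `model_read` and `model_write` are all hypothetical, and data transfer and write-back are left out.

```c
/* The four MESI states; INVALID is first so a zeroed model starts empty. */
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_state;

#define NUM_CORES 4  /* hypothetical core count for this model */

/* One cache line as seen by each core: state[i] is core i's copy. */
typedef struct { mesi_state state[NUM_CORES]; } line_model;

/* Core `reader` loads the line. */
static void model_read(line_model *line, int reader) {
    if (line->state[reader] != INVALID) return;  /* hit: nothing changes */
    int others_have_copy = 0;
    for (int i = 0; i < NUM_CORES; i++) {
        if (i == reader || line->state[i] == INVALID) continue;
        /* A Modified or Exclusive holder drops to Shared (a real Modified
           holder would also write its data back first). */
        line->state[i] = SHARED;
        others_have_copy = 1;
    }
    line->state[reader] = others_have_copy ? SHARED : EXCLUSIVE;
}

/* Core `writer` stores to the line: every other copy is invalidated. */
static void model_write(line_model *line, int writer) {
    for (int i = 0; i < NUM_CORES; i++)
        if (i != writer) line->state[i] = INVALID;
    line->state[writer] = MODIFIED;
}
```

In this model, a read by core 0 followed by a read by core 1 leaves both copies Shared. A write by core 1 then makes its copy Modified and core 0's copy Invalid, so core 0's next read has to go back through `model_read` and fetch the line again.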
Cost of Atomics
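Atomic operations are often 10 to 100 times slower than ordinary operations on cached data: tens of cycles or more, versus a few. They must coordinate across cores to gain exclusive ownership of the cache line, and in rare cases they lock the memory bus. The cost rises further when several cores contend for the same line. Use them only when needed.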
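One way to get a feel for the gap on your own machine is a small timing sketch like the one below. The names `time_plain`, `time_atomic`, and `ITERATIONS` are hypothetical. It runs on a single thread, so it measures only the uncontended cost of the atomic instruction and not cross-core contention, and the ratio it reports will vary with CPU, compiler, and optimization level.

```c
#include <stdatomic.h>
#include <time.h>

#define ITERATIONS 1000000L  /* hypothetical loop count */

/* Time ITERATIONS plain increments; returns elapsed CPU seconds.
   `volatile` keeps the compiler from collapsing the loop. */
static double time_plain(volatile long *counter) {
    clock_t start = clock();
    for (long i = 0; i < ITERATIONS; i++)
        *counter = *counter + 1;  /* separate load and store, not atomic */
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Time ITERATIONS atomic increments; returns elapsed CPU seconds. */
static double time_atomic(atomic_long *counter) {
    clock_t start = clock();
    for (long i = 0; i < ITERATIONS; i++)
        atomic_fetch_add(counter, 1);  /* one indivisible read-modify-write */
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Both functions leave their counter at `ITERATIONS` when run on a single thread. Only the atomic version would still produce the correct total if several threads ran it on the same counter at once.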
Takeaway: Hardware provides the primitives. Your language/library wraps them into usable atomic types.
💡 Key Takeaways
✓ Compare-and-swap (CAS) is the fundamental hardware atomic operation. It checks and updates a value in one indivisible step.
✓ CAS reports success or failure. On failure, you know another thread modified the value, and you can retry with the new value.
✓ Fetch-and-add atomically adds to a value and returns the old value. It is more efficient than a CAS loop for simple counters.
✓ Cache coherence protocols such as MESI ensure all cores see a consistent view of memory. An atomic write needs exclusive ownership of the cache line, which invalidates copies held by other cores.
✓ Atomic operations are slower than regular memory access (roughly 10 to 100 cycles versus 1 to 3 cycles) due to cache coordination.
📌 Examples
1. CAS-based increment: do { old = value; } while (!CAS(&value, old, old + 1)). The loop retries if another thread modified value between the read and the CAS.
2. x86 provides the LOCK CMPXCHG instruction for CAS and LOCK XADD for fetch-and-add.