UUID vs Snowflake: Core Architecture and Collision Mathematics
Distributed systems need unique identifiers without coordination bottlenecks, leading to two dominant patterns: Universally Unique Identifiers (UUIDs) and Snowflake style IDs. UUIDs are 128 bit values designed for collision resistance through randomness. UUIDv4 offers 122 bits of randomness (the remaining 6 bits encode the version and variant). By the birthday approximation p ≈ n(n-1)/(2·2^122), the probability of any collision after generating one trillion identifiers is approximately 9.4e-14. This probability is effectively zero in practice, meaning you can generate trillions of IDs across any number of nodes without coordination and realistically expect no collisions, provided each node has a sound source of randomness.
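The birthday arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative, and the approximation only holds while the resulting probability is much smaller than 1.

```python
import math


def collision_probability(n: int, random_bits: int) -> float:
    """Birthday approximation p ~= n(n-1) / (2 * 2^bits), valid while p << 1."""
    return n * (n - 1) / (2 * 2 ** random_bits)


def ids_for_even_odds(random_bits: int) -> float:
    """Approximate number of IDs at which a collision becomes a coin flip.

    Uses n ~= sqrt(2 * 2^bits * ln 2), derived from p = 1 - exp(-n^2 / (2 * 2^bits)).
    """
    return math.sqrt(2 * 2 ** random_bits * math.log(2))


if __name__ == "__main__":
    # One trillion UUIDv4 values, 122 random bits each.
    print(f"p(collision) ~= {collision_probability(10**12, 122):.2e}")
    print(f"IDs for 50% odds ~= {ids_for_even_odds(122):.2e}")
```

Under these assumptions, one trillion UUIDv4 values give a probability of roughly 9.4e-14, and it takes on the order of 2.7e18 values before a collision becomes as likely as not.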
Snowflake style IDs take a fundamentally different approach, embedding metadata into fixed width integers. The canonical 64 bit layout leaves the sign bit unused and dedicates 41 bits for milliseconds since a custom epoch (covering roughly 69 years), 10 bits for worker and datacenter identity (supporting 1024 unique workers), and 12 bits for a per millisecond sequence counter (allowing 4096 IDs per millisecond per worker). This design achieves approximately 4.096 million IDs per second per worker, scaling linearly with worker count. A fleet of 1024 workers can theoretically generate about 4.19 million IDs per millisecond, or roughly 4.19 billion per second. The key trade-off here is coordination complexity versus efficiency: UUIDs eliminate all coordination at the cost of 128 bits and random ordering, while Snowflake achieves 64 bit compactness and time ordering but requires clock synchronization and worker identity management.
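The 41/10/12 layout described above can be sketched as a small generator. This is an illustrative sketch, not Twitter's implementation: `CUSTOM_EPOCH_MS` is an arbitrary assumed epoch, the class and function names are hypothetical, the worker ID is assumed to be assigned uniquely by some external mechanism, and a backwards clock jump is handled by simply raising an error.

```python
import threading
import time

TIMESTAMP_BITS = 41
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER_ID = (1 << WORKER_BITS) - 1   # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095
CUSTOM_EPOCH_MS = 1_600_000_000_000      # assumed epoch, chosen arbitrarily


class SnowflakeGenerator:
    def __init__(self, worker_id, epoch_ms=CUSTOM_EPOCH_MS, clock=None):
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id must fit in 10 bits")
        self._worker_id = worker_id
        self._epoch_ms = epoch_ms
        # The clock is injectable so the logic can be tested deterministically.
        self._clock = clock or (lambda: time.time_ns() // 1_000_000)
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Reusing an earlier timestamp could duplicate IDs, so refuse.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # All 4096 sequence values used: wait for the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            # Timestamps beyond 41 bits (about 69 years past the epoch) are not checked here.
            return (((now - self._epoch_ms) << (WORKER_BITS + SEQUENCE_BITS))
                    | (self._worker_id << SEQUENCE_BITS)
                    | self._sequence)


def decode(snowflake_id, epoch_ms=CUSTOM_EPOCH_MS):
    """Split an ID back into (unix timestamp in ms, worker ID, sequence)."""
    sequence = snowflake_id & MAX_SEQUENCE
    worker_id = (snowflake_id >> SEQUENCE_BITS) & MAX_WORKER_ID
    timestamp_ms = (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + epoch_ms
    return timestamp_ms, worker_id, sequence
```

Because the timestamp occupies the most significant bits, IDs from a single worker sort by creation time, and IDs from different workers sort by time to within clock skew. The two coordination costs named above show up directly in the code: the constructor depends on a unique `worker_id`, and `next_id` depends on a clock that does not run backwards.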
The size difference has profound implications for storage and indexing. A 128 bit UUID primary key doubles the index footprint compared to 64 bit Snowflake IDs, directly impacting cache residency and disk IO. For a table with 10 billion rows, switching from 128 bit to 64 bit keys saves 80 GB in just the primary key index, potentially keeping critical index nodes in memory that would otherwise spill to disk. UUIDv7 attempts a middle ground by allocating 48 bits to millisecond timestamps and 74 bits to randomness, providing approximate time ordering with collision probability similar to UUIDv4 within reasonable generation rates.
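The UUIDv7 bit budget mentioned above (48 timestamp bits plus 74 random bits, with the remaining 6 bits taken by the version and variant fields) can be made concrete with a hand-rolled sketch. The function name `uuid7_sketch` is hypothetical, and this version omits the optional monotonic counter that production generators may add for ordering within a single millisecond.

```python
import secrets
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Assemble a UUIDv7-shaped value field by field.

    Layout, most significant bits first:
      48 bits  Unix timestamp in milliseconds
       4 bits  version (0b0111)
      12 bits  random
       2 bits  variant (0b10)
      62 bits  random
    """
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    value = (unix_ms & ((1 << 48) - 1)) << 80  # timestamp in the top 48 bits
    value |= 0x7 << 76                         # version field
    value |= secrets.randbits(12) << 64        # first 12 random bits
    value |= 0b10 << 62                        # variant field
    value |= secrets.randbits(62)              # remaining 62 random bits
    return uuid.UUID(int=value)


if __name__ == "__main__":
    earlier = uuid7_sketch(1_700_000_000_000)
    later = uuid7_sketch(1_700_000_000_001)
    print(earlier, later, earlier < later)
```

Since the timestamp leads, values created in different milliseconds compare in creation order, which is what keeps index inserts clustered near the right-hand edge of a B-tree. Two values created in the same millisecond are ordered only by their random bits, hence "approximate" time ordering.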
💡 Key Takeaways
•UUIDv4 collision probability is roughly 9.4e-14 after one trillion IDs, enabling independent generation across any number of nodes without coordination overhead.
•Snowflake 64 bit layout generates 4096 IDs per millisecond per worker (approximately 4.096 million per second), scaling linearly to billions per second across a worker fleet.
•Storage efficiency differs by 2x: 128 bit UUIDs double index size versus 64 bit Snowflake IDs, saving 80 GB of index space per 10 billion rows when using smaller keys.
•UUIDv7 bridges both approaches with 48 bit millisecond timestamps plus 74 bit randomness, achieving approximate sortability while maintaining collision resistance.
•Trade off is coordination versus independence: Snowflake requires synchronized clocks and unique worker ID assignment, while UUID requires no coordination but sacrifices ordering and compactness.
📌 Examples
Twitter Snowflake generates IDs with 41 bit millisecond timestamp, 10 bit worker/datacenter ID, and 12 bit sequence, exposing IDs as strings in JSON (id_str field) to avoid JavaScript 53 bit integer precision loss.
Google Firestore uses 20 character randomized auto IDs with approximately 119 bits of entropy, generated client side to ensure uniform distribution across partitions and avoid write hotspots.
Instagram at Meta adopted 64 bit Snowflake style sharded IDs encoding timestamp, shard identity, and per shard sequence to eliminate central autoincrement bottlenecks while preserving time ordering.
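The precision problem behind the id_str field in the Twitter example can be reproduced in Python, whose float is the same IEEE 754 double that JavaScript uses for Number. This is a small sketch; the helper name and the sample ID are illustrative.

```python
import json


def survives_double(n: int) -> bool:
    """True if n is unchanged after a round trip through a 64 bit float."""
    return int(float(n)) == n


# Smallest positive integer that a double cannot represent exactly.
sample_id = 2 ** 53 + 1

# Sending both forms lets clients that parse numbers as doubles use the string.
payload = json.dumps({"id": sample_id, "id_str": str(sample_id)})

if __name__ == "__main__":
    print(survives_double(2 ** 53), survives_double(sample_id))
    print(payload)
```

Python's own json module keeps integers exact, so the loss only appears in consumers that parse JSON numbers as doubles. Snowflake IDs use up to 63 bits, so most of them fall outside the exactly representable range.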