Distributed Systems Primitives • Unique ID Generation (Snowflake, UUID)

Time Ordering vs Randomization: Index Locality and Hotspot Trade-offs

The ordering properties of identifiers shape database performance, particularly for write-heavy workloads. Randomly generated UUIDv4 identifiers maximize distribution across partitions but give poor B-tree locality. Each insert lands at a pseudorandom position in the index, causing page splits throughout the tree and reducing cache effectiveness. This random write pattern prevents the database from exploiting sequential I/O and forces the buffer pool to keep hot pages scattered across the entire key space. In contrast, time-ordered identifiers such as Snowflake IDs or UUIDv7 are append-friendly: new inserts cluster at the rightmost edge of the B-tree, which reduces page splits and improves write throughput.
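The locality difference can be illustrated with a small simulation. This is a minimal sketch, not a real B-tree: a sorted Python list stands in for leaf order, a plain counter stands in for Snowflake-style IDs, and the function name `rightmost_insert_fraction` is made up for illustration. It measures how many inserts land at the right edge of the key space.

```python
import bisect
import itertools
import uuid


def rightmost_insert_fraction(keys):
    """Insert keys one at a time into a sorted list (a stand-in for B-tree
    leaf order) and return the fraction that landed at the rightmost slot."""
    ordered = []
    appends = 0
    for key in keys:
        pos = bisect.bisect_right(ordered, key)
        if pos == len(ordered):
            appends += 1  # key is >= every existing key: a pure append
        ordered.insert(pos, key)
    return appends / len(ordered) if ordered else 0.0


N = 5_000
counter = itertools.count(1)

# Assumption: a monotonically increasing counter approximates time-ordered IDs.
time_ordered_keys = [next(counter) for _ in range(N)]
random_keys = [uuid.uuid4().int for _ in range(N)]

sequential_fraction = rightmost_insert_fraction(time_ordered_keys)
random_fraction = rightmost_insert_fraction(random_keys)

print(f"time-ordered inserts at right edge: {sequential_fraction:.1%}")
print(f"UUIDv4 inserts at right edge:       {random_fraction:.1%}")
```

With strictly increasing keys every insert is an append, while random keys almost always land somewhere in the middle of the existing key range, which is the pattern that touches pages across the whole index.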

However, time ordering introduces its own performance problems in distributed, partitioned storage systems. Google's Firestore documentation explicitly warns against monotonically increasing keys because they create partition hotspots that degrade write throughput and tail latency. When all writes target the partition holding the highest key range, that single partition becomes a bottleneck while the other partitions sit idle. Reddit's legacy system illustrated this limitation: its base36-encoded incrementing integer IDs (with type prefixes such as t3_) concentrated all inserts on a single database shard. Systems handling millions of writes per second cannot afford to funnel all traffic through one partition.
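The hotspot effect can be sketched with a toy range-partitioned store. Everything here is an assumption for illustration: the split points, the four partitions, and the helper names `partition_for` and `write_distribution` do not correspond to any particular database.

```python
import bisect
from collections import Counter

# Hypothetical range-partitioned store with 4 partitions.
# Partition i owns the keys between split point i-1 (inclusive)
# and split point i (exclusive); the last partition is open-ended.
SPLIT_POINTS = [250_000, 500_000, 750_000]


def partition_for(key):
    """Return the index of the partition whose key range contains `key`."""
    return bisect.bisect_right(SPLIT_POINTS, key)


def write_distribution(keys):
    """Count how many writes each partition receives."""
    return Counter(partition_for(k) for k in keys)


# New writes use monotonically increasing IDs, so they all sort after
# the last split point and hit the same partition.
incoming_ids = range(1_000_000, 1_010_000)
hot = write_distribution(incoming_ids)

print(dict(hot))  # every write is counted against partition 3
```

Real systems split hot ranges over time, but with increasing keys the newest range is always the one taking the writes, so the bottleneck follows the split.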

The right choice depends on your partitioning strategy and access patterns. For range-partitioned stores such as Bigtable or HBase, use randomized keys or hash-prefixed keys to distribute writes evenly. For single-node databases with B-tree indexes, time-ordered keys significantly improve write performance. Microsoft SQL Server offers sequential Globally Unique Identifiers (GUIDs) specifically to reduce B-tree fragmentation compared with random GUIDs, improving write throughput under sustained insert load. UUIDv7 and similar formats (ULID, KSUID) attempt to capture both benefits: they provide coarse-grained time ordering (for approximate sorting and range scans) while randomizing the lower-order bits to distribute load within each time window.
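For the range-partitioned case, a hash prefix can be sketched as follows. This is a minimal illustration under stated assumptions: the bucket count of 16, the `#` separator, the zero-padded key layout, and the function name `hash_prefixed_key` are all hypothetical choices, not the API of Bigtable or HBase.

```python
import hashlib
from collections import Counter

NUM_BUCKETS = 16  # assumed number of salt buckets; tune to the cluster size


def hash_prefixed_key(sequential_id: int) -> str:
    """Build a row key whose leading bucket is derived from a hash of the ID,
    so consecutive IDs scatter across the key space."""
    digest = hashlib.sha256(str(sequential_id).encode()).digest()
    bucket = digest[0] % NUM_BUCKETS
    # Zero-padding keeps keys sortable as strings within each bucket.
    return f"{bucket:02d}#{sequential_id:020d}"


keys = [hash_prefixed_key(i) for i in range(1_000_000, 1_010_000)]
buckets = Counter(key.split("#")[0] for key in keys)

for bucket, count in sorted(buckets.items()):
    print(bucket, count)
```

The prefix is deterministic, so a point lookup can recompute it from the ID. The cost is that a scan over a time range must now fan out to every bucket and merge the results.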

💡 Key Takeaways

✓ Random UUIDv4 keys cause B-tree page splits throughout the index structure, preventing sequential I/O optimization and scattering hot pages across the entire key space.

✓ Time-ordered Snowflake and UUIDv7 IDs enable append-friendly B-tree behavior at the rightmost edge, reducing page splits and improving write throughput on single-node databases.

✓ Monotonically increasing keys create partition hotspots in distributed systems, concentrating all writes on a single partition while the others remain idle, as the Firestore documentation warns.

✓ Microsoft SQL Server's sequential GUID variant reduces index fragmentation compared with random GUIDs, improving insert performance for write-intensive workloads.

✓ Hybrid approaches like UUIDv7 randomize the lower bits while preserving coarse time ordering, distributing writes within millisecond windows to balance locality and load distribution.
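The hybrid layout in the last takeaway can be sketched by hand. This is a minimal illustration assuming the RFC 9562 field layout for UUIDv7 (48-bit Unix millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits); the function name `uuid7_sketch` is made up, and the sketch makes no attempt at monotonic ordering within a single millisecond.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a UUIDv7-style value: timestamp in the high bits, random low bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000

    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                 # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)     # low 62 random bits

    value = (unix_ms & ((1 << 48) - 1)) << 80  # bits 127..80: timestamp
    value |= 0x7 << 76                         # bits 79..76: version 7
    value |= rand_a << 64                      # bits 75..64: random
    value |= 0b10 << 62                        # bits 63..62: RFC variant
    value |= rand_b                            # bits 61..0: random
    return uuid.UUID(int=value)


earlier = uuid7_sketch(1_700_000_000_000)
later = uuid7_sketch(1_700_000_000_001)

print(earlier, later)
print(earlier < later)  # IDs from different milliseconds sort by time
```

Two IDs generated in the same millisecond share the timestamp prefix and differ only in the random bits, so their relative order is arbitrary. That is the coarse ordering the takeaway describes.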

📌 Interview Tips

1. Google Firestore explicitly recommends high-entropy randomized document IDs to avoid write hotspotting, so that writes spread uniformly across partitions and throughput can scale with the number of partitions.

2. Reddit historically used base36 incrementing IDs (t3_ prefix format), which concentrated inserts on a single shard, illustrating why large systems migrate to distributed generation schemes.

3. Instagram's sharded ID scheme (Instagram is now part of Meta) encodes the shard identity in each ID to enable horizontal write scaling, eliminating the single auto-increment bottleneck of centralized ID generation.
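An ID that embeds its shard can be sketched with simple bit packing. The 41/13/10 split below (milliseconds since a custom epoch, shard ID, per-shard sequence) follows the layout commonly attributed to Instagram's scheme, but treat the field widths, the epoch constant, and the function names as assumptions for illustration.

```python
CUSTOM_EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch, in Unix ms
TIME_BITS, SHARD_BITS, SEQ_BITS = 41, 13, 10  # assumed layout, 64 bits total


def make_sharded_id(now_ms, shard_id, sequence):
    """Pack elapsed time, shard ID, and a wrapping sequence into one integer."""
    elapsed = now_ms - CUSTOM_EPOCH_MS
    if not 0 <= elapsed < (1 << TIME_BITS):
        raise ValueError("timestamp outside the representable range")
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id does not fit in SHARD_BITS")
    seq = sequence % (1 << SEQ_BITS)  # sequence wraps within each shard
    return (elapsed << (SHARD_BITS + SEQ_BITS)) | (shard_id << SEQ_BITS) | seq


def shard_of(sharded_id):
    """Recover the shard ID from a packed ID, so reads can be routed."""
    return (sharded_id >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)


first = make_sharded_id(CUSTOM_EPOCH_MS + 1_000, shard_id=1341, sequence=5001)
second = make_sharded_id(CUSTOM_EPOCH_MS + 1_001, shard_id=7, sequence=0)

print(first, shard_of(first))
print(second > first)  # later timestamps sort higher regardless of shard
```

Because each shard generates its own sequence, no central counter is involved, and the time prefix keeps IDs roughly sortable across shards.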