How Do Hot Spots and Skew Break Indexing Performance?
Hot Spot Problem
Monotonically increasing keys (timestamps, auto-increment IDs, time-prefixed UUIDs) create hot spots: every new insert lands in the same place, the rightmost leaf page of a B+ tree or a single partition in a distributed database. At 10K-50K inserts/sec, that one location becomes a bottleneck. Concurrent writers queue for access to it, inflating P99 (99th-percentile latency) from single-digit milliseconds to hundreds of milliseconds.
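A minimal sketch of why this happens, using hypothetical range-partitioning split points and four partitions: sequential keys all fall past the last split point, while hash partitioning spreads the same keys evenly.

```python
from collections import Counter

# Hypothetical split points chosen when the table was smaller.
# Partitions: [<1000], [<2000], [<3000], [everything else].
SPLIT_POINTS = [1_000, 2_000, 3_000]

def range_partition(key: int) -> int:
    """Route a key to its range partition."""
    for i, split in enumerate(SPLIT_POINTS):
        if key < split:
            return i
    return len(SPLIT_POINTS)  # the rightmost partition

def hash_partition(key: int, n: int = 4) -> int:
    """Route a key by hash instead of by range."""
    return hash(key) % n

# The next 1,000 auto-increment inserts, all larger than any split point.
new_keys = range(5_000, 6_000)
range_load = Counter(range_partition(k) for k in new_keys)
hash_load = Counter(hash_partition(k) for k in new_keys)

print(range_load)  # all 1,000 inserts hit partition 3: the hot spot
print(hash_load)   # spread evenly, 250 per partition
```

The same effect appears inside a single node: with sequential keys, every insert descends to the same rightmost B+ tree leaf.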
Low Selectivity Problem
Low-cardinality columns (columns with few distinct values, like a status column holding only active/inactive) make poor leading index columns. With only 2 values, a scan for one of them must read 50% of the index on average, at which point the optimizer will often prefer a full table scan. Skewed distributions worsen this: if 99% of rows are status=active, an index on status provides almost no filtering benefit for the common value.
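A quick illustration of the skew point, simulating a hypothetical status column that is 99% active: the index filters almost nothing for the common value but remains useful for the rare one.

```python
import random

# Simulate 100,000 rows of a skewed, low-cardinality 'status' column.
random.seed(0)
rows = ["active" if random.random() < 0.99 else "inactive"
        for _ in range(100_000)]

def fraction_matched(value: str) -> float:
    """Fraction of rows an index lookup on `value` would return."""
    return sum(r == value for r in rows) / len(rows)

print(fraction_matched("active"))    # ~0.99: the index skips almost nothing
print(fraction_matched("inactive"))  # ~0.01: the index is genuinely selective
```

This asymmetry is exactly what partial indexes (below in Solutions) exploit: index only the rare, frequently-queried value.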
Solutions
Key salting: Prepend a random or computed prefix to spread writes across the index. Trade-off: point queries must either recompute the salt (if it is deterministic) or fan out across all prefixes, and range scans lose key ordering.
Write sharding: Spread a hot key across multiple logical keys (counter_0 through counter_9); writers pick one shard, and reads aggregate across all shards.
Partial indexes: Index only rows matching a condition (e.g. WHERE status = 'pending'), shrinking the index and avoiding the low-selectivity bulk of the column.
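A minimal sketch of the first two solutions, with hypothetical names (salted_key, increment, read) and an in-memory dict standing in for the store; a deterministic salt lets point reads find their prefix, and a sharded counter spreads writes across several keys that are summed at read time.

```python
import random
from collections import defaultdict

N_SHARDS = 8

def salted_key(key: str) -> str:
    # Computed salt: deterministic within one process, so a point read can
    # recompute it. (A real system would use a stable hash, not Python's
    # randomized str hash.) Random salts would force reads to fan out
    # across all N_SHARDS prefixes instead.
    return f"{hash(key) % N_SHARDS}#{key}"

# Write sharding for a hot counter: each write picks a random shard key.
shards = defaultdict(int)

def increment(counter: str) -> None:
    shards[f"{counter}_{random.randrange(N_SHARDS)}"] += 1

def read(counter: str) -> int:
    # Reads aggregate across every shard of the logical counter.
    return sum(shards[f"{counter}_{i}"] for i in range(N_SHARDS))

for _ in range(1000):
    increment("page_views")
print(read("page_views"))  # 1000, spread across up to 8 shard keys
```

The write path now touches 8 different keys instead of one, at the cost of an 8-key read.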
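A partial index can be demonstrated with SQLite, which supports the CREATE INDEX ... WHERE syntax; this sketch (table and index names are illustrative) indexes only the rare pending rows out of 10,000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO jobs (status) VALUES (?)",
    [("done",)] * 9_900 + [("pending",)] * 100,
)

# Index only the ~1% of rows that queries actually filter for.
conn.execute(
    "CREATE INDEX idx_pending ON jobs(status) WHERE status = 'pending'"
)

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM jobs WHERE status = 'pending'"
).fetchall()
print(plan)  # the plan should show a SEARCH using idx_pending
```

The index holds ~100 entries instead of 10,000, so it is cheap to maintain on the 99% of writes that never touch it.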
Detection
Monitor the write-latency distribution. If P99 is 10x or more above P50 (the median), a hot spot likely exists. Check partition- or page-level access patterns to pinpoint the hot location.
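The P99-vs-P50 check can be sketched with the standard library; the function name and the 10x threshold parameter are illustrative, matching the heuristic above.

```python
import statistics

def hot_spot_suspect(latencies_ms: list[float], ratio: float = 10.0) -> bool:
    """Flag a possible hot spot when P99 dwarfs P50 (the median)."""
    qs = statistics.quantiles(latencies_ms, n=100)  # qs[49]=P50, qs[98]=P99
    p50, p99 = qs[49], qs[98]
    return p99 >= ratio * p50

# 99% of writes are fast; in the hot case, 1% queue behind one location.
healthy = [2.0] * 990 + [4.0] * 10
hot = [2.0] * 990 + [250.0] * 10

print(hot_spot_suspect(healthy))  # False
print(hot_spot_suspect(hot))      # True
```

In production the percentiles would come from your metrics system rather than raw samples, but the ratio test is the same.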