How Do Hot Spots and Skew Break Indexing Performance?

Monotonically increasing keys, such as timestamps, auto-increment sequences, or UUIDs with time prefixes, create hot spots that concentrate all writes on a single leaf page or partition. In a B+ tree, every insert with a sequential key lands on the rightmost leaf page, causing repeated page splits and latch contention. At 10K to 50K inserts per second, this single hot page becomes a serialization bottleneck: threads queue for exclusive latches on the page, inflating p99 write latencies from single-digit milliseconds to hundreds of milliseconds. The problem worsens in distributed systems, where the hot partition saturates its write capacity while other partitions sit idle.

Low cardinality and skewed distributions destroy index selectivity. A Status column with 99 percent Active and 1 percent Inactive may look like a natural index candidate in the schema, but an index on it is nearly useless for queries filtering on Active. An index scan on Status = 'Active' touches 99 percent of the table's rows, which is no better than a full table scan. Query optimizers compare index cost against scan cost using statistics; when selectivity is poor, the optimizer chooses the scan, and the index becomes wasted storage and write overhead. Outdated statistics make this worse: if the distribution shifts from 50/50 to 99/1 while the statistics are stale, the optimizer can still pick the index plan and touch millions of unnecessary rows.

Bitmap indexes in Online Transaction Processing (OLTP) environments create another failure mode. Bitmap entries are coarse grained, often representing thousands of rows per segment. Concurrent updates to the same segment require locking the entire segment, which serializes writes. A high-concurrency workload at 1K to 10K updates per second on a low-cardinality column (like Status or Country) can collapse throughput to single-digit transactions per second as threads wait for segment locks. Oracle's documentation cautions against bitmap indexes on tables with heavy concurrent modification and positions them for read-mostly data warehouses.
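To make the skew argument concrete, here is a minimal sketch of how selectivity falls out of a value distribution. The column contents and the `selectivity` helper are hypothetical illustrations, not any optimizer's actual statistics API.

```python
from collections import Counter

def selectivity(values, predicate_value):
    """Fraction of rows that an equality predicate on `predicate_value` matches."""
    counts = Counter(values)
    return counts[predicate_value] / len(values)

# Hypothetical Status column: 99 percent Active, 1 percent Inactive.
status_column = ["Active"] * 9_900 + ["Inactive"] * 100

active_fraction = selectivity(status_column, "Active")
inactive_fraction = selectivity(status_column, "Inactive")

# A predicate matching most of the table gains nothing from an index;
# a predicate matching a small fraction is where the index pays off.
print(f"Status = 'Active' matches {active_fraction:.0%} of rows")
print(f"Status = 'Inactive' matches {inactive_fraction:.0%} of rows")
```

The same column is a poor index target for one predicate and a good one for the other, which is why selectivity has to be judged per query rather than per column.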
Mitigation strategies trade complexity for distribution. Hash prefixing or reverse key indexes randomize monotonic keys, spreading writes across the tree, but the index can no longer serve range scans on the original key order. Time-bucketed partitioning with local indexes per partition isolates hot writes to the current partition while keeping historical partitions cold. Filtered or partial indexes that exclude the dominant value (index only Inactive rows, ignoring Active) shrink the index by about 99 percent in the 99/1 case and restore selectivity. The key is to measure actual query patterns and cardinality from production logs rather than rely on assumptions made at schema design time.
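The trade-off behind key prefixing can be sketched in a few lines. This is an illustrative model only: `NUM_BUCKETS`, `prefixed_key`, and `range_scan_probes` are hypothetical names, and a plain modulo stands in for a real hash function so the output stays deterministic.

```python
NUM_BUCKETS = 8  # hypothetical prefix count; in practice tied to partition count

def prefixed_key(seq_id: int) -> tuple[int, int]:
    """Composite key (bucket, seq_id): consecutive ids land in different buckets."""
    # Assumption: modulo is used instead of a real hash to keep this deterministic.
    return (seq_id % NUM_BUCKETS, seq_id)

def range_scan_probes(lo: int, hi: int) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Index ranges needed to answer lo <= seq_id <= hi: one range per bucket."""
    return [((bucket, lo), (bucket, hi)) for bucket in range(NUM_BUCKETS)]

# Eight consecutive inserts no longer share one rightmost position in the index.
recent_ids = range(1_000, 1_008)
buckets_hit = sorted({prefixed_key(i)[0] for i in recent_ids})
print("buckets written by 8 consecutive ids:", buckets_hit)

# The cost: a single logical range scan fans out into NUM_BUCKETS separate scans.
probes = range_scan_probes(1_000, 2_000)
print("index ranges needed for one range query:", len(probes))
```

Writes spread evenly, but every range query now costs one probe per bucket, which is why this approach suits equality lookups far better than scans.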
💡 Key Takeaways
Monotonic keys (timestamps, sequences) concentrate writes on the rightmost B+ tree leaf, causing page split storms and latch contention that can degrade p99 latency from 5ms to 500ms at 10K inserts per second
Skewed distributions with 99 percent of rows in one value make an index lookup on that value scan 99 percent of rows, equivalent to full table scan cost, so the index costs storage and write amplification with no query benefit
Bitmap indexes serialize concurrent updates to the same low cardinality value by locking coarse segments, collapsing throughput from 10K to single digit updates per second in OLTP workloads
Hash prefix or reverse key indexes randomize hot keys and spread load, but the index can no longer serve range queries or ORDER BY on the indexed column, so they are acceptable only for pure equality lookups
Filtered indexes excluding dominant skewed values (index Status WHERE Status != 'Active') shrink index size by 99 percent and restore selectivity for minority value queries
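The filtered index takeaway can be tried locally with SQLite, which supports partial indexes, driven from Python's standard `sqlite3` module. This is a minimal sketch: the `accounts` table, its columns, and the index name are hypothetical, and SQLite stands in for whichever engine is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, status TEXT NOT NULL, name TEXT NOT NULL)"
)

# Hypothetical data: 1 row in 100 is Inactive, the rest are Active.
rows = [
    ("Inactive" if i % 100 == 0 else "Active", f"user{i}")
    for i in range(10_000)
]
conn.executemany("INSERT INTO accounts (status, name) VALUES (?, ?)", rows)

# Partial index: only rows outside the dominant value get an index entry.
conn.execute(
    "CREATE INDEX idx_accounts_not_active ON accounts (status) WHERE status != 'Active'"
)

# The query repeats the index predicate so the planner can match it to the index.
query = "SELECT id, name FROM accounts WHERE status != 'Active'"
minority_rows = conn.execute(query).fetchall()
print("rows returned:", len(minority_rows))

# Inspect the plan; the exact wording varies between SQLite versions.
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)
```

Whether the planner actually picks the partial index depends on the engine and its statistics, so checking the plan output is part of the exercise rather than a formality.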
📌 Examples
Oracle Real Application Clusters (RAC): Auto increment primary key creates hot block at 50K inserts per second; reverse key index distributes load across nodes reducing Global Cache Service (GCS) waits from 40 percent to under 5 percent
PostgreSQL time series table: Partitioned by month with local indexes on (month, sensor_id, timestamp); current month partition is hot but isolated, old partitions compressed and moved to cheap storage avoiding hot spot propagation
Microsoft SQL Server: Orders table with Status column (95 percent Complete, 5 percent Pending); filtered nonclustered index on (Status, OrderDate) WHERE Status='Pending' is 20x smaller and enables sub 10ms queries on Pending orders
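The time-bucketed partitioning used in the PostgreSQL example can be modeled as a routing function from timestamp to partition. The sketch below is a plain Python illustration, not PostgreSQL's partitioning mechanism; the `readings_YYYY_MM` naming scheme and both helper functions are assumptions made for the example.

```python
from datetime import datetime

def partition_for(ts: datetime) -> str:
    """Name of the monthly partition that a row with timestamp `ts` is routed to."""
    return f"readings_{ts.year:04d}_{ts.month:02d}"

def partitions_for_range(start: datetime, end: datetime) -> list[str]:
    """Monthly partitions a query over [start, end] has to touch."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"readings_{year:04d}_{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names

# All current writes route to one partition, so the hot spot stays contained there.
print(partition_for(datetime(2024, 3, 15, 12, 30)))

# A historical query touches only the months it overlaps.
print(partitions_for_range(datetime(2023, 11, 1), datetime(2024, 2, 10)))
```

The hot spot does not disappear under this scheme. It is confined to the current month's partition and its local index, while older partitions stay cold and can be compressed or moved.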