Database Design • Indexing StrategiesEasy⏱️ ~3 min
How Do B+ Tree Indexes Reduce Query Cost?
Indexes are on disk data structures that transform full table scans from O(N) operations into O(log_b N + K) lookups, where b is the branching factor and K is the number of qualifying rows. Without an index, finding a specific row requires touching every page in the table. A 500 million row table with 8 KB pages might span 4 TB of storage, requiring millions of page reads. With a B+ tree index, the same lookup touches only 3 to 4 pages.
The magic comes from the tree structure and page layout. Each internal node in a B+ tree stores keys and pointers to child pages. With 8 KB pages and roughly 16 to 32 bytes per entry, you get a branching factor of 256 to 512. This means each level of the tree multiplies your search space by 256x or more. For 100 million rows, the tree height is only 3 to 4 levels deep, so a point lookup requires just 3 to 4 page reads instead of millions.
Leaf pages store keys in sorted order along with either pointers to the actual data rows (nonclustered index) or the data itself (clustered index). This ordering enables efficient range scans: once you find the starting key, subsequent rows are on the same page or linked sequential pages. A query fetching 1000 consecutive rows might touch 10 to 20 leaf pages instead of performing 1000 random lookups scattered across the table.
The tradeoff is write amplification and storage overhead. Every insert or update that changes indexed columns must update both the table and all affected indexes. On Microsoft SQL Server with three nonclustered indexes, a single row insert becomes four write operations. Page splits occur when leaf pages fill up, causing cascading updates up the tree and temporary write stalls. Production systems typically configure 10 to 20 percent fill factor to leave breathing room for inserts, trading storage efficiency for write performance.
💡 Key Takeaways
•Branching factor of 256 to 512 per level with 8 KB pages means tree height stays at 3 to 4 levels for hundreds of millions of rows
•Leaf pages store keys in sorted order enabling range scans to read sequential pages instead of random lookups across the table
•Each additional index multiplies write cost: a table with 3 indexes turns every insert into 4 write operations across different tree structures
•Page splits occur when leaves fill up, causing cascading tree updates and temporary write stalls at 10K to 50K transactions per second workloads
•Fill factor of 10 to 20 percent reserves free space in leaf pages to reduce split frequency at the cost of 10 to 20 percent more storage
📌 Examples
Microsoft SQL Server: On a 500M row table, a covering nonclustered index reduces logical reads from 6.4 million page touches (full scan) to 3000 page touches (index scan plus lookups)
Oracle Database: Reverse key indexes distribute monotonically increasing sequence inserts across leaf pages to avoid hot block contention in Real Application Clusters (RAC) under 20K to 50K inserts per second