Database DesignIndexing StrategiesEasy⏱️ ~3 min

How Do B+ Tree Indexes Reduce Query Cost?

Definition
Database indexes are auxiliary data structures that enable finding rows without scanning the entire table. The dominant structure is the B+ tree, which keeps data sorted and balanced to guarantee logarithmic lookup time.

The Problem Indexes Solve

Without an index, finding a single row in 1 million rows requires a full table scan. At 100 rows/page and 0.1ms per disk read, that is 10,000 page reads taking 1 second. With a B+ tree index: 3-4 page reads taking 0.4ms. That is 2500x faster.

B+ Tree Structure

Root: [50 | 100]
↙ ↓ ↘
[10|25|40]
[60|75|90]
[110|130]
↓ ↓ ↓
Leaf nodes (linked list with actual row data or pointers)

The tree has two node types. Internal nodes contain only keys and pointers to children. They act as routing guides. Leaf nodes contain the actual data (or pointers to rows) and are linked together in a doubly-linked list. This linking is crucial: once you find the starting point for a range query like WHERE id BETWEEN 60 AND 90, you simply follow the links without climbing back up the tree.

How Lookup Works

To find key 75: start at root, compare 50 < 75 < 100, follow middle pointer. At internal node [60|75|90], find exact match, follow pointer to leaf containing that row. Total: 3 page reads regardless of table size. Each level multiplies capacity: with 500 keys per node, a 3-level tree indexes 125 million rows (500³).

The Cost of Indexes

Indexes are not free. Every insert, update, or delete must also update all indexes on that table. 5 indexes means 6 writes per row change. Indexes also consume storage, typically 10-30% of table size per index. Rule of thumb: index columns in WHERE, JOIN, and ORDER BY clauses that filter large tables. Skip indexes on small tables or rarely-queried columns.

💡 Key Takeaways
Full table scan: 1M rows ÷ 100 rows/page = 10,000 page reads; B+ tree index: 3-4 page reads (2500x faster)
B+ tree: balanced tree with all data in leaf nodes linked for range scans; 500 keys/node indexes 125M rows in 3 levels
Index cost: each additional index means one more write per insert/update/delete, plus 10-30% storage overhead
Hot internal nodes stay cached in memory, so most queries hit disk only for the final leaf page
📌 Interview Tips
1Quantify the improvement: without index = 1 second query; with index = 0.4ms. This makes indexes critical for any frequently-run query.
2Explain why 3-4 levels: 500³ = 125M rows. Even billion-row tables need only 4 levels because each level multiplies capacity by 500.
3When NOT to index: small tables (scan is fast anyway), write-heavy tables with rare reads, low-selectivity columns.
← Back to Indexing Strategies Overview