How Do Clustered and Nonclustered Indexes Differ?
A clustered index determines the physical storage order of table rows. The leaf pages of a clustered index are the actual data pages, storing entire rows in index key order, which is why a table can have only one clustered index. A point lookup or range scan on the clustered key reads few pages because matching rows sit next to each other in key order. And because the leaf already holds the full row, a query that reaches a row through the clustered key needs no additional lookup, whichever columns it reads.
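To make the page-count argument concrete, here is a minimal Python sketch of a clustered layout. It is a toy model, not how any particular engine stores data: the 50-rows-per-page capacity and the helper names `build_clustered_pages` and `range_scan` are assumptions made for illustration.

```python
from bisect import bisect_right

ROWS_PER_PAGE = 50  # assumed page capacity, for illustration only


def build_clustered_pages(rows, rows_per_page=ROWS_PER_PAGE):
    """Sort rows by key and pack them into fixed-size leaf pages."""
    ordered = sorted(rows, key=lambda row: row[0])
    return [ordered[i:i + rows_per_page]
            for i in range(0, len(ordered), rows_per_page)]


def range_scan(pages, low, high):
    """Return (rows with low <= key <= high, number of leaf pages read)."""
    first_keys = [page[0][0] for page in pages]
    # The last page whose first key is <= low is where the range starts.
    start = max(bisect_right(first_keys, low) - 1, 0)
    matched, pages_read = [], 0
    for page in pages[start:]:
        if page[0][0] > high:
            break  # pages are in key order, so nothing further can match
        pages_read += 1
        matched.extend(row for row in page if low <= row[0] <= high)
    return matched, pages_read


rows = [(key, f"payload-{key}") for key in range(100_000)]
pages = build_clustered_pages(rows)
matched, pages_read = range_scan(pages, 40_010, 41_009)
print(len(matched), pages_read)  # 1000 rows from 21 adjacent pages
```

Because the rows are stored in key order, the 1000 matching rows occupy a run of adjacent pages, and the scan stops at the first page past the upper bound.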
Nonclustered indexes store the index key columns plus a pointer (a row locator or bookmark) to the actual data row. The leaf pages contain index keys in sorted order, but the pointers reference rows that may be scattered across many pages in the heap or clustered index. For selective queries returning a few rows, nonclustered indexes are efficient: follow the index to find the handful of target keys, then perform key lookups to fetch the full rows. But if the query returns thousands of rows, it performs thousands of random lookups. On SSDs at roughly 0.1 milliseconds per random read, 100,000 key lookups issued one after another add up to about 10 seconds of device time. On Hard Disk Drives (HDDs) at 5 to 10 milliseconds each, the same work takes roughly 8 to 17 minutes.
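The lookup cost is simple multiplication, and a short Python sketch makes it easy to rerun with other numbers. It assumes one random read per matching row, issued serially with no caching or parallel I/O, which real systems rarely match exactly; `lookup_seconds` is a hypothetical helper name.

```python
def lookup_seconds(row_count, latency_ms):
    """Serial device time for one random read per matching row."""
    return row_count * latency_ms / 1000.0


SSD_MS = 0.1             # rough per-read latency quoted in the text
HDD_MS_RANGE = (5, 10)   # rough per-read latency range quoted in the text

ssd = lookup_seconds(100_000, SSD_MS)
hdd_low, hdd_high = (lookup_seconds(100_000, ms) for ms in HDD_MS_RANGE)

print(f"SSD: {ssd:.0f} s")
print(f"HDD: {hdd_low / 60:.1f} to {hdd_high / 60:.1f} min")
```

Treat the result as an upper bound for cold data: buffer pool hits and concurrent reads usually lower the wall clock time.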
Clustered indexes optimize range queries and reduce read amplification for key-based access, but they impose write penalties. An insert, or an update that changes the key, must place the row at its position in key order. When the target page is full, the engine splits the page and moves rows, which leads to fragmentation. Write amplification for clustered indexes is higher than for nonclustered indexes because the entire row must be repositioned, not just an index entry.
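A small simulation can show where the write cost comes from. The Python sketch below is a deliberately simplified model: keys stand in for whole rows, the page capacity of 4 is an assumption chosen so that splits happen quickly, and every full page is split in half. Real engines use more refined split strategies and fill factors, so only the general pattern carries over.

```python
from bisect import bisect_right, insort
import random

PAGE_CAPACITY = 4  # tiny on purpose so splits happen quickly


def insert_key(pages, key, stats):
    """Insert key into the ordered page list, splitting a full page in half."""
    if not pages:
        pages.append([key])
        return
    first_keys = [page[0] for page in pages]
    idx = max(bisect_right(first_keys, key) - 1, 0)
    page = pages[idx]
    insort(page, key)
    if len(page) > PAGE_CAPACITY:
        mid = len(page) // 2
        moved = page[mid:]            # upper half moves to a new page
        del page[mid:]
        pages.insert(idx + 1, moved)
        stats["splits"] += 1
        stats["rows_moved"] += len(moved)


def load(keys):
    """Insert keys in the given order; return the pages and split counters."""
    pages, stats = [], {"splits": 0, "rows_moved": 0}
    for key in keys:
        insert_key(pages, key, stats)
    return pages, stats


random.seed(7)
ascending = list(range(1, 201))
shuffled = ascending[:]
random.shuffle(shuffled)

results = {}
for label, order in (("ascending", ascending), ("random", shuffled)):
    pages, stats = load(order)
    results[label] = (pages, stats)
    fill = len(order) / (len(pages) * PAGE_CAPACITY)
    print(label, stats, f"average fill {fill:.0%}")
```

Each split relocates several rows, not just the one being inserted, which is the extra write work a nonclustered index entry avoids.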
In practice, Microsoft SQL Server makes the primary key the clustered index by default, and the usual guidance is to cluster on the primary key or on the key that is range scanned most often. Oracle stores tables as heaps by default and offers the equivalent layout through index organized tables. Additional selective predicates get nonclustered indexes. Covering indexes (nonclustered with included columns) eliminate key lookups by storing frequently read columns directly in the index leaf pages, turning expensive two-step lookups into single index scans at the cost of wider index pages and higher write overhead.
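The effect of a covering index can be observed in a query plan. The sketch below uses SQLite through Python's built-in `sqlite3` module as a stand-in, not SQL Server or Oracle. SQLite has no `INCLUDE` clause, so the extra columns are appended as trailing key columns instead. The table, column, and index names are hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the detail column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, email TEXT, "
    "first_name TEXT, last_name TEXT, bio TEXT)"
)
lookup = "SELECT first_name, last_name FROM users WHERE email = ?"

# Narrow index: the engine finds the key, then visits the table row
# to read first_name and last_name.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_narrow = query_plan(conn, lookup, ("a@example.com",))

# Wider index that also carries the columns the query reads,
# so the table row never has to be fetched.
conn.execute("DROP INDEX idx_users_email")
conn.execute(
    "CREATE INDEX idx_users_email_cover "
    "ON users (email, first_name, last_name)"
)
plan_cover = query_plan(conn, lookup, ("a@example.com",))

print(plan_narrow)
print(plan_cover)
```

The second plan should report a covering index, meaning the query is answered from the index alone. The exact plan wording varies between SQLite versions.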
💡 Key Takeaways
• Clustered index leaf pages are the data itself, stored in key order; only one is possible per table because rows cannot be physically sorted in more than one way at once
• Nonclustered indexes store keys plus pointers and need key lookups to fetch full rows; 1000 result rows mean 1000 random lookups, which at 0.1 milliseconds each total about 100 milliseconds on SSDs
• Range queries on a clustered index scan sequential leaf pages (10 to 20 pages for 1000 rows) versus up to 1000 scattered random reads for a nonclustered index that does not cover the query
• Covering indexes add frequently read columns as included columns in nonclustered indexes, eliminating key lookups at a cost that can reach double or triple the index size, plus extra write overhead
• Clustered index inserts on monotonically increasing keys (timestamps, identity columns) all land on the last page, creating hot page contention at roughly 10K to 50K transactions per second; key randomization or partitioning mitigates this
📌 Examples
Microsoft SQL Server: A nonclustered index on Email with included columns (FirstName, LastName) turns a 1000 row query from 1000 key lookups (100ms) into a single 20 page index scan (5ms)
Oracle Index Organized Tables (IOT): Store the entire table as a clustered B+ tree keyed by primary key, eliminating the separate heap and making primary key access and range scans 2x to 3x faster than heap plus nonclustered
SQL Server with 500M row Orders table: Clustered on OrderID, nonclustered on CustomerID; query filtering CustomerID returning 10K orders performs 10K key lookups taking 1 second on SSD versus 200ms if CustomerID is covered