How Do Clustered and Nonclustered Indexes Differ?

Clustered Index
A clustered index determines how rows are physically stored on disk. The index and the data are the same structure: the leaf level of the tree IS the data, with rows sorted by the index key. Because rows can only be physically sorted one way, there can be only one clustered index per table. Range queries on the clustered key are fast because matching rows are stored contiguously, minimizing disk seeks.
Nonclustered Index
A nonclustered index is a separate structure pointing to the actual data. The leaf level stores sorted index keys plus a "row locator" (pointer to where the actual row lives). The rows themselves may be scattered across the disk. For highly selective queries (returning few rows), nonclustered indexes work well. For less selective queries, fetching scattered rows requires many random disk reads.
Bookmark Lookups
When a query needs columns not in the nonclustered index, the database must "look up" the full row using the pointer. This bookmark lookup adds one random read per row. Returning 1,000 rows means 1,000 additional random reads. At some threshold (often 1-5% of table), the optimizer chooses a full table scan instead.
Covering Indexes
A covering index includes all columns a query needs, eliminating bookmark lookups entirely. The query reads only from the index, never touching the table. Trade-off: larger indexes use more disk space and slow down inserts/updates.

💡 Key Takeaways

✓Clustered index: leaf level IS the data, rows physically sorted by key; only one per table

✓Nonclustered index: separate structure with pointers to scattered rows; multiple allowed per table

✓Bookmark lookup: fetching full row via pointer, adds 1 random read per row; at 1-5% selectivity, full scan wins

✓Covering index includes all needed columns, avoiding bookmark lookups at cost of larger index

📌 Interview Tips

1Clustered key choice: date for time-series (recent data contiguous), user_id for user data (user rows together)

2Bookmark lookup math: 1000 rows × 1 random read = 1000 disk seeks; sequential scan of 10,000 pages may be faster

3Covering index design: index on (status, created_at) INCLUDE (name, email) covers SELECT name, email WHERE status = active

← Back to Indexing Strategies Overview