Database DesignIndexing StrategiesMedium⏱️ ~2 min

Why Do Composite Index Column Ordering and Selectivity Matter?

Leftmost Prefix Matching

Composite indexes combine multiple columns into one index. Column order matters: the index is sorted by the first column, then by the second within each first-column value, and so on. An index on (A, B, C) can efficiently serve queries filtering on A alone, (A, B), or (A, B, C). But it cannot efficiently serve queries filtering on B alone or (B, C) because B is not sorted globally, only within each A value. This "leftmost prefix" rule means you must include leading columns in your WHERE clause to use the index.

Selectivity Ordering

Selectivity (what fraction of rows match a condition) determines optimal column order. Place the most selective column first to eliminate the most rows earliest. If column A filters to 1% of rows and column B filters to 50%, leading with A means the index scan touches 1% of leaf pages. Leading with B would scan 50% first.

Equality vs Range

Equality columns (=) should come before range columns (<, >, BETWEEN). Once a range condition is evaluated, subsequent columns cannot use the index efficiently. For a query WHERE status = active AND created_at > last_week, index on (status, created_at) works well. Index on (created_at, status) cannot use status after the range on created_at.

Index Skip Scan

Some databases can "skip scan" an index even without the leading column, jumping between distinct values of the first column. This works when the first column has few distinct values. But it is slower than a properly-ordered index.

💡 Key Takeaways
Composite index on (A, B, C) serves queries on A, (A,B), or (A,B,C), but not B alone due to leftmost prefix rule
Place most selective column first: 1% selectivity first touches 1% of pages vs 50% selectivity first
Equality columns before range columns: range condition stops index efficiency for subsequent columns
Index skip scan works around missing leading column but is slower than proper column order
📌 Interview Tips
1Leftmost prefix: index (country, city, zip) works for WHERE country = US, fails for WHERE city = NYC
2Selectivity math: 1% selectivity column A first → scan 1% of index; 50% column B first → scan 50%
3Equality before range: WHERE user_id = 123 AND ts > yesterday uses (user_id, ts), not (ts, user_id)
← Back to Indexing Strategies Overview
Why Do Composite Index Column Ordering and Selectivity Matter? | Indexing Strategies - System Overflow