
Index Families for ML Systems: Inverted vs Vector Indexes

Indexes are auxiliary data structures that trade extra storage and write work for dramatically lower read latency. In ML systems, two families dominate, depending on what you're searching.

Inverted indexes map terms or categorical tokens to postings lists of items, like a book index that maps words to page numbers. For product search, you might have "laptop" mapping to [item_123, item_456, item_789]. These power sparse retrieval and are organized as immutable segments with background merges, similar to Log-Structured Merge (LSM) trees. Meta and Google use them extensively for text search and filtering.

Vector indexes organize high dimensional embeddings for nearest neighbor search, enabling semantic search and recommendations. A 768 dimension product embedding lives in a space where similar items cluster together. Building these indexes requires training quantizers or constructing graph structures. For example, Product Quantization (PQ) compresses vectors from 3,072 bytes (768 dimensions × 4 bytes per float32) down to 16 to 32 bytes while preserving 95 to 99 percent recall. Spotify used Annoy for millions of song embeddings, while Meta uses FAISS for billion scale vector search.

Composite indexes combine both families. A product search might use a vector index for semantic similarity plus an inverted index for filtering by category, price range, and availability. This hybrid approach lets you efficiently find "items similar to this blue dress under $100 in the US".
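To make the PQ numbers concrete, here is a minimal sketch of an IVF-PQ index using the FAISS Python API. The 768 dimensions and 16-byte codes come from the text above; the random embeddings and the specific nlist/nprobe values are illustrative assumptions, not production settings.

```python
import numpy as np
import faiss

d = 768        # embedding dimension: 768 * 4 bytes = 3,072 bytes raw per vector
nlist = 256    # number of IVF partitions (coarse cells); assumed for this toy corpus
m = 16         # PQ sub-quantizers -> 16 bytes per compressed vector (768 / 16 = 48 dims each)

# Toy corpus standing in for real product embeddings.
embeddings = np.random.rand(100_000, d).astype("float32")

quantizer = faiss.IndexFlatL2(d)                      # coarse quantizer for IVF assignment
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, 8)   # 8 bits per PQ sub-code

index.train(embeddings)   # both IVF centroids and PQ codebooks need a training pass
index.add(embeddings)

index.nprobe = 32         # cells scanned per query: the recall-vs-latency knob
distances, ids = index.search(embeddings[:5], k=10)   # 10 approximate nearest neighbors
```

Raising nprobe scans more partitions per query, trading latency for recall; this is the main lever for hitting the 95 to 99 percent recall targets discussed below.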
💡 Key Takeaways
Inverted indexes map terms to item lists, enabling fast exact matching and filtering. Common for metadata like category, price, region.
Vector indexes organize embeddings in high dimensional space for semantic similarity. Required for recommendation and visual search where exact matches are insufficient.
Memory is the critical constraint. Raw float32 storage for 500 million vectors at 768 dimensions requires roughly 1.5 terabytes (500M × 768 dims × 4 bytes). Product Quantization reduces this to 10 to 20 bytes per vector, around 5 to 10 gigabytes total.
Recall at K is the key metric. Production systems target 95 to 99 percent recall, meaning the approximate index returns 95 to 99 of the true 100 nearest neighbors.
Composite indexes combine vector search with inverted filters. You vector search first to find 10,000 candidates in 15 milliseconds, then filter by metadata to 500 results in 2 milliseconds (a sketch of this two stage flow follows this list).
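The snippet below sketches that two stage flow: an ANN query for a broad candidate set, followed by a metadata post-filter. The `metadata` dict and the filter fields (category, price, region) are hypothetical stand-ins for a real inverted index lookup; `index` is the FAISS index from the earlier sketch.

```python
def hybrid_search(index, metadata, query_vec, k_candidates=10_000, k_final=500):
    """Two stage retrieval: ANN candidates first, metadata filter second."""
    # Stage 1: approximate nearest neighbor search over the vector index.
    _, candidate_ids = index.search(
        query_vec.reshape(1, -1).astype("float32"), k_candidates
    )

    # Stage 2: post-filter with metadata predicates. In a real system an
    # inverted index answers these; here a plain dict stands in for it.
    results = []
    for item_id in candidate_ids[0]:
        if item_id < 0:          # FAISS pads missing results with -1
            continue
        item = metadata.get(int(item_id))
        if (item and item["category"] == "dress"
                and item["price"] < 100 and item["region"] == "US"):
            results.append(int(item_id))
        if len(results) == k_final:
            break
    return results
```

Post-filtering like this is simple but can starve results when filters are highly selective; production systems often pre-filter instead, restricting the vector search to IDs the inverted index already matched.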
📌 Examples
Meta FAISS handles billions of embeddings with Product Quantization and Inverted File (IVF) partitioning, achieving sub 50 millisecond p99 latency on CPU clusters.
Spotify Annoy indexes millions of song embeddings as memory mapped trees, supporting 5 to 10 millisecond queries for playlist generation and recommendation candidate mining.
Pinterest uses vector search for visual discovery. A 512 dimension image embedding index returns similar pins, combined with inverted indexes for board category and user filters.