Feature Store Serving Patterns and Latency Budgets
Offline vs Online Stores
Feature stores separate two paths. The offline store holds historical snapshots for training, typically in a data warehouse optimized for batch reads. It maintains point-in-time correctness to prevent leakage. The online store serves low-latency reads, backed by key-value stores optimized for fast lookups. Batch pipelines write to both stores. Streaming pipelines write incremental updates to the online store at high throughput (100K+ writes/second) to keep aggregates fresh.
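Point-in-time correctness can be illustrated with an as-of lookup: each training label is matched only to the latest feature value observed at or before the label's timestamp, never a future one. A minimal pure-Python sketch (the data, key names, and `point_in_time_lookup` helper are illustrative, not a real feature-store API):

```python
from bisect import bisect_right

# Offline-store feature snapshots per entity, sorted by timestamp.
# Illustrative data: (day, clicks_7d) pairs keyed by user_id.
feature_log = {
    1: [(1, 3), (10, 8)],   # day 1 -> 3 clicks, day 10 -> 8 clicks
    2: [(5, 5)],
}

def point_in_time_lookup(user_id, as_of):
    """Return the latest feature value observed at or before `as_of`.

    Using a value recorded *after* the label timestamp would leak
    future information into training.
    """
    rows = feature_log[user_id]
    idx = bisect_right(rows, (as_of, float("inf"))) - 1
    return rows[idx][1] if idx >= 0 else None

# A label for user 1 on day 7 sees the day-1 snapshot (3 clicks),
# not the day-10 snapshot that did not exist yet.
print(point_in_time_lookup(1, 7))
print(point_in_time_lookup(2, 6))
```

A production offline store does the same thing at warehouse scale with an as-of join across snapshot tables.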
Batched Retrieval Patterns
Key grouping minimizes round trips. Group all item features under one key and all user features under another. A single batch-get then retrieves features for 1,000 items and 1 user in two calls instead of 1,001 per-entity lookups, or far more if each feature were fetched separately. Result: 2-5ms total at the 99th percentile (p99 means 99% of requests are faster than this). Cache hot entities in memory to avoid repeated fetches for popular items.
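The two-call pattern plus a hot-entity cache can be sketched as follows. The `OnlineStore` class and its `batch_get` method are stand-ins for a real client (Redis `MGET` and DynamoDB `BatchGetItem` expose the same batch-read shape):

```python
# Hypothetical in-memory online store exposing a batch-get API.
class OnlineStore:
    def __init__(self, data):
        self._data = data

    def batch_get(self, keys):
        # One round trip returning all requested entities at once.
        return {k: self._data.get(k) for k in keys}

store = OnlineStore({
    "item:42": {"ctr_7d": 0.031, "price": 19.99},
    "item:99": {"ctr_7d": 0.012, "price": 5.49},
    "user:7":  {"avg_session_min": 12.4},
})

# In-process cache for hot entities avoids repeated fetches entirely.
hot_cache = {}

def fetch_features(user_id, item_ids):
    # Call 1: all item features in a single batch-get (cache misses only).
    misses = [i for i in item_ids if i not in hot_cache]
    if misses:
        hot_cache.update(store.batch_get(misses))
    item_features = {i: hot_cache[i] for i in item_ids}
    # Call 2: the user's features.
    user_features = store.batch_get([user_id])[user_id]
    return user_features, item_features

user, items = fetch_features("user:7", ["item:42", "item:99"])
```

Two round trips regardless of candidate count is what keeps retrieval inside a single-digit-millisecond budget.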
Latency Budget Allocation
For a 200ms end-to-end target, a budget might allocate 5ms to feature retrieval, 30ms to candidate scoring, 10ms to re-ranking, and 20ms to network overhead, leaving roughly 135ms of headroom for variance. Each stage gets a timeout. If the feature fetch exceeds 10ms, degrade gracefully: reduce candidates from 1,000 to 500, or skip non-critical personalization features rather than failing the entire request.
Tail Latency Amplification
The amplification comes from fan-out: a request that touches N independent backends sees at least one p99-slow response with probability 1 − 0.99^N, so at N = 100 roughly 63% of requests hit the tail. If feature store p99 spikes from 5ms to 50ms (database hiccup, network congestion), end-to-end p99 can jump past 300ms, blowing the user-experience target. Mitigations: per-stage timeouts with graceful degradation; last-known-good snapshots for hot entities as a fallback; replication of read-heavy data to local caches (cuts read p99 from 5ms to under 1ms at the cost of added infrastructure complexity).
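The last-known-good fallback can be sketched as a thin wrapper around the store client. The `OnlineStore` class here deliberately simulates a timeout, and the key names and `get_with_fallback` helper are illustrative assumptions:

```python
# Hypothetical store client; here it always simulates a p99 spike so
# the fallback path is exercised.
class OnlineStore:
    def get(self, key):
        raise TimeoutError("simulated p99 spike")

# Last-known-good snapshots for hot entities, refreshed on every
# successful read and served as a stale-but-fast fallback.
last_known_good = {"user:7": {"avg_session_min": 12.4}}

def get_with_fallback(store, key):
    try:
        value = store.get(key)
        last_known_good[key] = value       # refresh snapshot on success
        return value, "fresh"
    except TimeoutError:
        if key in last_known_good:
            return last_known_good[key], "stale"
        return None, "missing"             # degrade: skip this feature group

value, status = get_with_fallback(OnlineStore(), "user:7")
```

Serving a slightly stale snapshot bounds the damage of a backend spike: the request completes within budget, and freshness recovers as soon as reads succeed again.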