Online Feature Store Architecture for Sub-10ms Reads
The feature store is often the bottleneck in real-time scoring. Your model needs dozens to hundreds of features, and fetching them dominates end-to-end latency if the read path is not designed carefully. An effective online feature store uses a layered caching strategy with in-process memory, regional caches, and a backing database, each layer trading off latency against freshness and cost.
The fastest layer is an in-process cache inside the scoring service itself, typically implemented with a least-recently-used (LRU) eviction policy and holding the hottest features on the heap. Reads complete in microseconds to 1 millisecond since there is no serialization or network hop. This works well for features that rarely change, like user account creation date or merchant category, and for read-heavy workloads where a small set of features is accessed repeatedly. The downside is memory overhead and cache invalidation complexity: if feature values update, you need a mechanism to invalidate or refresh entries across all service instances.
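As a concrete sketch, the in-process layer can be as small as an LRU map with a TTL, so that updated feature values eventually refresh on their own instead of requiring a cross-instance invalidation protocol. The class and parameters below are illustrative, not any specific library's API.

```python
import time
from collections import OrderedDict

class InProcessFeatureCache:
    """Tiny LRU cache with TTL-based expiry for the in-process layer.

    Entries older than ttl_seconds are treated as misses, forcing a
    refresh from the next layer (regional cache or online store).
    """

    def __init__(self, max_entries=10_000, ttl_seconds=300.0):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self._store = OrderedDict()  # key -> (value, inserted_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, inserted_at = entry
        if time.monotonic() - inserted_at > self.ttl_seconds:
            del self._store[key]  # stale: fall through to the next layer
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._store[key] = (value, time.monotonic())
        self._store.move_to_end(key)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used
```

The TTL trades a bounded window of staleness for not having to broadcast invalidations, which is usually acceptable for slowly changing features like account age or merchant category.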
The second layer is a regional cache, often Redis or Memcached, sitting close to the scoring service within the same availability zone. It serves 95 to 99 percent of requests with 1 to 3 millisecond reads. Features are keyed by entity ID and versioned to avoid training-serving skew. When the scoring service needs features, it batches the keys and issues a single multi-get request to minimize round trips. Cache misses fall through to the online store, which might be DynamoDB, Cassandra, or a custom key-value store optimized for low-latency reads, responding in 5 to 10 milliseconds.
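The batched read path can be sketched roughly as follows. The key scheme, the `fetch_from_online_store` stand-in, and the cache interface (`mget`/`set`, as in redis-py) are assumptions for illustration:

```python
def get_features(cache, fetch_from_online_store, entity_id, feature_names,
                 schema_version="v3"):
    """Layered read: one multi-get to the regional cache, then fall
    through to the online store only for the misses.

    `cache` needs mget/set (redis-py-style); `fetch_from_online_store`
    stands in for a DynamoDB/Cassandra batch read (5-10 ms).
    """
    # Keys are versioned so training and serving read the same definition.
    keys = [f"{schema_version}:{entity_id}:{name}" for name in feature_names]
    values = cache.mget(keys)  # one round trip for dozens of features
    missing = [k for k, v in zip(keys, values) if v is None]
    if missing:
        fresh = fetch_from_online_store(missing)
        for k in missing:
            if k in fresh:
                cache.set(k, fresh[k], ex=300)  # backfill cache with a TTL
        values = [fresh.get(k, v) for k, v in zip(keys, values)]
    return dict(zip(feature_names, values))
```

Backfilling misses on the read path keeps the hit rate high without a separate cache-warming job, at the cost of occasionally serving a value one TTL window stale.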
Feature freshness becomes a critical tradeoff. Precomputing all features and writing them to the store gives low read latency but increases storage and write costs, especially for high-cardinality features like user session aggregates over sliding windows. An alternative is to compute lightweight features on the read path, for example counting recent events from a small in-memory buffer. This reduces storage but costs CPU time and adds milliseconds to the scoring path. Hybrid strategies precompute expensive aggregations offline and compute simple transformations like ratios or differences on read.
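A hybrid read path might look like the sketch below, where the expensive window sums arrive precomputed from the store and only cheap ratios are derived at read time. All feature names are hypothetical:

```python
def derive_read_time_features(precomputed):
    """Add cheap derived features on the read path.

    The window sums and counts are assumed to come precomputed from the
    online store; only ratios and averages are computed here, costing
    microseconds rather than a windowed aggregation.
    """
    amount_5m = precomputed["txn_amount_sum_5m"]
    amount_1h = precomputed["txn_amount_sum_1h"]
    count_5m = precomputed["txn_count_5m"]
    return {
        **precomputed,
        # Burstiness: share of the last hour's spend in the last 5 minutes.
        "amount_ratio_5m_1h": amount_5m / amount_1h if amount_1h else 0.0,
        # Average ticket size over the recent window.
        "avg_amount_5m": amount_5m / count_5m if count_5m else 0.0,
    }
```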
Streaming features add another dimension. For real-time fraud detection, you might need the count of transactions in the last 5 minutes. This requires a streaming pipeline that materializes incremental aggregates with minute-level freshness: the pipeline consumes events from Kafka, updates rolling windows in memory using something like Flink or a custom service, and writes updated features to the online store every few seconds. Freshness service level agreements (SLAs) are often under 1 minute, meaning a feature value is at most 60 seconds stale. This level of freshness can improve model accuracy significantly, for example increasing fraud recall by 10 to 15 percent compared to hourly batch updates, but it requires maintaining complex streaming infrastructure.
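The per-entity rolling state such a pipeline maintains can be sketched with a simple deque-based counter. In practice a Flink job would keep this state with checkpointing and watermarks, so treat this as an illustration of the aggregation logic only; the event shape is an assumption:

```python
import time
from collections import defaultdict, deque

class SlidingWindowCounter:
    """In-memory rolling count per entity, e.g. transactions in the last
    5 minutes, as a streaming consumer might maintain before flushing
    updated values to the online store every few seconds."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # entity_id -> event timestamps

    def record(self, entity_id, ts=None):
        self._events[entity_id].append(ts if ts is not None else time.time())

    def count(self, entity_id, now=None):
        now = now if now is not None else time.time()
        q = self._events[entity_id]
        while q and q[0] <= now - self.window_seconds:
            q.popleft()  # drop events that aged out of the window
        return len(q)
```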
Schema validation and versioning prevent silent training-serving skew. Features are stored with a schema version, and the scoring service validates that it is reading the same feature definition used during training. Mismatches, like a feature that was log-transformed in training but served raw, cause model accuracy to collapse. Monitoring feature distributions in production and comparing them to training distributions catches drift early.
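A minimal sketch of that serving-time check, assuming a registry that records each feature's transformation at training time (the names, versions, and registry shape are hypothetical):

```python
# Recorded at training time; in practice this would live in a feature
# registry rather than a module-level constant.
TRAINING_SCHEMA = {
    "version": "v3",
    "features": {"txn_amount": "log1p", "txn_count_5m": "raw"},
}

def validate_schema(serving_schema, training_schema=TRAINING_SCHEMA):
    """Fail fast on schema mismatch instead of silently serving
    mismatched transformations (e.g. log-transformed in training,
    raw at serving time)."""
    if serving_schema["version"] != training_schema["version"]:
        raise ValueError(
            f"schema version mismatch: serving {serving_schema['version']} "
            f"vs training {training_schema['version']}")
    for name, transform in training_schema["features"].items():
        if serving_schema["features"].get(name) != transform:
            raise ValueError(f"transform mismatch for feature {name!r}")
```

Running this check at service startup (and on every schema deploy) turns a silent accuracy collapse into a loud, immediate failure.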
💡 Key Takeaways
•Three-layer caching: in-process at microseconds to 1ms, regional Redis at 1 to 3ms with a 95 to 99 percent hit rate, and the online store at 5 to 10ms for misses
•Precomputed features reduce read latency but increase storage and write costs, while on-read computation saves storage but adds milliseconds and CPU overhead to scoring
•Streaming feature pipelines materialize incremental aggregates with an under-1-minute freshness SLA, improving fraud recall by 10 to 15 percent compared to hourly batch updates
•Schema versioning and validation prevent training-serving skew, where a feature log-transformed in training gets served raw and model accuracy collapses
•Batched multi-get requests to the regional cache minimize round trips, fetching dozens of features in a single 2ms call instead of sequential 2ms-per-feature reads
•Hybrid strategies precompute expensive window aggregates offline and compute simple transformations like ratios or differences on the read path within the latency budget
📌 Examples
Stripe's online feature store uses DynamoDB with DAX (DynamoDB Accelerator) as an in-memory cache, achieving p99 reads under 5ms for user lifetime features and sub-1-minute freshness for transaction counts
Uber computes driver location and recent trip features in a Flink streaming job that writes to a regional Redis cluster every 10 seconds, supporting tens of thousands of dispatch decisions per second
Amazon's recommendation service batches 50 to 100 feature keys in a single Redis multi-get call, reducing network overhead and completing the feature fetch in 3ms p99 within a single availability zone