
Latency vs Cost Trade-offs in Feature Storage

Online feature stores deliver millisecond latency through highly available, in-memory or Solid State Drive (SSD) optimized databases such as Redis, DynamoDB, or Cassandra, but this performance comes at 10 to 50 times higher cost per gigabyte-month compared to offline object storage such as Amazon S3 or data lakes. A production recommendation system might pay $50 per gigabyte-month for Redis versus $1 per gigabyte-month for S3, making the choice of which features live online a critical cost optimization decision.

Operational complexity scales with online requirements. Multi-region replication for 99.99% availability, automated failover, consistent hashing for sharding, and aggressive Time To Live (TTL) policies to prevent unbounded growth all add engineering overhead. DoorDash reported that serving 10,000+ QPS per service with burst handling and sub-10ms p99 latency requires sophisticated autoscaling, partition-aware back-pressure handling, and circuit breakers that fall back to default values during incidents.

The decision framework centers on latency sensitivity versus feature cardinality. User-facing ranking, fraud detection, and dynamic pricing have 5 to 50ms incremental latency budgets, where online features materially affect Click Through Rate (CTR) or conversion rates. In contrast, churn prediction, Lifetime Value (LTV) modeling, and nightly batch recommendations can use offline-only features, since those decisions occur outside request paths. Most production systems adopt a hybrid approach: 10 to 100 latency-critical features online (last-hour activity counters, real-time embeddings) plus 100 to 1,000 rich features precomputed offline and cached.

Cost-aware design constrains the online footprint through aggressive strategies. Netflix quantizes feature vectors to reduce memory, downsamples long-tail entities with low request rates, and evicts stale entries via TTLs measured in hours to days. For features accessed less than once per hour per entity, the cache-miss penalty of fetching from offline storage often beats the cost of maintaining online replicas across all regions.
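The hybrid pattern above can be sketched as a small TTL'd online cache backed by a cheaper offline store. This is an illustrative toy, not any vendor's API: `HybridFeatureStore`, the dict-based "offline store", and the one-hour TTL are all assumptions for the sketch.

```python
import time

class HybridFeatureStore:
    """Sketch of a hybrid store: a small TTL'd online cache (standing in
    for Redis) backed by a cheaper offline store (standing in for S3 or a
    warehouse). Names, structure, and TTL are illustrative assumptions."""

    def __init__(self, offline_store, ttl_seconds=3600):
        self.online = {}              # entity_id -> (features, expiry timestamp)
        self.offline = offline_store  # slower, cheaper source of truth
        self.ttl = ttl_seconds

    def get_features(self, entity_id):
        entry = self.online.get(entity_id)
        if entry and entry[1] > time.time():
            return entry[0]                 # online hit: millisecond-class lookup
        features = self.offline[entity_id]  # cache miss: slower offline fetch
        # Populate the online cache with a TTL so stale entries get evicted
        # rather than growing the online footprint without bound.
        self.online[entity_id] = (features, time.time() + self.ttl)
        return features

# Usage: a plain dict stands in for the offline store.
store = HybridFeatureStore({"user_42": {"clicks_1h": 3, "ltv": 120.0}})
first = store.get_features("user_42")   # miss -> offline fetch, then cached
second = store.get_features("user_42")  # hit  -> served from the online cache
```

The TTL is what keeps the expensive tier bounded: entities that stop being requested simply age out instead of holding replicated memory across regions.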
💡 Key Takeaways
Online in-memory storage costs 10 to 50 times more per gigabyte-month than offline object storage, making feature selection a critical cost optimization ($50/GB/month Redis vs $1/GB/month S3)
Multi-region replication and high-availability infrastructure add significant operational complexity: consistent hashing, automated failover, partition management, and back-pressure handling
Hybrid architectures balance cost and latency by keeping 10 to 100 critical features online (real-time counters, embeddings) and 100 to 1,000 features offline (historical aggregates, segments)
Aggressive TTL policies are essential: evict entities not accessed in hours to days to prevent unbounded growth, with Netflix targeting cache hit ratios above 95% on hot entities
Quantization and downsampling reduce the online footprint with acceptable accuracy loss: compress float32 embeddings to int8 (4x memory reduction) with less than 1% model quality degradation
Request budgets force prioritization: if the total feature fetch must stay under 15ms p99, bundle essential features first and drop or approximate non-critical features under load
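The float32-to-int8 compression mentioned above can be illustrated with simple symmetric linear quantization. This is a minimal sketch of the general technique, not Netflix's actual pipeline; the function names and the 256-dimension embedding are assumptions.

```python
import numpy as np

def quantize_int8(vec):
    """Symmetric linear quantization: map the float32 range onto int8
    so the largest magnitude lands at +/-127. Sketch, not production code."""
    scale = max(float(np.abs(vec).max()) / 127.0, 1e-8)  # guard all-zero vectors
    q = np.round(vec / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Reconstruction error is at most scale/2 per element (rounding).
    return q.astype(np.float32) * scale

emb = np.random.randn(256).astype(np.float32)  # hypothetical embedding
q, scale = quantize_int8(emb)
print(emb.nbytes, q.nbytes)  # 1024 256 -- the 4x memory reduction
err = float(np.abs(dequantize(q, scale) - emb).max())
```

Whether the resulting rounding error stays below a ~1% quality budget depends on the model, so in practice the compressed features are validated against offline evaluation metrics before rollout.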
📌 Examples
Meta Ads ranking: Keeps sub-second-freshness counters online for high-impact features (click rate in the last hour) costing millions monthly, while batch-derived audience segments stay offline and sync daily
Uber: Maintains streaming aggregates like "trips in last 5 minutes" online with minutes of freshness, but computes complex driver-behavior features offline in Spark jobs running on cheaper batch compute
DoorDash: Serves 10,000+ QPS with p99 under 10ms by bundling the top 50 features per entity into a single key-value lookup, falling back to cached defaults during regional outages to maintain availability
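The budget-and-fallback pattern in the takeaways and the DoorDash example can be sketched with a timed lookup that degrades to defaults instead of blocking the request path. The default values, `slow_fetch` helper, and thread-pool wrapper are illustrative assumptions, not DoorDash's implementation.

```python
import concurrent.futures
import time

# Hypothetical safe defaults served when the online fetch misses its budget.
DEFAULTS = {"clicks_1h": 0, "ctr_7d": 0.01}

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def fetch_with_budget(fetch_fn, entity_id, budget_ms=15):
    """Run the feature fetch with a hard latency budget; if it does not
    return in time, serve cached defaults rather than stalling the request."""
    future = _pool.submit(fetch_fn, entity_id)
    try:
        return future.result(timeout=budget_ms / 1000.0)
    except concurrent.futures.TimeoutError:
        return DEFAULTS  # degrade gracefully, keep the request path available

def slow_fetch(entity_id):
    time.sleep(0.05)  # simulated 50 ms store lookup during an incident
    return {"clicks_1h": 7, "ctr_7d": 0.03}

tight = fetch_with_budget(slow_fetch, "user_42", budget_ms=15)    # -> DEFAULTS
loose = fetch_with_budget(slow_fetch, "user_42", budget_ms=500)   # -> real features
```

A production circuit breaker would also trip open after repeated timeouts so that overloaded shards stop receiving traffic at all, but the core trade is the same: a slightly worse prediction from defaults beats a timed-out request.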