
End-to-End Architecture for Real-Time Features at Scale

Production real-time feature systems combine event ingestion, stream processing, state management, and low-latency serving. Each layer introduces its own latency and failure modes, so the architecture must balance freshness, accuracy, and reliability.

Ingestion Layer

Events flow from applications to a message queue (Kafka, Kinesis, Pub/Sub). The queue provides durability and decouples producers from consumers. Key decisions: partition strategy (by user_id for user features, by item_id for item features), retention period (determines how far back you can replay), and replication factor (durability vs cost). For ML features, partition by the entity you are aggregating over—this ensures all events for one entity go to the same consumer, enabling stateful processing.
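The key-to-partition mapping can be sketched in a few lines. Kafka's default partitioner actually hashes the key bytes with murmur2; the MD5 hash below is only a stand-in to illustrate the property that matters: the same entity key always lands on the same partition, so a single consumer sees every event for that entity.

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Deterministically map an entity key to a partition.

    Illustrative only: Kafka uses murmur2, not MD5. The point is that
    hashing the key makes routing deterministic, so all events for one
    entity reach the same consumer and stateful aggregation works.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every event keyed by user_42 routes to the same partition.
events = [{"user_id": "user_42", "amount": a} for a in (10, 25, 5)]
partitions = {partition_for(e["user_id"], 8) for e in events}
assert len(partitions) == 1
```

Partitioning by a high-cardinality entity key (user_id, item_id) also spreads load evenly; partitioning by a low-cardinality field would funnel most traffic to a few consumers.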

Stream Processing Layer

A stream processor (Flink, Spark Streaming, Kafka Streams) consumes events, computes windowed aggregations, and writes results. Checkpointing provides exactly-once semantics by periodically saving state to durable storage. If a node fails, processing resumes from the last checkpoint. Checkpoint interval trades latency for durability: 30-second checkpoints mean up to 30 seconds of reprocessing after failure. For ML features, checkpointing is essential—without it, node failures cause feature values to reset to zero mid-window.
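The checkpoint/restore cycle can be sketched with a toy stateful operator. This is a simplified model, not Flink's API: real engines snapshot operator state together with the source offset to durable storage, then on failure restore both and replay only the events after the saved offset.

```python
import copy

class WindowedCounter:
    """Toy stateful operator: per-key event counts for the current window."""

    def __init__(self):
        self.counts = {}
        self.offset = 0  # position in the input log (source offset)

    def process(self, events):
        for ev in events:
            self.counts[ev] = self.counts.get(ev, 0) + 1
            self.offset += 1

    def checkpoint(self):
        # Snapshot state AND offset together, as real engines do.
        return copy.deepcopy({"counts": self.counts, "offset": self.offset})

    def restore(self, snap):
        self.counts = dict(snap["counts"])
        self.offset = snap["offset"]

log = ["u1", "u2", "u1", "u1", "u2"]
op = WindowedCounter()
op.process(log[:3])
snap = op.checkpoint()        # durable snapshot taken at offset 3
op.process(log[3:4])          # ...then the node crashes...
op = WindowedCounter()        # fresh node: state would be empty
op.restore(snap)              # restore from the last checkpoint
op.process(log[op.offset:])   # replay only events after the saved offset
assert op.counts == {"u1": 3, "u2": 2}  # no counts lost, none double-counted
```

Without the checkpoint, the fresh node would restart from empty state and the mid-window counts would reset to zero, exactly the failure mode described above.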

State Store and Serving

Computed features must be queryable at prediction time with sub-10ms latency. Options: embedded state (RocksDB within Flink), external store (Redis, DynamoDB), or hybrid. Embedded state has lowest latency but limits scalability; external stores scale independently but add network round-trips. Common pattern: stream processor writes to Redis sorted sets (one key per entity, score by timestamp), serving layer reads latest value. TTL on keys prevents unbounded growth from inactive entities.
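A minimal in-memory sketch of the sorted-set pattern, assuming the Redis layout described above (one key per entity, entries scored by event timestamp, TTL pruning inactive keys). In production these operations would be ZADD, ZRANGE with REV, and EXPIRE; the class below only mimics their semantics.

```python
import time

class FeatureStore:
    """In-memory stand-in for Redis sorted sets used as a feature store."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self.data = {}        # entity -> list of (timestamp, value), sorted
        self.last_write = {}  # entity -> wall-clock time of last write

    def write(self, entity, ts, value, now=None):
        now = time.time() if now is None else now
        self.data.setdefault(entity, []).append((ts, value))
        self.data[entity].sort()          # sorted set keeps score order
        self.last_write[entity] = now     # analogous to refreshing EXPIRE

    def latest(self, entity):
        entries = self.data.get(entity)
        return entries[-1][1] if entries else None  # highest score wins

    def expire(self, now=None):
        now = time.time() if now is None else now
        for entity in [e for e, t in self.last_write.items()
                       if now - t > self.ttl]:
            del self.data[entity], self.last_write[entity]

store = FeatureStore(ttl_seconds=3600)
store.write("user_42", ts=100, value=0.7, now=0)
store.write("user_42", ts=90, value=0.3, now=1)  # late event, lower score
assert store.latest("user_42") == 0.7            # serving reads newest by score
store.expire(now=7200)                           # inactive entity pruned by TTL
assert store.latest("user_42") is None
```

Note the late-event case: because entries are ordered by event timestamp (the score), an out-of-order write cannot overwrite a newer value, which a plain key-value SET would.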

Latency Budget: Typical breakdown for 50ms end-to-end: ingestion queue 5ms, stream processing 20ms, state write 10ms, serving read 5ms, network overhead 10ms. Profile each component to identify bottlenecks.

💡 Key Takeaways
- Partition events by the entity being aggregated to enable stateful processing
- Checkpointing provides exactly-once semantics but adds recovery latency
- External state stores scale independently but add network latency
📌 Interview Tips
1. 50ms latency budget: ingestion 5ms, processing 20ms, write 10ms, read 5ms, network 10ms
2. Redis sorted sets with TTL for feature serving and automatic cleanup