End-to-End Architecture for Real-Time Features at Scale
Production real-time feature systems combine event ingestion, stream processing, state management, and low-latency serving. Each layer adds latency and introduces its own failure modes; the architecture must balance freshness, accuracy, and reliability.
Ingestion Layer
Events flow from applications to a message queue (Kafka, Kinesis, Pub/Sub). The queue provides durability and decouples producers from consumers. Key decisions: partition strategy (by user_id for user features, by item_id for item features), retention period (determines how far back you can replay), and replication factor (durability vs cost). For ML features, partition by the entity you are aggregating over—this ensures all events for one entity go to the same consumer, enabling stateful processing.
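To make the partitioning rule concrete, here is a minimal sketch of entity-keyed partition selection. The names (`NUM_PARTITIONS`, `partition_for`) are illustrative, not a real client API; production Kafka clients apply an equivalent hash-of-key partitioner when you set the record key to the entity ID.

```python
import hashlib

NUM_PARTITIONS = 12  # assumed partition count for the feature topic

def partition_for(entity_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Use a stable hash (Python's built-in hash() is salted per process,
    # so it would route the same entity differently across restarts).
    digest = hashlib.md5(entity_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every event keyed by "user_42" maps to the same partition, so one
# consumer sees all of that user's events and can hold state for them.
p = partition_for("user_42")
assert all(partition_for("user_42") == p for _ in range(5))
```

The same rule applied with `item_id` as the key yields item-level locality instead; the choice of key is the choice of aggregation entity.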
Stream Processing Layer
A stream processor (Flink, Spark Streaming, Kafka Streams) consumes events, computes windowed aggregations, and writes results. Checkpointing gives exactly-once state semantics by periodically snapshotting operator state to durable storage; end-to-end exactly-once delivery additionally requires transactional or idempotent sinks. If a node fails, processing resumes from the last checkpoint and replays the events consumed since it was taken. The checkpoint interval trades runtime overhead against recovery time: 30-second checkpoints mean up to 30 seconds of reprocessing after a failure. For ML features, checkpointing is essential; without it, a node failure wipes in-flight state and partially aggregated feature values restart from zero mid-window.
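The checkpoint/restore cycle can be sketched with a toy single-operator counter. This is an assumption-laden simplification: a JSON string stands in for a snapshot written to durable storage, and the class names are invented for illustration, not taken from any engine's API.

```python
import json

class WindowedCounter:
    """Counts events per (entity, tumbling window); state is checkpointable."""

    def __init__(self, window_sec: int = 60):
        self.window_sec = window_sec
        self.state = {}  # "entity:window_start" -> count

    def process(self, entity_id: str, event_ts: float) -> None:
        window_start = int(event_ts // self.window_sec) * self.window_sec
        key = f"{entity_id}:{window_start}"
        self.state[key] = self.state.get(key, 0) + 1

    def checkpoint(self) -> str:
        # In a real engine this snapshot goes to durable storage (e.g. S3/HDFS).
        return json.dumps(self.state)

    @classmethod
    def restore(cls, snapshot: str, window_sec: int = 60) -> "WindowedCounter":
        op = cls(window_sec)
        op.state = json.loads(snapshot)
        return op

op = WindowedCounter()
op.process("user_1", 70.0)   # falls in the [60, 120) window
op.process("user_1", 100.0)  # same window
snap = op.checkpoint()       # periodic snapshot

# ...node fails; a replacement operator restores from the last checkpoint...
recovered = WindowedCounter.restore(snap)
assert recovered.state == {"user_1:60": 2}  # mid-window counts survive
```

Without the restore step, the replacement operator would start with empty state and the window's count would restart from zero, which is exactly the failure mode checkpointing prevents.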
State Store and Serving
Computed features must be queryable at prediction time with sub-10ms latency. Options: embedded state (RocksDB within Flink), an external store (Redis, DynamoDB), or a hybrid. Embedded state has the lowest latency but couples storage to processing and limits independent scaling; external stores scale independently but add a network round-trip. A common pattern: the stream processor writes to Redis sorted sets (one key per entity, scored by event timestamp) and the serving layer reads the latest value. A TTL on each key prevents unbounded growth from inactive entities.
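A sketch of the sorted-set pattern, using an in-memory stand-in for Redis so the logic is runnable without a server. The stand-in mimics ZADD and ZREVRANGE semantics; with redis-py against a real instance, the equivalent calls are `r.zadd(key, {value: score})` and `r.zrevrange(key, 0, 0)`, plus `r.expire(key, ttl)` for eviction. The key scheme (`feat:click_count:<entity>`) is an illustrative convention, not prescribed by Redis.

```python
class FakeSortedSets:
    """Minimal in-memory mimic of Redis sorted sets for this example."""

    def __init__(self):
        self.sets = {}  # key -> {member: score}

    def zadd(self, key, mapping):
        # Like Redis ZADD: insert members, updating scores of existing ones.
        self.sets.setdefault(key, {}).update(mapping)

    def zrevrange(self, key, start, stop):
        # Like Redis ZREVRANGE: members ordered by score, highest first.
        members = sorted(self.sets.get(key, {}).items(),
                         key=lambda kv: kv[1], reverse=True)
        return [m for m, _ in members[start:stop + 1]]

store = FakeSortedSets()

# Stream processor side: one key per entity, feature value scored by event time.
store.zadd("feat:click_count:user_42", {"17": 1700000000})
store.zadd("feat:click_count:user_42", {"18": 1700000060})

# Serving side: the freshest value is the member with the highest timestamp.
latest = store.zrevrange("feat:click_count:user_42", 0, 0)
assert latest == ["18"]
```

Scoring by timestamp also lets the serving layer fetch a short recent history (e.g. the last N values) from the same key, which point-lookup stores cannot do as cheaply.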
Latency Budget: Typical breakdown for 50ms end-to-end: ingestion queue 5ms, stream processing 20ms, state write 10ms, serving read 5ms, network overhead 10ms. Profile each component to identify bottlenecks.
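The budget above can be sanity-checked mechanically; the component figures are the article's illustrative numbers, not measurements, and in practice each entry would come from profiling.

```python
# Per-component latency budget (ms), using the article's example figures.
budget_ms = {
    "ingestion_queue": 5,
    "stream_processing": 20,
    "state_write": 10,
    "serving_read": 5,
    "network_overhead": 10,
}

total = sum(budget_ms.values())
assert total == 50  # fits the 50ms end-to-end target

# The largest entry is the first place to profile for wins.
bottleneck = max(budget_ms, key=budget_ms.get)
assert bottleneck == "stream_processing"
```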