
Online and Offline Feature Computation Architecture

Production temporal feature systems must compute features online for low-latency inference and offline for training, using a shared definition to prevent training-serving skew. The architecture has three layers: stream processing for short windows, batch jobs for long windows, and a feature store that unifies reads.

The online path uses stream processors with event-time semantics. Kafka Streams, Flink, or Spark Structured Streaming maintain per-key state for windows like 1 minute, 5 minutes, and 1 hour. For sliding windows, store time-bucketed counters in a ring buffer: when a new event arrives, increment the bucket for its timestamp and evict buckets older than the window. For 5-minute windows with 10-second buckets, keep 30 buckets per key. Deduplicate using idempotency keys to handle retries without inflating counts. Use watermarks to tolerate late arrivals, accepting events up to 5 minutes late at p99; this increases accuracy but delays window closure. Stripe-style systems target 2 to 5 millisecond p95 feature read latency from Redis or similar in-memory stores, with p99 under 10 milliseconds.

The offline path computes longer windows and seasonality features in batch, running hourly or daily. It applies the same event-time logic as the online path to ensure consistency. Point-in-time joins are critical: for each training example at timestamp T, use only feature values computed from events before T. This prevents label leakage, where future information contaminates training. Backfill short windows offline by replaying events through the same windowing logic, and store results in a feature store with timestamp-indexed access. Uber-style systems maintain 8-week seasonal profiles and holiday calendars offline, joining them with online 5-minute demand and supply velocity at inference time.

Hybrid storage optimizes cost and latency. Keep 1-minute and 5-minute windows fully in memory with sub-10-millisecond reads. Store 1-hour windows in a fast persistent store like DynamoDB or Bigtable with 20 to 50 millisecond p99. Snapshot 24-hour and 7-day aggregates from batch to the online store every hour, trading freshness for cost. This three-tier design handles Stripe and PayPal scale: 100K queries per second for real-time decisions, 10 million active entities, and sub-50-millisecond end-to-end latency including feature fetch and model inference.

Monitoring prevents silent failures. Track feature freshness as the age of the last update per key, alerting if it exceeds 2x the expected window size. Monitor null rates: if distinct device count per merchant is suddenly null for 10% of keys, upstream ingestion has failed. Measure distribution drift between online and offline: if the offline mean transaction amount is $150 but online shows $200, timestamp logic or filtering differs. Maintain golden tests that replay a fixed event timeline through both pipelines and assert exact feature equality. At Amazon scale, automated tests run hourly on 1% of traffic, catching skew before it degrades model accuracy by more than 0.5%.
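A minimal single-process sketch of the ring-buffer counter described above, assuming 30 buckets of 10 seconds for a 5-minute window. In production this state would live in a stream processor's keyed state (Flink, Kafka Streams) rather than a plain Python object, and the idempotency-key set would be bounded with a TTL:

```python
class SlidingWindowCounter:
    """Time-bucketed ring buffer for sliding-window counts for one key.

    Defaults give a 5-minute window as 30 buckets of 10 seconds each.
    """

    def __init__(self, num_buckets: int = 30, bucket_seconds: int = 10) -> None:
        self.num_buckets = num_buckets
        self.bucket_seconds = bucket_seconds
        self.counts = [0] * num_buckets        # ring of per-bucket counts
        self.bucket_ids = [-1] * num_buckets   # absolute bucket id held by each slot
        self.latest_bucket = -1                # high-water mark (watermark proxy)
        self.seen = set()                      # idempotency keys; bound with a TTL in practice

    def add(self, event_ts: float, idempotency_key: str) -> None:
        if idempotency_key in self.seen:
            return                             # retried delivery: don't inflate counts
        bucket_id = int(event_ts // self.bucket_seconds)
        if bucket_id <= self.latest_bucket - self.num_buckets:
            return                             # later than the window tolerates: drop
        self.seen.add(idempotency_key)
        self.latest_bucket = max(self.latest_bucket, bucket_id)
        slot = bucket_id % self.num_buckets
        if self.bucket_ids[slot] != bucket_id: # slot held an expired bucket: evict it
            self.bucket_ids[slot] = bucket_id
            self.counts[slot] = 0
        self.counts[slot] += 1

    def count(self, now_ts: float) -> int:
        """Events in the window ending at now_ts."""
        newest = int(now_ts // self.bucket_seconds)
        oldest = newest - self.num_buckets + 1
        return sum(c for c, b in zip(self.counts, self.bucket_ids)
                   if oldest <= b <= newest)


c = SlidingWindowCounter()
c.add(event_ts=1000.0, idempotency_key="txn-1")
c.add(event_ts=1000.0, idempotency_key="txn-1")   # retry: ignored
c.add(event_ts=1120.0, idempotency_key="txn-2")
assert c.count(now_ts=1150.0) == 2
assert c.count(now_ts=1400.0) == 1                # txn-1 has aged out of the window
```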
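A sketch of the point-in-time join using pandas' `merge_asof`; the column names and toy data are illustrative, not from any named system:

```python
import pandas as pd

# Labeled examples: one row per transaction at label timestamp T.
labels = pd.DataFrame({
    "card_id": ["c1", "c1", "c2"],
    "ts": pd.to_datetime(["2024-01-01 12:00", "2024-01-01 18:00", "2024-01-01 12:30"]),
    "is_fraud": [0, 1, 0],
})

# Timestamp-indexed feature snapshots, as they would come from the feature store.
features = pd.DataFrame({
    "card_id": ["c1", "c1", "c2"],
    "ts": pd.to_datetime(["2024-01-01 11:00", "2024-01-01 17:59", "2024-01-01 12:00"]),
    "txn_count_24h": [3, 9, 1],
})

# direction="backward" picks, per label row, the latest feature snapshot
# before T; allow_exact_matches=False makes "before" strict, so features
# computed at T itself (which may include the labeled event) cannot leak in.
train = pd.merge_asof(
    labels.sort_values("ts"),
    features.sort_values("ts"),
    on="ts", by="card_id",
    direction="backward",
    allow_exact_matches=False,
)
#   card_id                  ts  is_fraud  txn_count_24h
#        c1 2024-01-01 12:00:00         0              3
#        c2 2024-01-01 12:30:00         0              1
#        c1 2024-01-01 18:00:00         1              9
```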
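A toy sketch of the three-tier read path, with plain dicts standing in for Redis, DynamoDB/Bigtable, and the hourly batch snapshots pushed into the online store; the routing-by-window-suffix naming convention is an assumption for illustration:

```python
# Stand-ins for the real stores: hot = in-memory (1m/5m, sub-10 ms),
# warm = fast persistent store (1h, 20-50 ms p99),
# snapshot = long aggregates refreshed hourly from batch.
HOT      = {("card:42", "txn_count_1m"): 1.0, ("card:42", "txn_count_5m"): 4.0}
WARM     = {("card:42", "txn_count_1h"): 17.0}
SNAPSHOT = {("card:42", "txn_count_24h"): 130.0, ("card:42", "txn_count_7d"): 904.0}

# Which tier serves which window length, per the layout described above.
TIER_FOR = {"1m": HOT, "5m": HOT, "1h": WARM, "24h": SNAPSHOT, "7d": SNAPSHOT}

def fetch(entity_id: str, feature: str) -> float:
    window = feature.rsplit("_", 1)[-1]   # e.g. "txn_count_5m" -> "5m"
    return TIER_FOR[window].get((entity_id, feature), 0.0)

features = {f: fetch("card:42", f)
            for f in ["txn_count_5m", "txn_count_1h", "txn_count_7d"]}
# {'txn_count_5m': 4.0, 'txn_count_1h': 17.0, 'txn_count_7d': 904.0}
```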
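Minimal sketches of the three monitoring checks; the 5% drift tolerance is an illustrative threshold, not a figure from the text:

```python
import time

def stale_keys(last_update: dict[str, float], window_s: float,
               now: float | None = None) -> list[str]:
    """Keys whose last feature update is older than 2x the expected window size."""
    now = time.time() if now is None else now
    return [k for k, ts in last_update.items() if now - ts > 2 * window_s]

def null_rate(values: list) -> float:
    """Fraction of null feature values; a spike signals upstream ingestion failure."""
    return sum(v is None for v in values) / max(len(values), 1)

def drifted(online_mean: float, offline_mean: float, rel_tol: float = 0.05) -> bool:
    """True if online and offline means diverge by more than rel_tol (relative)."""
    return abs(online_mean - offline_mean) > rel_tol * abs(offline_mean)

# The $150-versus-$200 discrepancy from the text trips the drift check:
assert drifted(online_mean=200.0, offline_mean=150.0)
```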
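A self-contained golden-test sketch in the spirit described above: it replays one fixed timeline through the streaming `SlidingWindowCounter` from the first sketch and through a brute-force batch recomputation, asserting exact equality. The function names are ours (pytest-style), not any vendor's API:

```python
def batch_count(events: list[tuple[float, str]], now_ts: float,
                window_s: float = 300.0, bucket_s: float = 10.0) -> int:
    """Offline recomputation using the same event-time bucket boundaries
    and the same idempotency-key deduplication as the online path."""
    newest = int(now_ts // bucket_s)
    oldest = newest - int(window_s // bucket_s) + 1
    seen, n = set(), 0
    for ts, key in events:
        if key in seen:
            continue
        seen.add(key)
        if oldest <= int(ts // bucket_s) <= newest:
            n += 1
    return n

def test_online_offline_parity():
    # Fixed golden timeline, including one duplicate delivery ("a").
    events = [(1000.0, "a"), (1000.0, "a"), (1120.0, "b"), (1290.0, "c")]
    online = SlidingWindowCounter()   # from the ring-buffer sketch above
    for ts, key in events:
        online.add(ts, key)
    for now in (1150.0, 1300.0, 1400.0):
        assert online.count(now) == batch_count(events, now)
```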
💡 Key Takeaways
Stream processors maintain per-key state for short windows using time-bucketed ring buffers; Stripe keeps 30 buckets of 10 seconds each for 5-minute sliding windows
Watermarks tolerate late events up to 5 minutes at p99, increasing accuracy but delaying window closure; deduplicate with idempotency keys to avoid retry inflation
Point-in-time joins for training use only features computed from events before label timestamp T, preventing label leakage that inflates offline metrics
Hybrid storage keeps 1-minute and 5-minute windows in memory for sub-10 ms reads, 1-hour windows in a fast persistent store at 20 to 50 ms p99, and 24-hour snapshots refreshed hourly
Monitor feature freshness (age of last update), null rates (upstream failures), and distribution drift (online versus offline mean); alert if freshness exceeds 2x expected window
Golden tests replay a fixed event timeline through the online and offline paths, asserting exact feature equality; Amazon runs them hourly on 1% of traffic to catch skew within hours
📌 Examples
Stripe real-time fraud: stream processor updates Redis with card velocity counts in 1-minute and 5-minute windows, achieving p95 read latency of 3 ms under 100K QPS inference load
Uber demand forecasting: online path computes 5-minute ride request counts per geohash, offline path builds 8-week seasonal profiles, and the feature store merges them at inference time
PayPal training pipeline: point-in-time join at transaction timestamp T fetches card count in the prior 24 hours and merchant count in the prior 7 days, ensuring no future leakage into training data
Amazon feature skew detection: golden test replays 1000 test transactions through both pipelines, finds the online average is $200 but offline is $150, revealing a timezone bug in the batch job