
Online and Offline Feature Computation Architecture

Production temporal feature systems must compute features online for low-latency inference and offline for training, using a shared definition to prevent training-serving skew. The architecture has three layers: stream processing for short windows, batch jobs for long windows, and a feature store that unifies reads.

The online path uses stream processors with event-time semantics. Kafka Streams, Flink, or Spark Structured Streaming maintain per-key state for windows like 1 minute, 5 minutes, and 1 hour. For sliding windows, store time-bucketed counters in a ring buffer: when a new event arrives, increment the bucket for its timestamp and evict buckets older than the window. For 5-minute windows with 10-second buckets, keep 30 buckets per key. Deduplicate using idempotency keys to handle retries without inflating counts. Use watermarks to tolerate late arrivals, accepting events up to 5 minutes late at p99; this increases accuracy but delays window closure. Stripe-style systems target 2 to 5 millisecond p95 feature read latency from Redis or similar in-memory stores, with p99 under 10 milliseconds.

The offline path computes longer windows and seasonality features in batch, running hourly or daily. It applies the same event-time logic as the online path to ensure consistency. Point-in-time joins are critical: for each training example at timestamp T, use only feature values computed from events before T. This prevents label leakage, where future information contaminates training. Backfill short windows offline by replaying events through the same windowing logic, and store results in a feature store with timestamp-indexed access. Uber-style systems maintain 8-week seasonal profiles and holiday calendars offline, joining them with online 5-minute demand and supply velocity at inference time.

Hybrid storage optimizes cost and latency. Keep 1-minute and 5-minute windows fully in memory with sub-10-millisecond reads. Store 1-hour windows in a fast persistent store like DynamoDB or Bigtable with 20 to 50 millisecond p99. Snapshot 24-hour and 7-day aggregates from batch to the online store every hour, trading freshness for cost. This three-tier design handles Stripe and PayPal scale: 100K queries per second for real-time decisions, 10 million active entities, and sub-50-millisecond end-to-end latency including feature fetch and model inference.

Monitoring prevents silent failures. Track feature freshness as the age of the last update per key, alerting if it exceeds 2x the expected window size. Monitor null rates: if distinct device count per merchant is suddenly null for 10% of keys, upstream ingestion has failed. Measure distribution drift between online and offline: if the offline mean transaction amount is $150 but online shows $200, timestamp logic or filtering differs. Maintain golden tests that replay a fixed event timeline through both pipelines and assert exact feature equality. At Amazon scale, automated tests run hourly on 1% of traffic, catching skew before it degrades model accuracy by more than 0.5%.
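A minimal single-process sketch of the ring-buffer counter described above, assuming 30 buckets of 10 seconds for a 5-minute window. In production this state would live in a stream processor's keyed state (Flink, Kafka Streams) rather than a plain Python object, and the idempotency-key set would be bounded with a TTL:

```python
class SlidingWindowCounter:
    """Time-bucketed ring buffer for sliding-window counts for one key.

    Defaults give a 5-minute window as 30 buckets of 10 seconds each.
    """

    def __init__(self, num_buckets: int = 30, bucket_seconds: int = 10) -> None:
        self.num_buckets = num_buckets
        self.bucket_seconds = bucket_seconds
        self.counts = [0] * num_buckets        # ring of per-bucket counts
        self.bucket_ids = [-1] * num_buckets   # absolute bucket id held by each slot
        self.latest_bucket = -1                # high-water mark (watermark proxy)
        self.seen = set()                      # idempotency keys; bound with a TTL in practice

    def add(self, event_ts: float, idempotency_key: str) -> None:
        if idempotency_key in self.seen:
            return                             # retried delivery: don't inflate counts
        bucket_id = int(event_ts // self.bucket_seconds)
        if bucket_id <= self.latest_bucket - self.num_buckets:
            return                             # later than the window tolerates: drop
        self.seen.add(idempotency_key)
        self.latest_bucket = max(self.latest_bucket, bucket_id)
        slot = bucket_id % self.num_buckets
        if self.bucket_ids[slot] != bucket_id: # slot held an expired bucket: evict it
            self.bucket_ids[slot] = bucket_id
            self.counts[slot] = 0
        self.counts[slot] += 1

    def count(self, now_ts: float) -> int:
        """Events in the window ending at now_ts."""
        newest = int(now_ts // self.bucket_seconds)
        oldest = newest - self.num_buckets + 1
        return sum(c for c, b in zip(self.counts, self.bucket_ids)
                   if oldest <= b <= newest)


c = SlidingWindowCounter()
c.add(event_ts=1000.0, idempotency_key="txn-1")
c.add(event_ts=1000.0, idempotency_key="txn-1")   # retry: ignored
c.add(event_ts=1120.0, idempotency_key="txn-2")
assert c.count(now_ts=1150.0) == 2
assert c.count(now_ts=1400.0) == 1                # txn-1 has aged out of the window
```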
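A sketch of the point-in-time join using pandas' `merge_asof`; the column names and toy data are illustrative, not from any named system:

```python
import pandas as pd

# Labeled examples: one row per transaction at label timestamp T.
labels = pd.DataFrame({
    "card_id": ["c1", "c1", "c2"],
    "ts": pd.to_datetime(["2024-01-01 12:00", "2024-01-01 18:00", "2024-01-01 12:30"]),
    "is_fraud": [0, 1, 0],
})

# Timestamp-indexed feature snapshots, as they would come from the feature store.
features = pd.DataFrame({
    "card_id": ["c1", "c1", "c2"],
    "ts": pd.to_datetime(["2024-01-01 11:00", "2024-01-01 17:59", "2024-01-01 12:00"]),
    "txn_count_24h": [3, 9, 1],
})

# direction="backward" picks, per label row, the latest feature snapshot
# before T; allow_exact_matches=False makes "before" strict, so features
# computed at T itself (which may include the labeled event) cannot leak in.
train = pd.merge_asof(
    labels.sort_values("ts"),
    features.sort_values("ts"),
    on="ts", by="card_id",
    direction="backward",
    allow_exact_matches=False,
)
#   card_id                  ts  is_fraud  txn_count_24h
#        c1 2024-01-01 12:00:00         0              3
#        c2 2024-01-01 12:30:00         0              1
#        c1 2024-01-01 18:00:00         1              9
```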
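A toy sketch of the three-tier read path, with plain dicts standing in for Redis, DynamoDB/Bigtable, and the hourly batch snapshots pushed into the online store; the routing-by-window-suffix naming convention is an assumption for illustration:

```python
# Stand-ins for the real stores: hot = in-memory (1m/5m, sub-10 ms),
# warm = fast persistent store (1h, 20-50 ms p99),
# snapshot = long aggregates refreshed hourly from batch.
HOT      = {("card:42", "txn_count_1m"): 1.0, ("card:42", "txn_count_5m"): 4.0}
WARM     = {("card:42", "txn_count_1h"): 17.0}
SNAPSHOT = {("card:42", "txn_count_24h"): 130.0, ("card:42", "txn_count_7d"): 904.0}

# Which tier serves which window length, per the layout described above.
TIER_FOR = {"1m": HOT, "5m": HOT, "1h": WARM, "24h": SNAPSHOT, "7d": SNAPSHOT}

def fetch(entity_id: str, feature: str) -> float:
    window = feature.rsplit("_", 1)[-1]   # e.g. "txn_count_5m" -> "5m"
    return TIER_FOR[window].get((entity_id, feature), 0.0)

features = {f: fetch("card:42", f)
            for f in ["txn_count_5m", "txn_count_1h", "txn_count_7d"]}
# {'txn_count_5m': 4.0, 'txn_count_1h': 17.0, 'txn_count_7d': 904.0}
```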
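Minimal sketches of the three monitoring checks; the 5% drift tolerance is an illustrative threshold, not a figure from the text:

```python
import time

def stale_keys(last_update: dict[str, float], window_s: float,
               now: float | None = None) -> list[str]:
    """Keys whose last feature update is older than 2x the expected window size."""
    now = time.time() if now is None else now
    return [k for k, ts in last_update.items() if now - ts > 2 * window_s]

def null_rate(values: list) -> float:
    """Fraction of null feature values; a spike signals upstream ingestion failure."""
    return sum(v is None for v in values) / max(len(values), 1)

def drifted(online_mean: float, offline_mean: float, rel_tol: float = 0.05) -> bool:
    """True if online and offline means diverge by more than rel_tol (relative)."""
    return abs(online_mean - offline_mean) > rel_tol * abs(offline_mean)

# The $150-versus-$200 discrepancy from the text trips the drift check:
assert drifted(online_mean=200.0, offline_mean=150.0)
```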
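A self-contained golden-test sketch in the spirit described above: it replays one fixed timeline through the streaming `SlidingWindowCounter` from the first sketch and through a brute-force batch recomputation, asserting exact equality. The function names are ours (pytest-style), not any vendor's API:

```python
def batch_count(events: list[tuple[float, str]], now_ts: float,
                window_s: float = 300.0, bucket_s: float = 10.0) -> int:
    """Offline recomputation using the same event-time bucket boundaries
    and the same idempotency-key deduplication as the online path."""
    newest = int(now_ts // bucket_s)
    oldest = newest - int(window_s // bucket_s) + 1
    seen, n = set(), 0
    for ts, key in events:
        if key in seen:
            continue
        seen.add(key)
        if oldest <= int(ts // bucket_s) <= newest:
            n += 1
    return n

def test_online_offline_parity():
    # Fixed golden timeline, including one duplicate delivery ("a").
    events = [(1000.0, "a"), (1000.0, "a"), (1120.0, "b"), (1290.0, "c")]
    online = SlidingWindowCounter()   # from the ring-buffer sketch above
    for ts, key in events:
        online.add(ts, key)
    for now in (1150.0, 1300.0, 1400.0):
        assert online.count(now) == batch_count(events, now)
```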
💡 Key Takeaways
Stream processors maintain per-key state for short windows using time-bucketed ring buffers; Stripe keeps 30 buckets of 10 seconds each for 5-minute sliding windows
Watermarks tolerate late events up to 5 minutes at p99, increasing accuracy but delaying window closure; deduplicate with idempotency keys to avoid retry inflation
Point-in-time joins for training use only features computed from events before label timestamp T, preventing label leakage that inflates offline metrics
Hybrid storage keeps 1-minute and 5-minute windows in memory for sub-10 ms reads, 1-hour windows in a fast persistent store at 20 to 50 ms p99, and 24-hour snapshots refreshed hourly
Monitor feature freshness (age of last update), null rates (upstream failures), and distribution drift (online versus offline mean); alert if freshness exceeds 2x expected window
Golden tests replay a fixed event timeline through the online and offline paths, asserting exact feature equality; Amazon runs them hourly on 1% of traffic to catch skew within hours
📌 Examples
Stripe real-time fraud: stream processor updates Redis with card velocity counts in 1-minute and 5-minute windows, achieving p95 read latency of 3 ms under 100K QPS inference load
Uber demand forecasting: online path computes 5-minute ride request counts per geohash, offline path builds 8-week seasonal profiles, and the feature store merges them at inference time
PayPal training pipeline: point-in-time join at transaction timestamp T fetches card count in the prior 24 hours and merchant count in the prior 7 days, ensuring no future leakage into training data
Amazon feature skew detection: golden test replays 1000 test transactions through both pipelines, finds the online average is $200 but offline is $150, revealing a timezone bug in the batch job