
Aggregations Over Windows: Summarizing Temporal Behavior

Window-Based Aggregations

Aggregations summarize behavior over time windows: count of transactions in last hour, sum of amounts in last 24 hours, average transaction size in last 7 days. These features compress temporal sequences into single values the model can process. Multiple window sizes capture different behavioral scales—hourly bursts versus weekly patterns.

Common Aggregations: COUNT (frequency), SUM (volume), AVG (typical size), MAX (largest event), MIN (smallest event), STDDEV (variability), DISTINCT_COUNT (diversity of merchants/devices).
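Here is a minimal sketch of these aggregations. The Python below computes each one over a 1-hour, 24-hour, and 7-day window for a single user's transaction history. The `Txn` record, the `window_aggregates` and `multi_window_features` helpers, and the window set are hypothetical. A production system would more likely compute these in a streaming engine or feature store than by scanning a list. Windows here are half-open, `(now - size, now]`, so a transaction exactly at the window start is excluded.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

HOUR = 3600
# Hypothetical window set mixing short and long scales (sizes in seconds)
WINDOWS = {"1h": HOUR, "24h": 24 * HOUR, "7d": 7 * 24 * HOUR}


@dataclass
class Txn:
    ts: int          # event time in epoch seconds (hypothetical schema)
    amount: float
    merchant: str


def window_aggregates(txns, now, window_s):
    """Aggregate one user's transactions with now - window_s < ts <= now."""
    in_window = [t for t in txns if now - window_s < t.ts <= now]
    amounts = [t.amount for t in in_window]
    if not amounts:
        # The fill value for empty windows is a modeling choice; 0 is used here
        return {"count": 0, "sum": 0.0, "avg": 0.0, "max": 0.0,
                "min": 0.0, "stddev": 0.0, "distinct_merchants": 0}
    return {
        "count": len(amounts),          # COUNT: frequency
        "sum": sum(amounts),            # SUM: volume
        "avg": mean(amounts),           # AVG: typical size
        "max": max(amounts),            # MAX: largest event
        "min": min(amounts),            # MIN: smallest event
        "stddev": pstdev(amounts),      # STDDEV: variability (population std dev)
        "distinct_merchants": len({t.merchant for t in in_window}),  # DISTINCT_COUNT
    }


def multi_window_features(txns, now):
    """Flatten aggregates for every window into one feature dict, e.g. 'count_1h'."""
    features = {}
    for name, size in WINDOWS.items():
        for agg, value in window_aggregates(txns, now, size).items():
            features[f"{agg}_{name}"] = value
    return features


txns = [Txn(0, 20.0, "m1"), Txn(23 * HOUR, 50.0, "m2"), Txn(23 * HOUR + 1800, 30.0, "m2")]
features = multi_window_features(txns, now=24 * HOUR)
print(features["count_1h"], features["count_24h"], features["count_7d"])  # 1 2 3
```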

Window Size Selection

Short windows (1 hour, 1 day) capture recent behavioral spikes. Long windows (30 days, 90 days) capture baseline behavior. Use both: compare current velocity to historical baseline. A user with 10 transactions in the last hour is suspicious if their 30-day average is 2 per day, but normal if their average is 50 per day.
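One way to turn "compare current velocity to historical baseline" into a single model feature is a ratio. It divides the last hour's count by the 30-day average hourly rate. The sketch below assumes the baseline is spread evenly across all 720 hours in 30 days. `velocity_ratio` and its `min_rate_per_hour` floor are hypothetical names chosen for illustration.

```python
HOURS_30D = 30 * 24


def velocity_ratio(count_1h, count_30d, min_rate_per_hour=0.01):
    """How many times the user's 30-day average hourly rate the last hour represents.

    The baseline spreads the 30-day count evenly over 720 hours. min_rate_per_hour
    is a hypothetical floor so users with no history do not divide by zero.
    """
    baseline_per_hour = count_30d / HOURS_30D
    return count_1h / max(baseline_per_hour, min_rate_per_hour)


# The example above: 10 transactions in the last hour
low_activity = velocity_ratio(10, count_30d=2 * 30)    # 2/day baseline
high_activity = velocity_ratio(10, count_30d=50 * 30)  # 50/day baseline
print(round(low_activity, 1), round(high_activity, 1))  # 120.0 4.8
```

With the numbers from the example, the same 10 transactions score about 120x baseline for the 2-per-day user and 4.8x for the 50-per-day user. The model therefore sees very different signals from the same raw count.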

Sliding vs Tumbling Windows

Sliding windows update continuously: the 1-hour count changes with every new event. Tumbling windows reset at fixed intervals: the daily count resets at midnight. Sliding windows provide smoother signals but require more computation and state. Tumbling windows are cheaper but create boundary artifacts. A burst that starts at 11:59 PM and continues past 12:01 AM is split across two daily buckets, so neither bucket shows its full size.
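The boundary artifact is easy to reproduce. This sketch counts the same six-transaction burst around midnight in two ways: with an exact 24-hour sliding window, backed by a deque of timestamps, and with a daily tumbling bucket. `SlidingCounter` and `tumbling_count` are illustrative names. The sliding counter assumes events arrive in timestamp order.

```python
from collections import deque

DAY = 24 * 3600


class SlidingCounter:
    """Exact sliding-window count over (now - window_s, now]."""

    def __init__(self, window_s):
        self.window_s = window_s
        self.events = deque()  # timestamps, oldest first

    def add(self, ts):
        self.events.append(ts)
        return self.count(ts)

    def count(self, now):
        # Evict timestamps that have slid out of the window
        while self.events and self.events[0] <= now - self.window_s:
            self.events.popleft()
        return len(self.events)


def tumbling_count(timestamps, now, window_s):
    """Count events in the fixed bucket containing `now` (buckets start at multiples of window_s)."""
    bucket_start = (now // window_s) * window_s
    return sum(1 for ts in timestamps if bucket_start <= ts <= now)


# Burst at 23:57, 23:58, 23:59, 00:01, 00:02, 00:03 (seconds from the start of day 0)
burst = [DAY - 180, DAY - 120, DAY - 60, DAY + 60, DAY + 120, DAY + 180]
sliding = SlidingCounter(DAY)
for ts in burst:
    sliding_now = sliding.add(ts)
tumbling_now = tumbling_count(burst, now=DAY + 180, window_s=DAY)
print(sliding_now, tumbling_now)  # 6 3
```

The sliding counter stores every timestamp still inside the window, which is where its extra cost comes from.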

Implementation Tip: Pre-compute tumbling window aggregations in batch pipelines (hourly, daily). Compute sliding window approximations in real time by combining the most recent tumbling windows with a partial count for the current, still-open window.
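Here is a minimal sketch of that tip. It assumes the batch pipeline supplies counts for completed hours, keyed by hour index `ts // 3600`, and a streaming counter supplies the count for the hour still in progress. The function `approx_sliding_count` and this storage layout are hypothetical. The sketch also weights the oldest, partially covered hour by the fraction still inside the window, which assumes events were spread evenly across that hour. Dropping that term, or counting it in full, gives a simpler estimate that under- or overcounts by up to one bucket.

```python
HOUR = 3600


def approx_sliding_count(hourly_counts: dict, current_partial: int,
                         now: int, window_hours: int = 24) -> float:
    """Approximate the event count in (now - window_hours * HOUR, now].

    hourly_counts: batch-computed counts for completed hours, keyed by ts // HOUR.
    current_partial: real-time count for the hour that is still in progress.
    """
    current_hour = now // HOUR
    elapsed_fraction = (now % HOUR) / HOUR  # how far into the current hour we are
    total = float(current_partial)
    # The window_hours - 1 completed hours before the current one are fully inside
    for h in range(current_hour - (window_hours - 1), current_hour):
        total += hourly_counts.get(h, 0)
    # The oldest hour is only partly inside; weight it by the covered fraction,
    # assuming its events were spread evenly across the hour (an approximation)
    oldest = current_hour - window_hours
    total += hourly_counts.get(oldest, 0) * (1 - elapsed_fraction)
    return total


# 3-hour window evaluated half-way through hour 10
print(approx_sliding_count({7: 4, 8: 2, 9: 6}, current_partial=1,
                           now=10 * HOUR + 1800, window_hours=3))  # 11.0
```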

Group-By Dimensions

Aggregate by different dimensions: per-user, per-device, per-merchant, per-IP address. A user with normal overall velocity but 5 transactions to the same merchant in 10 minutes shows suspicious per-merchant concentration. Multi-dimensional aggregations reveal patterns invisible in single-dimension views.
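The per-merchant concentration example can be sketched by grouping the same window on different keys. `group_counts` and `concentration_features` are hypothetical helpers over dict records with `ts`, `user`, and `merchant` fields. For records that carry those fields, passing other key tuples such as `("device",)` or `("ip",)` gives the per-device and per-IP views.

```python
from collections import defaultdict


def group_counts(txns, now, window_s, keys):
    """Count transactions in (now - window_s, now], grouped by the given record fields."""
    counts = defaultdict(int)
    for t in txns:
        if now - window_s < t["ts"] <= now:
            counts[tuple(t[k] for k in keys)] += 1
    return dict(counts)


def concentration_features(txns, user, now, window_s=600):
    """Per-user volume vs. the busiest (user, merchant) pair in the same window."""
    per_user = group_counts(txns, now, window_s, ("user",))
    per_user_merchant = group_counts(txns, now, window_s, ("user", "merchant"))
    user_total = per_user.get((user,), 0)
    top_merchant = max((c for (u, _m), c in per_user_merchant.items() if u == user), default=0)
    return {
        "user_count": user_total,
        "top_merchant_count": top_merchant,
        "top_merchant_share": top_merchant / user_total if user_total else 0.0,
    }


# Five purchases at merchant m9 within 10 minutes, plus one elsewhere
txns = [{"ts": 100 * i, "user": "u1", "merchant": "m9"} for i in range(1, 6)]
txns += [{"ts": 550, "user": "u1", "merchant": "m1"},
         {"ts": 300, "user": "u2", "merchant": "m9"}]
print(concentration_features(txns, "u1", now=600))
# {'user_count': 6, 'top_merchant_count': 5, 'top_merchant_share': 0.8333333333333334}
```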

💡 Key Takeaways
Multiple window sizes capture different scales: short windows (1hr, 1d) for spikes, long windows (30d, 90d) for baseline comparison
Sliding windows update continuously but cost more; tumbling windows are cheaper but create boundary artifacts
Aggregate by multiple dimensions (user, device, merchant, IP) to reveal patterns invisible in single-dimension views
📌 Interview Tips
1. Common aggregations: COUNT (frequency), SUM (volume), AVG (typical size), STDDEV (variability), DISTINCT_COUNT (diversity)
2. Compare current to baseline: 10 transactions in the last hour is suspicious if the 30-day average is 2/day, normal if 50/day