
Aggregations Over Windows: Summarizing Temporal Behavior

Aggregations compress event history into signals that capture trend, volatility, and summary statistics over time windows. Common aggregations include moving averages, rolling standard deviation, counts, sums, min and max, quantiles, and distinct counts. These reduce noise from individual events and expose patterns like rising transaction volume or increasing price volatility.

Window types determine how history is segmented. Sliding windows move continuously with each event, updating the aggregate for the last N seconds or events. Tumbling windows partition time into non-overlapping buckets such as 5-minute intervals. Session windows group events by activity gaps, which is useful for user sessions that end after 30 minutes of inactivity. The choice affects latency, state size, and semantics. Sliding windows are the most responsive but require more state because each event can belong to multiple overlapping windows.

Window length creates a fundamental tradeoff. Short windows like 1 minute or 5 minutes respond quickly to changes but amplify noise and are vulnerable to random spikes: a single retry burst can trigger false alarms. Long windows like 24 hours or 7 days are stable and smooth but slow to detect real shifts. Production systems typically use several window lengths and let the model learn how to weight them. Stripe maintains counts per card in 1 minute, 5 minute, 1 hour, 24 hour, and 7 day windows. The 1 minute window catches velocity attacks where fraudsters run 20 attempts in seconds. The 7 day window establishes baseline behavior so the model can flag deviations from it.

Exponentially weighted averages offer an alternative that avoids storing multiple windows. They maintain a single running state updated as new_value = alpha × current + (1 − alpha) × prior, where alpha (between 0 and 1) controls the decay rate: a larger alpha weights recent events more heavily and forgets history faster. This is memory efficient and naturally emphasizes recent data, but the time constant is less interpretable than an explicit window.
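The update rule above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the class name `EwmaState` and the choice to seed the state with the first observation are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EwmaState:
    """Single running state for an exponentially weighted average (hypothetical helper)."""
    alpha: float                   # weight on the newest observation, 0 < alpha <= 1
    value: Optional[float] = None  # None until the first observation arrives

    def update(self, current: float) -> float:
        if self.value is None:
            # Assumption: seed the average with the first observed value.
            self.value = current
        else:
            # new_value = alpha * current + (1 - alpha) * prior
            self.value = self.alpha * current + (1.0 - self.alpha) * self.value
        return self.value


state = EwmaState(alpha=0.5)
for amount in (100.0, 100.0, 200.0):
    state.update(amount)
print(state.value)  # 150.0
```

Only one float per key is stored, regardless of how much history has been seen. A smaller alpha behaves like a longer window, which is the sense in which the time constant is implicit rather than explicit.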
Use exponential decay for lightweight aggregates like rolling mean transaction amount. Use explicit windows when you need exact counts or percentiles for compliance and explainability.
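For the explicit-window side of that choice, here is a rough sketch of exact per-key counts over several sliding windows. The class `SlidingCounts`, the window labels, and the assumption that events arrive in timestamp order are all illustrative. It keeps raw timestamps and scans them on read, which is simple but not how a high-throughput system would store state.

```python
from collections import deque
from typing import Deque, Dict

# Window lengths in seconds, mirroring the 1 minute to 7 day set described above.
WINDOWS: Dict[str, int] = {
    "1m": 60,
    "5m": 300,
    "1h": 3600,
    "24h": 86400,
    "7d": 604800,
}


class SlidingCounts:
    """Exact event counts for one key (e.g. one card) over several sliding windows."""

    def __init__(self, windows: Dict[str, int] = WINDOWS) -> None:
        self.windows = dict(windows)
        self.max_window = max(self.windows.values())
        self.timestamps: Deque[float] = deque()  # event times, oldest first

    def add(self, ts: float) -> None:
        """Record an event; assumes timestamps arrive in non-decreasing order."""
        self.timestamps.append(ts)
        # Evict events that have aged out of the longest window.
        while self.timestamps and self.timestamps[0] <= ts - self.max_window:
            self.timestamps.popleft()

    def counts(self, now: float) -> Dict[str, int]:
        """Count events with timestamp in (now - length, now] for each window."""
        return {
            name: sum(1 for t in self.timestamps if now - length < t <= now)
            for name, length in self.windows.items()
        }


card = SlidingCounts()
for ts in (100_000, 650_000, 699_000, 699_950, 699_990, 700_000):
    card.add(ts)
print(card.counts(now=700_000))
```

The state held here grows with the number of events in the longest window, which illustrates why sliding windows cost more memory than a single running average. Each count can be fed to the model as its own feature, so the model can compare the short windows against the 7 day baseline.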
💡 Key Takeaways
Common aggregations include moving average, rolling standard deviation, count, sum, min, max, quantiles, and distinct count over time windows
Sliding windows update continuously and are most responsive but require storing per-window state; tumbling windows partition time into non-overlapping buckets and use less memory
Short windows like 1 minute respond fast but amplify noise; long windows like 7 days are stable but slow to adapt; use multiple windows and let the model weight them
Stripe tracks card transaction counts in 1 minute, 5 minute, 1 hour, 24 hour, and 7 day windows, catching both velocity attacks and long term anomalies
Exponentially weighted averages maintain a single running state with new = alpha × current + (1 − alpha) × prior, saving memory but reducing interpretability
Exact distinct counts for high cardinality keys are expensive; probabilistic sketches like HyperLogLog reduce cost with controlled error for network level signals
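To make the last takeaway concrete, below is a simplified HyperLogLog sketch in pure Python. It is a teaching sketch under stated assumptions, not a reference implementation: the class name, the use of SHA-1 as the hash, and the default of 2^11 registers are choices made for this example, and the bias corrections used by production variants are omitted. The expected relative error of the standard estimator is about 1.04 / sqrt(m) for m registers, so roughly 2.3% at m = 2048.

```python
import hashlib
import math


class HyperLogLog:
    """Approximate distinct counter (simplified; requires p >= 7 for the alpha constant)."""

    def __init__(self, p: int = 11) -> None:
        self.p = p                          # number of bits used to pick a register
        self.m = 1 << p                     # number of registers
        self.registers = bytearray(self.m)  # one byte per register for simplicity

    def add(self, item: str) -> None:
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")       # 64-bit hash value
        idx = x >> (64 - self.p)                    # top p bits select the register
        rest = x & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in `rest`, counting from 1.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        alpha = 0.7213 / (1.0 + 1.079 / self.m)     # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


devices = HyperLogLog()
for i in range(20000):
    devices.add(f"device-{i}")
print(round(devices.estimate()))  # close to 20000, not exact
```

Memory stays fixed at m registers per key no matter how many distinct items are added, and re-adding an item already seen never changes the estimate. That fixed footprint is what makes the approach attractive for per-merchant or network-level distinct counts.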
📌 Examples
Stripe velocity detection: if a card shows 3 attempts in 1 minute but its baseline is 45 in 7 days (about 6.4 per day on average), the 1 minute spike triggers fraud review
Uber demand forecasting: 5 minute rolling average of ride requests per geohash smooths noise, while 1 hour window captures rush patterns
Amazon inventory: rolling 7 day sum of unit sales with 1 day tumbling window for daily reporting, plus exponential decay for trending products that need fast reaction
PayPal merchant risk: distinct count of devices per merchant in 24 hours using HyperLogLog approximation, achieving 2% error with 1KB per merchant versus exact tracking at 100KB
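The inventory example combines a tumbling window with a rolling sum, and one way to sketch that combination is to keep one total per daily bucket and add up the most recent seven buckets on read. The names `bucket_start` and `DailyBuckets` are hypothetical, and the sketch assumes epoch-second timestamps with buckets aligned to UTC days.

```python
from collections import defaultdict
from typing import Dict

DAY_SECONDS = 86400


def bucket_start(ts: int, width: int = DAY_SECONDS) -> int:
    """Start of the tumbling window that contains ts (epoch seconds)."""
    return ts - ts % width


class DailyBuckets:
    """Per-day tumbling totals for one key, with a rolling sum over recent days."""

    def __init__(self) -> None:
        self.totals: Dict[int, float] = defaultdict(float)  # bucket start -> units sold

    def add(self, ts: int, units: float) -> None:
        # Each event lands in exactly one non-overlapping daily bucket.
        self.totals[bucket_start(ts)] += units

    def rolling_sum(self, now: int, days: int = 7) -> float:
        """Sum of the bucket containing `now` plus the previous `days - 1` buckets."""
        today = bucket_start(now)
        return sum(self.totals.get(today - i * DAY_SECONDS, 0.0) for i in range(days))


sales = DailyBuckets()
sales.add(0 * DAY_SECONDS + 10, 5)  # day 0
sales.add(3 * DAY_SECONDS + 10, 2)  # day 3
sales.add(7 * DAY_SECONDS + 10, 4)  # day 7
print(sales.rolling_sum(now=6 * DAY_SECONDS + 100))  # 7.0 (days 0 through 6)
print(sales.rolling_sum(now=7 * DAY_SECONDS + 100))  # 6.0 (days 1 through 7)
```

The rolling sum here moves in one-day steps rather than continuously, so state is bounded by the number of buckets instead of the number of events. The daily buckets can double as the source for daily reporting.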