Trade-offs: Window Size, Exactness, and Feature Breadth
Window Size Trade-offs
Short windows (1 hour) respond quickly to behavioral changes but are noisy—a single unusual transaction creates a spike. Long windows (30 days) provide stable baselines but react slowly—a compromised account might drain funds before 30-day metrics shift. Use multiple window sizes: short for detection, long for baseline comparison.
Design Heuristic: Include at least three window sizes: immediate (1 hour), recent (1-7 days), historical (30+ days). Compare ratios across windows—a spike in the immediate window relative to historical is more suspicious than a spike relative to recent.
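The multi-window comparison can be sketched as follows. This is a minimal illustration, not a production design: the entity tracker keeps raw timestamps in memory and recounts on every query (O(n)), and the three window sizes and the spike threshold used in the usage note are assumptions taken from the heuristic above.

```python
WINDOWS = {
    "immediate": 3600,           # 1 hour, in seconds
    "recent": 7 * 86400,         # 7 days
    "historical": 30 * 86400,    # 30 days
}

class MultiWindowCounter:
    """Tracks one entity's event timestamps across all three windows."""

    def __init__(self):
        self.events = []  # raw timestamps (seconds); fine for a sketch

    def add(self, ts):
        self.events.append(ts)

    def counts(self, now):
        # Drop events outside the largest window, then count per window.
        horizon = now - max(WINDOWS.values())
        self.events = [t for t in self.events if t >= horizon]
        return {name: sum(1 for t in self.events if t >= now - size)
                for name, size in WINDOWS.items()}

def spike_score(counts):
    """Immediate-window count vs the historical per-hour baseline rate."""
    hist_rate = counts["historical"] / (WINDOWS["historical"] / 3600)
    if hist_rate == 0:
        return float("inf") if counts["immediate"] else 0.0
    return counts["immediate"] / hist_rate
```

For an entity averaging one event per hour over 30 days, `spike_score` stays near 1.0; a burst of ten events in the last hour pushes it near 10, which a rule such as `score > 5` (an illustrative threshold) would flag.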
Exact vs Approximate Aggregations
Exact sliding windows require storing every event in the window—expensive for high-volume entities. Approximate methods (HyperLogLog for distinct counts, Count-Min Sketch for frequencies) trade accuracy for memory. A 5% error in a distinct-merchant count rarely changes a fraud decision, and the sketch uses roughly a tenth of the memory for hot users.
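To make the trade-off concrete, here is a compact HyperLogLog for distinct counting—a textbook-style sketch, not a tuned library implementation. With `p=10` it uses 1,024 small registers regardless of how many merchants an entity touches, and its standard error is roughly `1.04 / sqrt(1024)`, about 3%:

```python
import hashlib
import math

class HyperLogLog:
    """Approximate distinct counter; memory is fixed at 2**p registers."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash: top p bits pick a register, the rest give the rank.
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1  # leftmost 1-bit position
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        est = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if est <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            est = self.m * math.log(self.m / zeros)
        return int(est)
```

Counting 10,000 distinct merchants this way lands within a few percent of the truth while holding memory constant, whereas an exact distinct count would store all 10,000 identifiers.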
Feature Breadth vs Depth
Broad feature sets (100+ features per entity) capture many signals but increase storage, computation, and model complexity. Deep features (few features, sophisticated computation) are cheaper but may miss patterns. Start broad for discovery, then prune to features with actual predictive value. Many hand-crafted features contribute nothing after model training.
Pruning Tip: Track feature importance in production models. Features with near-zero importance can be deprecated. Reducing from 200 to 50 features often maintains accuracy while cutting feature computation cost by 75%.
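A pruning pass along these lines can be sketched as below. The importance scores would come from a trained model (e.g. a gradient-boosted tree's per-feature importances); the 95% coverage threshold is an assumption to tune against holdout accuracy, not a fixed rule:

```python
def prune_features(importances, coverage=0.95):
    """Keep the smallest set of features covering `coverage` of total importance.

    importances: dict mapping feature name -> importance score (non-negative).
    Returns the kept feature names, highest importance first.
    """
    total = sum(importances.values())
    kept, cum = [], 0.0
    for name, score in sorted(importances.items(), key=lambda kv: -kv[1]):
        if cum >= coverage * total:
            break  # remaining features add less than the residual 5%
        kept.append(name)
        cum += score
    return kept
```

On a skewed importance profile—which hand-crafted feature sets usually have—this drops the long tail of near-zero-importance features while retaining the dominant signals; the final cut should still be validated against holdout metrics before deprecating anything.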
Freshness vs Cost
Real-time streaming features cost 5-10x more than batch features due to infrastructure overhead, and not every feature needs that freshness. A 30-day baseline feature recomputed daily is sufficient; streaming it wastes resources. Reserve streaming computation for features where freshness matters: recent velocity, current session behavior.
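One way to enforce this split is to declare a staleness budget per feature and route each to a pipeline tier from that budget. The feature names, budgets, and the one-hour cutoff below are illustrative assumptions, not part of any particular feature-store API:

```python
# Max tolerable staleness per feature, in seconds (hypothetical examples).
FEATURE_STALENESS_BUDGET_S = {
    "txn_count_1h": 60,               # recent velocity: needs streaming
    "session_click_rate": 300,        # current session behavior
    "avg_amount_30d": 86400,          # 30-day baseline: daily batch is fine
    "distinct_merchants_30d": 86400,
}

STREAMING_CUTOFF_S = 3600  # budgets under an hour justify streaming cost

def pipeline_for(feature_name):
    """Route a feature to 'streaming' or 'daily_batch' by staleness budget."""
    budget = FEATURE_STALENESS_BUDGET_S[feature_name]
    return "streaming" if budget < STREAMING_CUTOFF_S else "daily_batch"
```

Making the budget explicit keeps the expensive tier small by default: a new feature lands in daily batch unless someone justifies a sub-hour staleness budget for it.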