
Choosing Detection Strategies: When to Use What

The Core Decision: Selecting between rule-based and model-based detection depends on your data characteristics, your team's capabilities, and your tolerance for false positives versus false negatives.
Rule-Based: predictable, explainable, needs manual tuning
vs
Model-Based: adaptive, complex, handles growth automatically
Use Rule-Based When: Your data has stable, well-understood bounds. For example: the user_id null ratio should always stay under 0.5% regardless of volume growth; a percentage value cannot exceed 100%; geographic data should only contain known country codes. These constraints are business logic, not statistical patterns. Rules are also better when you need perfect explainability for compliance, or when your team lacks the ML expertise to maintain model-based systems. Rule-based systems work well for critical invariants where any violation is definitely wrong: a financial pipeline where transaction amounts must be positive, or an inventory system where stock levels cannot be negative. Aggressive alerting is acceptable here because every alert represents a real constraint violation. (A minimal sketch of such checks appears below.)

Use Model-Based When: Your data has growth trends or seasonality. Daily active users growing 10% month over month will quickly outgrow static thresholds, and retail traffic that spikes 10x during holidays needs adaptive baselines. Model-based detection shines when normal behavior changes over time but anomalies are still deviations from the trend. Models also handle high-dimensional metrics better: if you monitor 1,000 tables with 10 metrics each, maintaining 10,000 manual rules is operationally impossible, whereas a model trained on historical patterns can cover all metrics with shared infrastructure.
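To make the rule-based side concrete, here is a minimal sketch of hard-constraint checks. It assumes a pandas DataFrame with hypothetical columns user_id, discount_pct, and country_code; the thresholds and allow-list are illustrative, not prescriptive:

```python
# Minimal rule-based checks: every alert is a definite constraint violation.
# Column names, thresholds, and the country allow-list are hypothetical.
import pandas as pd

KNOWN_COUNTRIES = {"US", "GB", "DE", "JP"}

def check_rules(df: pd.DataFrame) -> list[str]:
    violations = []
    # Business invariant: user_id null ratio must stay under 0.5%.
    null_ratio = df["user_id"].isna().mean()
    if null_ratio > 0.005:
        violations.append(f"user_id null ratio {null_ratio:.2%} exceeds 0.5%")
    # Hard bound: a percentage can never exceed 100.
    if (df["discount_pct"] > 100).any():
        violations.append("discount_pct contains values above 100%")
    # Allow-list: geographic data must contain only known country codes.
    unknown = set(df["country_code"].dropna()) - KNOWN_COUNTRIES
    if unknown:
        violations.append(f"unknown country codes: {sorted(unknown)}")
    return violations
```

Note that every check encodes a business invariant, not a statistical pattern: each one stays correct no matter how much the data volume grows.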
"Choose rules when any violation is definitely wrong. Choose models when you need to detect deviations from evolving normal behavior."
Hybrid Approach in Practice: Most mature systems combine both. Run rules for hard constraints (schema validation, null ratios on critical fields) alongside models for volume and distribution metrics (row counts, value ranges). Salesforce uses a rule engine for conditions that need clear explanations, plus an ML model service for complex patterns; they batch model requests and scale horizontally to handle load.

The Streaming Decision: Choose streaming detection when you need operational response times (under 5 minutes) for high-value use cases like fraud detection, SLA monitoring, or systems where bad data causes immediate user impact. The cost is 3x to 5x higher infrastructure spend plus added operational complexity. Choose batch detection when you can tolerate detection latency of one batch interval (typically 15 minutes to 1 hour). Most analytics pipelines fall here: a corrupt daily report detected within 1 hour is acceptable, and batch detection is simpler and cheaper.

Cold Start and Growth: Model-based systems need warmup. AWS Glue requires at least 3 runs as a bare minimum, but reliable detection takes 30 to 90 days of history. During cold start, or after major changes, expect higher false positive rates, and plan to tune sensitivity thresholds to your tolerance: financial systems might accept 20% false positives to catch every real issue, while analytics teams might tune for 5% to reduce alert fatigue.
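As a sketch of what "model-based" can mean at its simplest, the function below flags a metric when it deviates from a trailing-window baseline by more than a sensitivity threshold. This is an illustrative toy, not the algorithm Salesforce or AWS Glue actually ships; the window and sensitivity parameters are assumptions you would tune against your false-positive tolerance:

```python
# Adaptive baseline sketch: flag a value more than `sensitivity` standard
# deviations from the trailing window's mean. Because the window moves with
# the data, steady growth shifts the baseline instead of triggering alerts
# the way a static threshold would.
import statistics

def is_anomalous(history: list[float], value: float,
                 window: int = 30, sensitivity: float = 3.0) -> bool:
    recent = history[-window:]
    if len(recent) < 3:           # cold start: not enough history to judge
        return False
    mean = statistics.mean(recent)
    stdev = statistics.stdev(recent)
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > sensitivity

# A metric growing ~0.3% per day stays inside its own moving baseline,
# while a sudden 40% drop is flagged.
rows = [1000 * 1.003 ** i for i in range(60)]
print(is_anomalous(rows, rows[-1] * 1.01))   # False: on trend
print(is_anomalous(rows, rows[-1] * 0.60))   # True: sharp drop
```

The cold-start guard mirrors the warmup problem described above: with too little history, the function abstains rather than alert, and `sensitivity` is the knob a financial team would loosen (more alerts, fewer misses) and an analytics team would tighten.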
💡 Key Takeaways
Rule-based detection works for stable constraints (null ratios under 0.5%, values within fixed ranges) and when perfect explainability is required for compliance
Model-based detection handles growth trends (10% monthly increases), seasonality (10x holiday traffic), and high-dimensional metrics (thousands of tables) that make manual rules impossible
Hybrid systems combine rules for hard constraints with models for volume and distribution patterns, as implemented by companies like Salesforce
Streaming detection costs 3x to 5x more but delivers sub-5-minute alerts for high-value use cases like fraud or SLA monitoring, while batch detection suits most analytics with a 15-minute to 1-hour tolerance
Model-based systems need 30 to 90 days of warmup and show higher false positive rates during cold start, requiring sensitivity tuning based on team tolerance (5% for analytics, 20% for financial systems)
📌 Examples
1. Financial pipeline uses rules: transaction amounts must be positive, account balances cannot go negative. Any violation is definitely an error requiring immediate investigation.
2. Retail analytics uses models: daily sales grow 8% monthly and spike 10x during Black Friday. The model learns these patterns from 90 days of history, avoiding false alarms during expected peaks while catching real anomalies.
3. Hybrid system: Rules check that user_id is never more than 1% null (hard constraint), while models detect an unexpected 20% drop in daily signups (trend deviation); see the sketch after this list.
4. Streaming fraud detection justifies 5x cost with under-2-minute alerts preventing account takeovers. Batch analytics detection runs hourly, catching corrupt reports with 1-hour latency at 1/5 the infrastructure cost.
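Example 3's hybrid pattern is, at heart, the two sketches above composed: rules gate the hard constraints first, then the adaptive baseline watches trend metrics. A hypothetical wiring:

```python
# Hypothetical hybrid wiring, reusing check_rules and is_anomalous from the
# sketches above: hard constraints first, trend deviations second.
def evaluate_batch(df, signup_history, todays_signups):
    alerts = check_rules(df)                       # definite violations
    if is_anomalous(signup_history, todays_signups):
        alerts.append("daily signups deviate from the learned trend")
    return alerts
```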