Threshold Tuning and Cost Sensitive Decision Making
From Scores to Decisions
Fraud models output probability scores between 0 and 1. Business value comes from converting scores into actions through thresholds. The key insight: different errors have vastly different costs.
Missing a fraudulent transaction costs the full transaction amount plus a chargeback fee. Blocking a legitimate transaction costs customer frustration and potential churn. Sending a transaction to human review costs a few dollars of analyst time. Optimal thresholds balance these costs.
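The cost balance above can be sketched as an expected-cost minimization over candidate thresholds. All dollar amounts in this sketch (fraud loss, chargeback fee, false-decline cost) are illustrative placeholders, not figures from the text, and the function names are assumptions for the example.

```python
def expected_cost(scores, labels, threshold,
                  fraud_loss=1000.0,       # hypothetical loss per missed fraud
                  chargeback_fee=25.0,     # hypothetical chargeback fee
                  false_decline_cost=15.0):  # hypothetical churn/frustration cost
    """Total cost of applying one decline threshold to scored transactions."""
    cost = 0.0
    for score, is_fraud in zip(scores, labels):
        declined = score >= threshold
        if is_fraud and not declined:      # missed fraud: lose amount + fee
            cost += fraud_loss + chargeback_fee
        elif not is_fraud and declined:    # blocked a legitimate customer
            cost += false_decline_cost
    return cost

def best_threshold(scores, labels, candidates=None):
    """Pick the candidate threshold with the lowest total cost."""
    candidates = candidates or [i / 100 for i in range(1, 100)]
    return min(candidates, key=lambda t: expected_cost(scores, labels, t))
```

On historical labeled data, sweeping candidate thresholds like this makes the cost trade-off explicit instead of defaulting to an accuracy-oriented cutoff such as 0.5.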
Multi-Threshold Approach
Rather than a single approve/decline threshold, production systems use multiple thresholds creating decision bands. Below 0.05: auto-approve. 0.05 to 0.30: approve but flag for post-transaction review. 0.30 to 0.70: route to human analyst for real-time decision. Above 0.70: auto-decline.
Each band has a different cost structure. Auto-decisions cost nothing in labor. Human review costs a few dollars per transaction but catches errors before they become chargebacks. The width of the middle band depends on analyst capacity and transaction volume.
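The decision bands described above can be expressed as a simple routing function. The thresholds mirror the example numbers in the text; the function and band names are assumptions for illustration.

```python
def route(score,
          auto_approve=0.05,   # below this: approve with no review
          flag_review=0.30,    # below this: approve, review after the fact
          auto_decline=0.70):  # above this: decline with no review
    """Map a fraud score to one of four decision bands."""
    if score < auto_approve:
        return "auto_approve"
    if score < flag_review:
        return "approve_and_flag"   # post-transaction review
    if score < auto_decline:
        return "human_review"       # real-time analyst decision
    return "auto_decline"
```

Keeping the band edges as parameters makes it easy to tune them per segment or re-optimize them as costs and analyst capacity change.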
Dynamic Thresholds
Optimal thresholds vary by context. High-value transactions warrant more caution, so lower the auto-approve threshold. New accounts without history need stricter thresholds. Peak traffic periods might raise the auto-approve threshold to keep the analyst queue at a manageable depth.
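A minimal sketch of context-dependent adjustment of the auto-approve threshold, following the three factors above. The adjustment multipliers and cutoffs (transaction amount, account age, queue capacity) are illustrative assumptions, not values from the text.

```python
def adjusted_auto_approve(base=0.05, amount=0.0, account_age_days=365,
                          analyst_queue_depth=0, queue_capacity=100):
    """Adjust the auto-approve threshold for transaction context."""
    threshold = base
    if amount > 500:                          # high-value: be more cautious
        threshold *= 0.5
    if account_age_days < 30:                 # new account: stricter
        threshold *= 0.5
    if analyst_queue_depth > queue_capacity:  # analysts overloaded: auto-approve
        threshold *= 2.0                      # more to shed queue load
    return min(threshold, 1.0)
```

In practice these adjustments would themselves be tuned against cost data rather than hand-picked multipliers, but the shape stays the same: a base threshold modulated by context.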
Calibration Matters
Threshold math assumes calibrated probabilities. If the model says 10% fraud probability, 10% of those transactions should actually be fraud. Uncalibrated models break threshold logic. Validate calibration with reliability diagrams plotting predicted vs actual fraud rates in score buckets.
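The reliability check described above can be sketched as a bucketed table of predicted versus actual fraud rates; a plot of these pairs is the reliability diagram. The function name and bucket scheme are assumptions for the example.

```python
def reliability_table(scores, labels, n_buckets=10):
    """Return (bucket midpoint, actual fraud rate, count) per score bucket."""
    buckets = [[] for _ in range(n_buckets)]
    for score, is_fraud in zip(scores, labels):
        idx = min(int(score * n_buckets), n_buckets - 1)  # clamp score == 1.0
        buckets[idx].append(is_fraud)
    table = []
    for i, bucket in enumerate(buckets):
        if bucket:  # skip empty buckets
            predicted_mid = (i + 0.5) / n_buckets
            actual_rate = sum(bucket) / len(bucket)
            table.append((predicted_mid, actual_rate, len(bucket)))
    return table
```

For a calibrated model, actual rates track the bucket midpoints; large gaps signal that threshold math built on the raw scores will misprice the decision bands.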