Guardrail Failure Modes and Mitigation Strategies
False Negatives: Missing Real Harm
Guardrail doesn't fire when the treatment actually harms users. Causes: threshold too loose, metric not sensitive enough, detection delay too long (harm compounds before detection), wrong metric (measuring a proxy instead of the true outcome). Cost: user harm ships to production.
Mitigation: tighten thresholds, add more sensitive metrics, shorten detection windows, validate guardrails against known-bad experiments retrospectively.
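Retrospective validation can be sketched as replaying past experiments through the guardrail and counting how many known-harmful ones it would have missed. The sketch below is illustrative only: the PastExperiment record, the guardrail_fires rule (fire when the metric drops by more than the threshold), and the history data are all hypothetical, not taken from any particular experimentation platform.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PastExperiment:
    name: str
    metric_delta: float  # relative change in the guardrail metric (negative = worse)
    known_harmful: bool  # label assigned in retrospect (hypothetical)


def guardrail_fires(metric_delta: float, threshold: float) -> bool:
    """Assumed rule: fire when the metric dropped by more than `threshold`."""
    return metric_delta < -threshold


def miss_rate(experiments: Sequence[PastExperiment], threshold: float) -> float:
    """Share of known-harmful experiments the guardrail would NOT have caught."""
    harmful = [e for e in experiments if e.known_harmful]
    if not harmful:
        raise ValueError("need at least one known-harmful experiment")
    missed = [e for e in harmful if not guardrail_fires(e.metric_delta, threshold)]
    return len(missed) / len(harmful)


# Made-up history, for illustration only.
history = [
    PastExperiment("exp_a", -0.030, True),
    PastExperiment("exp_b", -0.012, True),
    PastExperiment("exp_c", -0.004, False),
    PastExperiment("exp_d", 0.002, False),
]

loose = miss_rate(history, threshold=0.02)  # misses exp_b
tight = miss_rate(history, threshold=0.01)  # catches both harmful experiments
```

On this made-up history, the looser threshold misses half of the harmful experiments and the tighter one misses none. A tighter threshold also raises the false positive risk, so the same replay is worth running on the known-good experiments.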
False Positives: Blocking Good Experiments
Guardrail fires when treatment is actually fine. Causes: threshold too tight, high metric variance, multiple testing without correction, outliers in small samples. Cost: good features delayed or abandoned, team loses trust in guardrail system.
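The multiple-testing cause can be made concrete with a little arithmetic. Assuming the guardrail metrics are independent and each is tested at significance level alpha (both are simplifying assumptions), the chance that at least one fires on a harmless treatment grows quickly with the number of metrics. The Bonferroni correction shown here is one common fix, not the only one; the function names are illustrative.

```python
def familywise_false_positive_rate(alpha: float, num_metrics: int) -> float:
    """Chance that at least one of `num_metrics` independent guardrails
    fires when the treatment has no real effect."""
    return 1.0 - (1.0 - alpha) ** num_metrics


def bonferroni_alpha(alpha: float, num_metrics: int) -> float:
    """Per-metric significance level that keeps the family-wise rate <= alpha."""
    return alpha / num_metrics


# Ten guardrail metrics, each tested at alpha = 0.05.
uncorrected = familywise_false_positive_rate(0.05, 10)
corrected = familywise_false_positive_rate(bonferroni_alpha(0.05, 10), 10)
```

With ten metrics at alpha = 0.05, the uncorrected rate is roughly 40%, while the corrected rate stays just under 5%. Bonferroni is conservative when metrics are correlated, so it trades some sensitivity for fewer false alarms.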
System Failures
Pipeline failures: logging gaps, aggregation bugs, comparison errors. Detection: run guardrails on A/A experiments, where both arms receive the same treatment, so the guardrail should fire no more often than its expected false positive rate. Monitoring: track guardrail fire rate over time; sudden changes usually indicate system issues, not treatment effects.
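One way to turn the A/A check into an automated alert is to compare the observed fire rate across many A/A runs with the rate expected by chance. The sketch below assumes each guardrail is tested at a known significance level alpha and uses a normal approximation to the binomial, which is rough for small run counts. The function name and the 3-standard-error band are illustrative choices.

```python
import math
from typing import Sequence


def aa_fire_rate_is_suspicious(
    fired: Sequence[bool], alpha: float, z: float = 3.0
) -> bool:
    """Flag the pipeline when the A/A fire rate is far from `alpha`.

    `fired` holds one boolean per A/A run (True = guardrail fired).
    The check is two-sided: far too many fires suggests a comparison or
    aggregation bug, and far too few may mean the guardrail is not
    seeing data at all.
    """
    n = len(fired)
    if n == 0:
        raise ValueError("need at least one A/A run")
    observed = sum(fired) / n
    standard_error = math.sqrt(alpha * (1.0 - alpha) / n)
    return abs(observed - alpha) > z * standard_error


# 200 hypothetical A/A runs at alpha = 0.05.
healthy = [True] * 10 + [False] * 190  # 5% fire rate, as expected
broken = [True] * 40 + [False] * 160   # 20% fire rate
silent = [False] * 200                 # guardrail never fires

healthy_flag = aa_fire_rate_is_suspicious(healthy, alpha=0.05)
broken_flag = aa_fire_rate_is_suspicious(broken, alpha=0.05)
silent_flag = aa_fire_rate_is_suspicious(silent, alpha=0.05)
```

In this example the 5% run is left alone, while both the 20% run and the run with no fires at all are flagged. The same comparison can be applied to a rolling window of production fire rates to catch sudden shifts.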