Guardrail Metric Selection and Tiering
Selection Criteria
Good guardrails are (1) causally connected to user or business value, (2) measurable with low latency, (3) sensitive enough to detect meaningful harm, and (4) specific enough not to fire on noise. Avoid proxy metrics that correlate with value but aren't causal: they invite Goodhart's law problems, where teams optimize the metric without improving the value it was meant to track.
Setting Thresholds
A threshold that is too tight (for example, blocking on a 1% degradation) catches real issues but can also block 30-50% of experiments on noise alone. A threshold that is too loose (for example, blocking only at a 10% degradation) misses subtle but cumulative harm. Start with business-derived thresholds: what degradation would you actually reject?
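To see why a tight threshold blocks so many experiments, here is a rough sketch of the arithmetic. It assumes (these are illustrative assumptions, not something stated above) that the treatment has no true effect, that the observed relative change is approximately normal, and that the guardrail fires on the point estimate alone. The function name and the 2% standard error are hypothetical.

```python
from statistics import NormalDist


def false_block_rate(threshold: float, relative_se: float) -> float:
    """Probability that pure noise produces an observed degradation
    larger than `threshold`, when the true effect is zero.

    threshold   -- relative degradation that triggers a block (0.01 = 1%)
    relative_se -- standard error of the observed relative change
                   (assumed value; depends on sample size and variance)
    """
    # Under the null, observed degradation ~ Normal(0, relative_se).
    return 1.0 - NormalDist().cdf(threshold / relative_se)


if __name__ == "__main__":
    # With a 2% standard error, a 1% threshold sits only 0.5 standard
    # errors from zero, so noise alone trips it roughly 31% of the time.
    print(f"1% threshold:  {false_block_rate(0.01, 0.02):.3f}")
    # A 10% threshold is 5 standard errors away and almost never fires on noise.
    print(f"10% threshold: {false_block_rate(0.10, 0.02):.2e}")
```

Under these assumptions the false-block rate approaches 50% as the metric gets noisier relative to the threshold, which is consistent with the range quoted above.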
A common approach is to set the threshold at 2-5% relative degradation, evaluated at 90-95% confidence. This balances catching real harm against experiment velocity. Tighten or loosen it based on each metric's sensitivity and business importance.
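One way to read "relative degradation with confidence" is sketched below: the guardrail fires only when the lower confidence bound on the relative degradation exceeds the threshold. This is one possible interpretation, not the only one (a non-inferiority test on the upper bound is a stricter alternative). The `Arm` type, the function names, and the example numbers are hypothetical, the metric is assumed to be "higher is better", and the interval uses a normal approximation that treats the control mean in the denominator as fixed.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass
class Arm:
    """Summary statistics for one experiment arm (hypothetical type)."""
    mean: float
    std: float
    n: int


def degradation_interval(control: Arm, treatment: Arm, confidence: float = 0.95):
    """Return (point, lower, upper) for relative degradation.

    Positive values mean the treatment is worse than control.
    Simplification: the control mean is treated as a constant, so the
    interval ignores uncertainty in the denominator.
    """
    diff = control.mean - treatment.mean
    se = sqrt(control.std ** 2 / control.n + treatment.std ** 2 / treatment.n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return (
        diff / control.mean,
        (diff - z * se) / control.mean,
        (diff + z * se) / control.mean,
    )


def breaches_guardrail(control: Arm, treatment: Arm,
                       threshold: float = 0.03, confidence: float = 0.95) -> bool:
    """True when even the lower bound of the degradation exceeds the threshold."""
    _, lower, _ = degradation_interval(control, treatment, confidence)
    return lower > threshold


if __name__ == "__main__":
    control = Arm(mean=0.100, std=0.300, n=100_000)
    clearly_worse = Arm(mean=0.090, std=0.286, n=100_000)
    within_noise = Arm(mean=0.0995, std=0.2993, n=100_000)
    print(breaches_guardrail(control, clearly_worse))  # 10% observed drop
    print(breaches_guardrail(control, within_noise))   # 0.5% observed drop
```

Requiring the lower bound to clear the threshold favors velocity: it blocks only when harm is well established. Teams that care more about missing harm could instead block whenever the upper bound exceeds the threshold.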
Tiering by Importance
Tier guardrails by consequence: Tier 1 (hard blockers - experiment cannot ship), Tier 2 (soft warnings - requires justification to ship), Tier 3 (informational - tracked but not enforced). This lets teams move fast on lower-risk changes.
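The tiering rule can be sketched as a small decision function. The enum, the function name, the returned strings, and the metric names in the example are all hypothetical, chosen only to illustrate how the three tiers combine into a ship decision.

```python
from enum import Enum


class Tier(Enum):
    HARD_BLOCKER = 1   # experiment cannot ship
    SOFT_WARNING = 2   # requires justification to ship
    INFORMATIONAL = 3  # tracked but not enforced


def ship_decision(breached: dict[str, Tier], justified: bool = False) -> str:
    """Map the set of breached guardrails to a ship decision.

    breached  -- guardrail name -> tier, for guardrails that fired
    justified -- whether the team has supplied a written justification
    """
    tiers = set(breached.values())
    if Tier.HARD_BLOCKER in tiers:
        return "blocked"  # no override for Tier 1
    if Tier.SOFT_WARNING in tiers and not justified:
        return "needs_justification"
    return "ship"  # Tier 3 breaches are recorded but never block


if __name__ == "__main__":
    # Hypothetical metric names for illustration.
    print(ship_decision({"session_length": Tier.INFORMATIONAL}))
    print(ship_decision({"p95_latency": Tier.SOFT_WARNING}))
    print(ship_decision({"p95_latency": Tier.SOFT_WARNING}, justified=True))
    print(ship_decision({"crash_rate": Tier.HARD_BLOCKER}, justified=True))
```

Checking the hard-blocker tier first means a justification can never override a Tier 1 breach, which matches the "cannot ship" wording above.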