Runtime Adaptivity and Plan Invalidation: Handling Estimation Errors
The Estimation Error Problem
Static query plans assume cardinality estimates are accurate, but in production, misestimates of 10x to 1000x are common. Consider SELECT * FROM orders WHERE user_id = $1. For user_id = 12345 (a casual user with 10 orders), a nested loop is optimal. For user_id = 99 (a power user with 1 million orders), a hash join is required. If the plan was compiled for user 12345 and reused for user 99, performance degrades from milliseconds to minutes. This is parameter sniffing: the plan optimized for one parameter value performs terribly for another.
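The reuse pathology can be sketched in a few lines of Python. The threshold, plan names, and cache structure here are illustrative, not taken from any particular engine:

```python
# Sketch of parameter sniffing: a plan chosen for one bind value is
# cached by SQL text and blindly reused for later bind values.
# HASH_JOIN_THRESHOLD and the plan names are illustrative assumptions.

HASH_JOIN_THRESHOLD = 10_000  # rows above which hash join beats nested loop

def choose_plan(estimated_rows: int) -> str:
    """Pick a join strategy from the cardinality estimate at compile time."""
    return "hash_join" if estimated_rows >= HASH_JOIN_THRESHOLD else "nested_loop"

plan_cache: dict[str, str] = {}

def get_plan(sql: str, estimated_rows: int) -> str:
    # First execution compiles and caches; later executions reuse the
    # cached plan regardless of the new parameter's cardinality.
    if sql not in plan_cache:
        plan_cache[sql] = choose_plan(estimated_rows)
    return plan_cache[sql]

sql = "SELECT * FROM orders WHERE user_id = $1"
print(get_plan(sql, 10))         # user 12345 -> "nested_loop" is cached
print(get_plan(sql, 1_000_000))  # user 99 -> still "nested_loop": sniffing
```

Keying the cache on SQL text alone is exactly what makes the second call pathological: a fresh compile for the 1-million-row parameter would have chosen the hash join.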
Adaptive Join Switching
Adaptive execution detects estimation errors at runtime and adjusts. For SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id WHERE c.segment = $1, if the optimizer expected 100 customers in the segment but 100,000 materialize, the engine can switch from a nested loop to a hash join mid-execution. A typical trigger is the actual row count exceeding the estimate by 10x. This prevents a query planned for 100 rows from running for hours when 100,000 rows appear. Some work may be replayed, but total time drops from hours to seconds.
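A minimal sketch of the switching logic, assuming a buffered outer input and the 10x trigger described above (the factor, buffering strategy, and operator names are illustrative):

```python
# Adaptive join switching sketch: start with the planned nested loop,
# count actual rows from the outer side, and switch to a hash join once
# the estimate is exceeded by SWITCH_FACTOR. Interfaces are hypothetical.

SWITCH_FACTOR = 10

def adaptive_join(outer_rows, estimated: int):
    strategy = "nested_loop"
    buffered = []  # rows seen so far, kept so they can be replayed
    for row in outer_rows:
        buffered.append(row)
        if strategy == "nested_loop" and len(buffered) > estimated * SWITCH_FACTOR:
            # Estimate blown by 10x: abandon the nested loop and replay
            # the buffered rows through a hash join instead.
            strategy = "hash_join"
    return strategy, len(buffered)

strategy, total = adaptive_join(range(100_000), estimated=100)
print(strategy, total)  # -> hash_join 100000
```

The buffer is the "some work may replay" cost: rows already probed by the nested loop are fed through the hash join a second time, which is cheap relative to finishing the wrong plan.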
Runtime Bloom Filters
Bloom filters reduce data movement in distributed joins. For SELECT * FROM fact_sales f JOIN dim_products p ON f.product_id = p.id WHERE p.category = 'Electronics', the build side (dim_products filtered to Electronics, say 5,000 products) constructs a Bloom filter (a probabilistic structure that answers "definitely not in set" or "maybe in set" with no false negatives). The probe side (100 million fact_sales rows) checks each product_id against the Bloom filter before shuffling. If only 10 million sales match Electronics products, 90 million rows are filtered before network transfer.
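The structure itself is small enough to sketch. This toy version uses hashlib for clarity; real engines use faster non-cryptographic hashes, and the sizes here are illustrative:

```python
# Minimal Bloom filter sketch for probe-side pre-filtering in a
# distributed join. Bit-array size and hash count are illustrative.
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        data = str(key).encode()
        for i in range(self.k):
            # Derive k independent positions by salting one hash function.
            h = hashlib.blake2b(data, salt=i.to_bytes(8, "little")).digest()
            yield int.from_bytes(h[:8], "little") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key) -> bool:
        # False -> definitely not in the set; True -> maybe in the set.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

# Build side: product ids in the Electronics category (toy data).
bf = BloomFilter()
electronics_ids = range(5_000)
for pid in electronics_ids:
    bf.add(pid)

# Probe side: drop fact rows locally, before the network shuffle.
fact_product_ids = [42, 4_999, 1_000_000]
survivors = [pid for pid in fact_product_ids if bf.might_contain(pid)]
```

Every build-side id passes might_contain (no false negatives), so filtering the probe side is safe; a small fraction of non-matching ids may slip through as false positives and be discarded later by the actual join.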
Plan Invalidation
Cached plans become stale and need invalidation. Adding an index on orders.status should invalidate plans for SELECT * FROM orders WHERE status = 'pending' so they can use the new index. Schema changes (new indexes, dropped columns) trigger immediate invalidation. Performance regression triggers invalidation when a plan consistently exceeds expected cost by 5x or more. Statistics refresh after major data loads also invalidates affected plans, ensuring the optimizer reconsiders with fresh cardinality estimates.
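The three triggers above can be sketched as a small plan cache. The class shape, the streak length for "consistently", and the table-level granularity are assumptions for illustration:

```python
# Plan cache sketch with the invalidation triggers described above:
# schema change / stats refresh (per table) and sustained 5x cost
# regression. Names and the 3-run streak are illustrative assumptions.
from dataclasses import dataclass

REGRESSION_FACTOR = 5
REGRESSION_STREAK = 3  # how many slow runs count as "consistently"

@dataclass
class CachedPlan:
    sql: str
    tables: set
    expected_cost: float
    slow_runs: int = 0

class PlanCache:
    def __init__(self):
        self.plans: dict[str, CachedPlan] = {}

    def invalidate_table(self, table: str):
        # Schema change or statistics refresh: drop every plan that
        # touches the table so the next execution re-plans.
        self.plans = {k: p for k, p in self.plans.items() if table not in p.tables}

    def record_execution(self, sql: str, actual_cost: float):
        plan = self.plans.get(sql)
        if plan is None:
            return
        if actual_cost >= plan.expected_cost * REGRESSION_FACTOR:
            plan.slow_runs += 1
            if plan.slow_runs >= REGRESSION_STREAK:
                del self.plans[sql]  # regression: force a re-plan
        else:
            plan.slow_runs = 0  # a fast run resets the streak

cache = PlanCache()
sql = "SELECT * FROM orders WHERE status = 'pending'"
cache.plans[sql] = CachedPlan(sql, {"orders"}, expected_cost=10.0)
cache.invalidate_table("orders")  # e.g. a new index on orders.status
print(sql in cache.plans)         # -> False
```

Requiring a streak of slow runs before evicting avoids thrashing on a single outlier execution, while table-level invalidation keeps unrelated cached plans warm.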