Failure Modes: Cache Poisoning, Thundering Herds, and Unbounded Transforms

Cache Poisoning
Cache poisoning occurs when a CDN caches the wrong content for a URL. In image optimization it looks like this: a client requests /image.jpg, the transformation service returns WebP, and the CDN caches that WebP under the URL. The next client does not support WebP but receives the cached WebP anyway. The root cause is a missing or incorrect Vary header. The fix is to always set Vary: Accept on format-negotiated responses, so that the CDN keeps separate cache entries per Accept value. Verify that your CDN actually honors Vary when caching (some do not). The alternative is to encode the format in the URL path (/image.webp) instead of relying on content negotiation.
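A minimal sketch of the negotiation step, assuming a service that emits only WebP or JPEG. The function names and the cache-key layout are illustrative, not taken from any particular CDN, and q-values in the Accept header are ignored for brevity.

```python
def negotiate_format(accept_header: str) -> str:
    """Pick an output format from the client's Accept header.

    Simplification: parameters such as ";q=0.8" are stripped, not ranked.
    """
    accepted = {
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    }
    return "webp" if "image/webp" in accepted else "jpeg"


def build_response_headers(accept_header: str) -> dict:
    """Headers for a format-negotiated image response."""
    fmt = negotiate_format(accept_header)
    return {
        "Content-Type": f"image/{fmt}",
        # Tells caches that the body depends on the request's Accept header.
        "Vary": "Accept",
    }


def cache_key(path: str, accept_header: str) -> str:
    """Hypothetical normalized cache key: one entry per negotiated format,
    rather than one entry per distinct raw Accept string."""
    return f"{path}|{negotiate_format(accept_header)}"
```

Keying on the negotiated format rather than the raw header keeps the number of cache entries per image small, where the CDN allows custom cache keys.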
Thundering Herd on New Images
A popular new image is posted. Thousands of users request it simultaneously, before any variant has been generated or cached. Every request hits the origin transformation service, and each one triggers a compute-intensive transformation. Origin CPU saturates, latency spikes, and requests time out. Mitigation strategies: request coalescing (hold duplicate requests while the first one transforms, then serve them all from its result), async generation (return a placeholder, generate in the background, and let the client retry), and pre-warming (generate common variants immediately on upload, before any request arrives). Request coalescing is the most effective, but it requires a coordination layer that tracks in-flight transformations.
⚠️ Key Trade-off: Request coalescing adds complexity and a potential single point of failure. If the first request fails, every coalesced request fails with it. Implement it with a timeout and a fallback to direct processing.
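Request coalescing with a timeout and fallback can be sketched in a single process as follows. This assumes a thread-per-request server; the Coalescer class and its transform callback are hypothetical names, and a multi-node deployment would need a shared coordination layer instead of an in-memory dict.

```python
import threading
from concurrent.futures import Future


class Coalescer:
    """Runs one transformation per key; concurrent duplicates wait for it."""

    def __init__(self, transform, timeout=5.0):
        self._transform = transform      # callable: key -> result
        self._timeout = timeout          # seconds a follower waits
        self._lock = threading.Lock()
        self._in_flight = {}             # key -> Future of the leader's work

    def get(self, key):
        with self._lock:
            future = self._in_flight.get(key)
            leader = future is None
            if leader:
                future = Future()
                self._in_flight[key] = future

        if leader:
            try:
                future.set_result(self._transform(key))
            except Exception as exc:
                future.set_exception(exc)
            finally:
                with self._lock:
                    self._in_flight.pop(key, None)
            return future.result()       # re-raises if the transform failed

        try:
            # Follower: wait for the leader's result.
            return future.result(timeout=self._timeout)
        except Exception:
            # Leader failed or was too slow: fall back to direct processing.
            return self._transform(key)
```

The fallback means a failing leader no longer fails every waiting request, at the cost of a burst of direct transformations when the leader is slow or the failure is systematic.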
Unbounded Transformations
An attacker requests /image.jpg?w=10000&h=10000. The transformation service tries to create a massive image, exhausts memory, and crashes. Alternatively, the attacker requests /image.jpg?w=1, then ?w=2, then ?w=3, and so on, filling the cache with millions of variants. Mitigations: allowlisted dimensions (only accept predefined sizes: 100, 200, 400, 800, 1600), maximum dimension limits (reject any dimension above 4000px), signed URLs (transformation parameters must be cryptographically signed), and rate limiting per source image. Production systems typically combine allowlisted dimensions with signed URLs for custom transformations.
SSRF via Image URL
Some services fetch images from user-provided URLs for processing. An attacker supplies an internal URL such as http://169.254.169.254/metadata (the cloud metadata service). The service fetches the URL and returns internal data. This is Server-Side Request Forgery (SSRF). Mitigations: allowlist domains (only fetch from known sources), blocklist internal IP ranges, use a dedicated network segment with no internal access for fetching, and validate that the response is actually an image before processing it. SSRF via image processing is a common vulnerability in media pipelines.
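A sketch of the allowlist and internal-range checks using Python's standard ipaddress module. The allowlisted host and the injectable resolve parameter are assumptions made for illustration. A real fetcher should also connect to the exact address it validated, since DNS can return a different answer on a second lookup.

```python
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_HOSTS = {"images.example.com"}   # hypothetical allowlist


def is_public_ip(address: str) -> bool:
    """True only for globally routable addresses."""
    try:
        ip = ipaddress.ip_address(address)
    except ValueError:
        return False
    # Unwrap IPv4-mapped IPv6 (e.g. ::ffff:10.0.0.1) before checking.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return ip.is_global


def resolve_host(hostname: str) -> list:
    """All addresses the hostname currently resolves to."""
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def is_safe_url(url: str, resolve=resolve_host) -> bool:
    """Allow only http(s) URLs on allowlisted hosts with public addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    if parsed.hostname not in ALLOWED_HOSTS:
        return False
    try:
        addresses = resolve(parsed.hostname)
    except OSError:
        return False
    # Every resolved address must be public; an empty answer is rejected.
    return bool(addresses) and all(is_public_ip(a) for a in addresses)
```

Checking the resolved addresses as well as the hostname covers the case where an allowlisted domain is pointed at an internal range.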
Codec Vulnerabilities
Image and video parsers are complex and historically vulnerable. A malformed image file can trigger a buffer overflow in the decoder, leading to code execution. Defenses: use hardened, sandboxed decoders. Run transformations in isolated containers. Keep codec libraries updated, because new vulnerabilities are discovered regularly. Consider preprocessing with a simple validator before the full decode. Rate limit transformations per source IP to limit exploitation attempts.
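One possible shape for the simple validator is a size and magic-byte check that runs before any decoder touches the file. The size limit and function names below are assumptions. The check only filters obviously wrong input and does not replace sandboxing, since a file with a valid signature can still be malicious.

```python
MAX_UPLOAD_BYTES = 20 * 1024 * 1024   # hypothetical size limit


def sniff_image_type(data: bytes):
    """Identify a file by its leading signature bytes, without decoding it."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WebP is a RIFF container: "RIFF", 4 size bytes, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def passes_precheck(data: bytes) -> bool:
    """Cheap gate to run before handing bytes to the sandboxed decoder."""
    if not 0 < len(data) <= MAX_UPLOAD_BYTES:
        return False
    return sniff_image_type(data) is not None
```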
Runaway Encoding Costs
Video encoding is expensive. If a bug or an attack triggers mass re-encoding, the cloud encoding bill can spike from $1,000/day to $50,000/day. Mitigations: budget alerts at 2x and 5x normal spend, hard caps that stop encoding when exceeded, rate limits on encoding job submissions, and monitoring for unusual encoding patterns (the same video encoded repeatedly, abnormal video lengths). Investigate before lifting a cap: a legitimate viral video is different from an attack or a bug.
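The alert and cap thresholds can be expressed as one small decision function. This is a sketch only: the function name, the returned labels, and the choice of hard cap are assumptions, and the baseline would come from your own billing history.

```python
def budget_action(spend_today: float, baseline: float, hard_cap: float) -> str:
    """Map today's encoding spend to an action.

    baseline: normal daily spend (e.g. 1000.0 for $1,000/day)
    hard_cap: spend at which encoding stops until someone investigates
    """
    if spend_today >= hard_cap:
        return "stop"        # halt new encoding jobs
    if spend_today >= 5 * baseline:
        return "alert_5x"    # page on-call
    if spend_today >= 2 * baseline:
        return "alert_2x"    # notify the team
    return "ok"
```

With a baseline of 1000 and a hypothetical cap of 10000, a day at 2,500 returns "alert_2x" and a day at 50,000 returns "stop".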

💡 Key Takeaways

✓ Cache poisoning from a missing Vary header serves the wrong format to clients that do not support it: use Vary: Accept or encode the format in the URL path

✓ Thundering herd on new images: use request coalescing to hold duplicates, then serve them all from the first result when it completes

✓ Unbounded transformations: allowlist dimensions (100, 200, 400, 800, 1600), enforce maximum limits, and require signed URLs for custom sizes

✓ SSRF via image URL: an attacker provides an internal URL such as the metadata service; use a domain allowlist and blocklist internal IPs

📌 Interview Tips

1. Explain cache poisoning and why the Vary: Accept header is critical for format negotiation systems

2. Describe request coalescing for thundering herds: duplicate requests are held, the first request generates the variant, and all of them receive its result

3. Present dimension security: allowlist common sizes and require signed URLs for custom transformations to prevent cache flooding
