
TLS Failure Modes: Revocation, Zero RTT Replay, and Operational Pitfalls

Revocation Limitations

Certificate revocation is one of the most brittle aspects of TLS at Internet scale. When a certificate is compromised, the CA should publish the revocation by adding the certificate to a CRL (Certificate Revocation List) and reporting it as revoked via OCSP (Online Certificate Status Protocol). In theory, clients check these sources before trusting a certificate. In practice, CRLs can grow to megabytes and propagate slowly, while a live OCSP query can add 50 to 200 ms of latency to a connection.

Most browsers implement soft fail for revocation checks: if the OCSP responder is unreachable, the client proceeds anyway rather than blocking connectivity. An attacker who can intercept traffic can usually also block the OCSP request, so revocation is unreliable exactly when it matters and cannot be counted on for timely response to compromise. OCSP stapling (where the server fetches its own OCSP response and includes it in the handshake) removes the extra round trip, but it is not deployed everywhere and clients generally tolerate a missing staple. In practice, operators rely on short certificate lifetimes (90 days or less) rather than revocation, accepting that a compromised key may remain usable until natural expiration while limiting that window.
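
The soft fail behavior described above can be sketched as a small decision function. This is an illustrative model only, not any browser's actual logic: the names `OcspResult`, `accept_certificate`, and `exposure_days` are hypothetical, and the second function simply measures how long a compromised key stays usable if revocation is ignored.

```python
from datetime import date
from enum import Enum


class OcspResult(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNREACHABLE = "unreachable"  # responder timed out or returned an error


def accept_certificate(result: OcspResult, hard_fail: bool = False) -> bool:
    """Return True if the client should proceed with the connection."""
    if result is OcspResult.GOOD:
        return True
    if result is OcspResult.REVOKED:
        return False
    # Responder unreachable: soft fail proceeds, hard fail blocks.
    return not hard_fail


def exposure_days(not_after: date, compromised_on: date) -> int:
    """Days a stolen key remains usable if no client honors revocation."""
    return max((not_after - compromised_on).days, 0)
```

Under soft fail, `accept_certificate(OcspResult.UNREACHABLE)` returns `True`, which is why blocking the responder is enough to keep a revoked certificate trusted. Shorter lifetimes shrink the value that `exposure_days` can take.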

Zero RTT Replay Risk

TLS 1.3 Zero RTT (0-RTT) early data lets a client send application data in the very first flight of a resumed handshake, removing the handshake round trip for returning connections. However, this data is fundamentally replayable: an attacker who captures the initial client message can resend it to the same server or to other servers that accept the same ticket. TLS itself does not guarantee replay protection for early data; any defense has to come from server side anti-replay mechanisms or from the application.

This restricts 0-RTT to idempotent operations only (operations that produce the same result no matter how many times they execute). A GET request for a static asset is idempotent: replaying it just fetches the same file again. A POST request creating a database record is not idempotent: replaying it could create duplicates. Production systems often disable 0-RTT entirely for API endpoints, enabling it only for static content. Applications using 0-RTT for non idempotent operations must implement application level replay protection using single use tokens, typically combined with strictly bounded timestamps, since a timestamp alone only narrows the replay window rather than closing it.
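
A minimal sketch of application level replay protection, combining single use tokens with a freshness window. The class name `ReplayGuard`, the 10 second window, and the in-memory dictionary are assumptions made for illustration; a deployment with several servers would need a shared store so that a replay sent to a different server is also caught.

```python
import time
from typing import Dict, Optional


class ReplayGuard:
    """Rejects requests whose token was already seen or whose timestamp is stale."""

    def __init__(self, max_age_seconds: float = 10.0):
        self.max_age = max_age_seconds
        self._seen: Dict[str, float] = {}  # token -> timestamp claimed by the client

    def accept(self, token: str, sent_at: float, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Forget tokens whose timestamp would fail the freshness check anyway.
        self._seen = {t: ts for t, ts in self._seen.items() if now - ts <= self.max_age}
        if abs(now - sent_at) > self.max_age:
            return False  # outside the freshness window
        if token in self._seen:
            return False  # token already used: treat as a replay
        self._seen[token] = sent_at
        return True
```

The timestamp bound is what keeps the token table small: a token only needs to be remembered for as long as its timestamp would still be accepted. In a real system the token and timestamp would also need to be bound to the request (for example by a signature) so an attacker cannot mint fresh ones.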

Session Ticket Key Management

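Session ticket keys must rotate frequently to preserve forward secrecy (the property that a later key compromise cannot decrypt previously recorded sessions). A ticket carries the session's resumption secret encrypted under the ticket key, so an attacker who steals that key can recover the secrets inside every ticket it protected. In TLS 1.2 this exposes the full traffic of those sessions; in TLS 1.3 the exposure is narrower (0-RTT early data and resumptions that skip a fresh key exchange) but still real. If ticket keys are not rotated within hours to one day, the forward secrecy provided by the ephemeral key exchange in the original handshake is undermined.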
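
One common way to rotate without breaking resumption is a small key ring: the newest key encrypts new tickets, and a limited number of older keys are kept only for decryption. The sketch below is a hypothetical illustration (`TicketKey`, `TicketKeyRing`, the one hour interval, and `keep=3` are all assumptions), and it manages key material only; it does not perform ticket encryption.

```python
import os
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TicketKey:
    key_id: int
    secret: bytes
    created_at: float


class TicketKeyRing:
    """Newest key issues tickets; a few older keys can still decrypt them."""

    def __init__(self, now: float, rotate_after: float = 3600.0, keep: int = 3):
        self.rotate_after = rotate_after
        self.keep = keep  # total keys retained, including the current one
        self._next_id = 0
        self._keys: List[TicketKey] = []  # newest first
        self._add_key(now)

    def _add_key(self, now: float) -> None:
        self._keys.insert(0, TicketKey(self._next_id, os.urandom(32), now))
        self._next_id += 1
        # Drop keys beyond the retention limit so their tickets become undecryptable.
        del self._keys[self.keep:]

    def rotate_if_due(self, now: float) -> bool:
        if now - self._keys[0].created_at >= self.rotate_after:
            self._add_key(now)
            return True
        return False

    def current(self) -> TicketKey:
        return self._keys[0]

    def find(self, key_id: int) -> Optional[TicketKey]:
        return next((k for k in self._keys if k.key_id == key_id), None)
```

With these defaults a stolen key is useful for tickets issued during at most one hour, and it stops decrypting anything on the server after roughly three hours. The retention count trades resumption success for a shorter exposure window.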

Ticket keys must also synchronize across load balanced servers. If a user connects to server A, receives a ticket encrypted with key K1, then returns and hits server B which only has key K2, server B cannot decrypt the ticket. Resumption fails, triggering a full handshake with higher latency and CPU cost. Monitoring resumption rates reveals synchronization failures: sudden drops below 50 percent typically indicate key distribution problems.
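
The monitoring idea can be sketched with two small helpers. The function names, the 50 percent threshold taken from the text, and the minimum sample size are illustrative assumptions; the last function assumes each server holds its own unshared key and that the load balancer routes uniformly at random with no session stickiness.

```python
def resumption_rate(resumed: int, total: int) -> float:
    """Fraction of handshakes that were resumed rather than full."""
    return resumed / total if total else 0.0


def key_sync_suspected(resumed: int, total: int,
                       threshold: float = 0.5, min_handshakes: int = 100) -> bool:
    """Flag a likely ticket key distribution problem for one measurement window."""
    if total < min_handshakes:
        return False  # too few samples to draw a conclusion
    return resumption_rate(resumed, total) < threshold


def expected_hit_rate_without_sync(n_servers: int) -> float:
    """Chance a returning client lands on the server that issued its ticket."""
    return 1.0 / n_servers
```

Under those assumptions, four servers with unsynchronized keys would resume only about a quarter of returning clients, which is the kind of sudden drop the alert is meant to catch.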

Certificate Chain and MTU Issues

Incomplete certificate chains (missing intermediate certificates) cause validation failures for clients that have not cached the missing intermediate. Some clients succeed, others fail, creating intermittent and hard to diagnose issues. Always serve the end entity certificate followed by every intermediate, in order. The root itself does not need to be sent: clients must already have it in their trust store, and including it only adds bytes to the handshake.
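
A simplified sketch of a chain completeness check that could run before deployment. It models certificates as subject and issuer names only, which is an assumption made to keep the example self-contained; real validation also verifies signatures, validity periods, and constraints. The `Cert` type and `chain_problems` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str


def chain_problems(served: List[Cert], trusted_roots: Set[str]) -> List[str]:
    """Return human readable problems with the chain a server presents."""
    if not served:
        return ["no certificates served"]
    problems = []
    # Each certificate must be issued by the one that follows it.
    for child, parent in zip(served, served[1:]):
        if child.issuer != parent.subject:
            problems.append(
                f"{child.subject!r} is issued by {child.issuer!r}, "
                f"but the next certificate is {parent.subject!r}"
            )
    # The last certificate must chain directly to a root the client trusts.
    last = served[-1]
    if last.issuer not in trusted_roots:
        problems.append(
            f"issuer {last.issuer!r} of the last certificate is not a trusted root; "
            "an intermediate is probably missing"
        )
    return problems
```

For example, serving only a leaf whose issuer is an intermediate CA yields one problem, while leaf plus intermediate yields none. A client that happens to have the intermediate cached would accept both, which is why the failure looks intermittent.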

Large certificate chains cannot fit in a single packet. RSA chains at 4 to 6 KB exceed the typical 1500 byte MTU (Maximum Transmission Unit) and are split across roughly 3 to 4 packets. On lossy mobile networks, more packets means more chances for loss, and any lost packet stalls the handshake until it is retransmitted. ECDSA chains at 2 to 3 KB fit in about 2 packets, reducing handshake failure rates by 5 to 15 percent on congested networks.
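
The packet counts can be estimated with a back-of-the-envelope calculation. This sketch assumes 40 bytes of IPv4 and TCP headers with no options, treats 1 KB as 1024 bytes, ignores TLS record overhead and other handshake messages, and models packet loss as independent; all of these are simplifications, so sizes at the upper end of each range can need one more packet than the rounded figures in the text.

```python
import math


def segments_needed(chain_bytes: int, mtu: int = 1500, header_bytes: int = 40) -> int:
    """Number of TCP segments needed to carry a certificate chain."""
    mss = mtu - header_bytes  # payload bytes available per packet
    return math.ceil(chain_bytes / mss)


def loss_probability(segments: int, per_packet_loss: float) -> float:
    """Chance that at least one of the segments is lost (independent losses)."""
    return 1.0 - (1.0 - per_packet_loss) ** segments


rsa_segments = segments_needed(4 * 1024)    # 4 KB RSA chain
ecdsa_segments = segments_needed(2 * 1024)  # 2 KB ECDSA chain
```

With these assumptions a 4 KB chain needs 3 segments and a 2 KB chain needs 2, and `loss_probability` grows with every additional segment, which is the mechanism behind the higher failure rate of larger chains.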

💡 Key Takeaways
OCSP queries add 50 to 200ms latency and fail open in most browsers, making revocation unreliable; short 90 day lifetimes limit exposure instead
Zero RTT early data is replayable at the TLS layer; safe only for idempotent operations like GET requests, never for writes without application level replay protection
Session ticket keys not rotated within hours to 1 day undermine forward secrecy; a leaked key exposes past sessions whose tickets were encrypted with that key
Ticket keys must synchronize across load balanced servers; resumption failures indicate synchronization problems, spiking latency and CPU cost
Incomplete certificate chains cause intermittent validation failures in 10 to 30 percent of clients depending on cached intermediates
RSA chains (4 to 6 KB) span roughly 3 to 4 packets versus about 2 for ECDSA (2 to 3 KB); the extra packets raise handshake failure rates by 5 to 15 percent on lossy networks
📌 Interview Tips
1. Explain why revocation does not work at scale: OCSP adds latency, most browsers proceed anyway on failure, so compromised certificates remain trusted
2. Discuss 0-RTT risk: replayed POST could duplicate transactions, so restrict to static content or implement application level single use tokens
3. Mention session ticket monitoring: sudden resumption rate drops from 80 percent to 20 percent indicate key synchronization failure across servers