TLS Failure Modes: Revocation, Zero RTT Replay, and Operational Pitfalls
Revocation Limitations
Certificate revocation is one of the most brittle parts of TLS at Internet scale. When a certificate's private key is compromised, the issuing CA should list the certificate in a CRL (Certificate Revocation List) or report it as revoked via OCSP (Online Certificate Status Protocol). In theory, clients check one of these before trusting a certificate. In practice, CRLs can grow to megabytes and propagate slowly, while a live OCSP query adds roughly 50 to 200 ms to connection setup whenever the client has no cached response.
Most browsers implement soft fail for revocation checks: if the OCSP responder is unreachable, the client proceeds anyway rather than blocking connectivity. This makes revocation unreliable as a timely response to compromise, because an attacker who is positioned to present a stolen certificate can often block the revocation check as well. OCSP stapling (where the server fetches its own OCSP response and includes it in the handshake) removes the extra client round trip, but it only helps when the server actually staples a fresh response, so its success rate varies in practice. The operational reality is to rely on short certificate lifetimes (90 days or less) rather than revocation: a compromised key may remain usable until the certificate expires, but that window is bounded.
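The difference between soft fail and hard fail can be sketched in a few lines. This is an illustrative model only, not any browser's actual logic; the `OcspResult` enum and `accept_certificate` function are hypothetical names introduced for the example.

```python
from enum import Enum


class OcspResult(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNREACHABLE = "unreachable"  # responder timed out or was blocked


def accept_certificate(result: OcspResult, hard_fail: bool = False) -> bool:
    """Return True if the client would proceed with the connection."""
    if result is OcspResult.GOOD:
        return True
    if result is OcspResult.REVOKED:
        return False
    # No answer from the responder: soft fail proceeds, hard fail blocks.
    return not hard_fail


# With soft fail, blocking the OCSP query is enough to bypass the check.
soft = accept_certificate(OcspResult.UNREACHABLE)                   # True
hard = accept_certificate(OcspResult.UNREACHABLE, hard_fail=True)   # False
```

Hard fail closes the bypass, at the cost of turning every responder outage into a connectivity outage, which is why soft fail remains the common default.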
Zero RTT Replay Risk
TLS 1.3 Zero RTT (0-RTT) early data allows a client to send application data in the very first flight of a resumed handshake, removing the handshake round trip before the first request on returning connections. However, this data is fundamentally replayable: an attacker who captures the client's first flight can resend it to the same server or to a different server that accepts the same ticket. TLS itself does not guarantee replay protection for early data; any defense has to come from server-side anti-replay measures or from the application.
This restricts 0-RTT to idempotent operations (operations that produce the same result no matter how many times they execute). A GET request for a static asset is idempotent: replaying it just fetches the same file again. A POST request that creates a database record is not idempotent: replaying it could create duplicates. Production systems often disable 0-RTT entirely for API endpoints and enable it only for static content. Applications that accept non-idempotent operations over 0-RTT must implement application-level replay protection, using single-use tokens or timestamps with strict bounds.
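As a minimal sketch of application-level replay protection, the example below combines a single-use token with a strict timestamp window. It assumes the token and timestamp are covered by the request's own integrity protection and that a single process sees every request; `ReplayGuard`, `allowed_in_early_data`, and the 10-second window are illustrative choices, not part of TLS.

```python
import time
from typing import Dict, Optional

# Methods treated as safe to accept in early data (assumption for this sketch).
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def allowed_in_early_data(method: str) -> bool:
    """Only allow methods that should be harmless if replayed."""
    return method.upper() in SAFE_METHODS


class ReplayGuard:
    """Rejects requests whose token was already used or whose timestamp is stale."""

    def __init__(self, max_age_seconds: float = 10.0):
        self.max_age = max_age_seconds
        self._seen: Dict[str, float] = {}  # token -> timestamp claimed by the client

    def accept(self, token: str, sent_at: float, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Forget tokens that could no longer pass the freshness check anyway.
        self._seen = {t: ts for t, ts in self._seen.items()
                      if now - ts <= self.max_age}
        if abs(now - sent_at) > self.max_age:
            return False  # outside the allowed time window
        if token in self._seen:
            return False  # token already used: treat as a replay
        self._seen[token] = sent_at
        return True
```

Because early data can be replayed to a different server, a real deployment would need the seen-token store to be shared across every server that accepts the same tickets; the in-memory dictionary here only protects a single process.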
Session Ticket Key Management
Session ticket keys must rotate frequently to preserve forward secrecy (the property that a later key compromise cannot be used to decrypt previously recorded sessions). A ticket holds the secrets needed to resume a session, encrypted under the ticket key, so an attacker who steals an old ticket key and has recorded traffic can decrypt the sessions tied to tickets encrypted with that key. If ticket keys are not rotated and destroyed within hours to one day, the forward secrecy provided by the ephemeral key exchange in the original handshake is undermined.
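One way to picture rotation is a small key ring: new tickets are always issued under the newest key, older keys are kept briefly so outstanding tickets still decrypt, and keys past their retirement age are destroyed. The sketch below is a simplified model with hypothetical names (`TicketKeyRing`, `issuing_key`, `lookup`) and illustrative intervals; it does not perform any actual ticket encryption.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TicketKey:
    key_id: str
    secret: bytes
    created_at: float


@dataclass
class TicketKeyRing:
    rotate_after: float = 3600.0      # issue tickets under a key for one hour
    retire_after: float = 2 * 3600.0  # accept its tickets for two hours, then destroy it
    keys: List[TicketKey] = field(default_factory=list)  # newest first

    def _rotate(self, now: float) -> None:
        # Destroy keys that have reached their retirement age.
        self.keys = [k for k in self.keys if now - k.created_at < self.retire_after]
        # Create a fresh key if there is none or the newest one is too old.
        if not self.keys or now - self.keys[0].created_at >= self.rotate_after:
            new_key = TicketKey(secrets.token_hex(8), secrets.token_bytes(32), now)
            self.keys.insert(0, new_key)

    def issuing_key(self, now: Optional[float] = None) -> TicketKey:
        """Key used to encrypt newly issued tickets."""
        self._rotate(time.time() if now is None else now)
        return self.keys[0]

    def lookup(self, key_id: str, now: Optional[float] = None) -> Optional[TicketKey]:
        """Key for decrypting a presented ticket, or None if it was retired."""
        self._rotate(time.time() if now is None else now)
        return next((k for k in self.keys if k.key_id == key_id), None)
```

A ticket whose key has been retired simply fails to decrypt and the client falls back to a full handshake, which is the intended trade: a little extra latency in exchange for a bounded exposure window.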
Ticket keys must also be synchronized across load-balanced servers. If a user connects to server A, receives a ticket encrypted with key K1, then returns and hits server B, which only has key K2, server B cannot decrypt the ticket. Resumption fails, triggering a full handshake with higher latency and CPU cost. Monitoring resumption rates reveals synchronization failures: a sudden drop below 50 percent typically indicates a key distribution problem.
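The monitoring rule above can be expressed as a simple check over handshake counters. The counter names, the `min_handshakes` guard against noisy small samples, and the function names are assumptions made for this sketch; only the 50 percent threshold comes from the text.

```python
def resumption_rate(resumed: int, full: int) -> float:
    """Fraction of handshakes that were resumed rather than full."""
    total = resumed + full
    return resumed / total if total else 0.0


def key_sync_suspected(resumed: int, full: int,
                       threshold: float = 0.5,
                       min_handshakes: int = 100) -> bool:
    """Flag a likely ticket key distribution problem."""
    if resumed + full < min_handshakes:
        return False  # too few samples to draw a conclusion
    return resumption_rate(resumed, full) < threshold


# Example: 30 resumed and 70 full handshakes in the last interval.
alert = key_sync_suspected(resumed=30, full=70)  # True
```

In practice the baseline depends on the traffic mix, so comparing against a server's own recent history is likely to be more useful than a fixed threshold alone.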
Certificate Chain and MTU Issues
Incomplete certificate chains (missing intermediate certificates) cause validation failures for clients that have not cached the missing intermediate and cannot fetch it themselves. Some clients succeed and others fail, creating intermittent, hard-to-diagnose issues. Always serve the end-entity certificate followed by every intermediate, in order. The root itself does not need to be sent: clients must already hold it in their trust store, and including it only adds bytes to the handshake.
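A chain can be sanity-checked by walking it from the end-entity certificate and confirming that each certificate's issuer is the next certificate in the list, and that the last one is issued by a trusted root. The sketch below is deliberately simplified: it compares names only and verifies no signatures, and `Cert`, `chain_problems`, and the example names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str


def chain_problems(served: List[Cert], trusted_roots: Set[str]) -> List[str]:
    """Return a list of human-readable problems; empty means the chain links up."""
    if not served:
        return ["no certificates served"]
    problems = []
    # Each certificate must be issued by the one that follows it.
    for child, parent in zip(served, served[1:]):
        if child.issuer != parent.subject:
            problems.append(
                f"{child.subject!r} is issued by {child.issuer!r}, "
                f"but the next certificate is {parent.subject!r}"
            )
    # The last served certificate must chain directly to a trusted root.
    last = served[-1]
    if last.issuer not in trusted_roots:
        problems.append(
            f"missing intermediate: issuer {last.issuer!r} of "
            f"{last.subject!r} is not a trusted root"
        )
    return problems


roots = {"Example Root CA"}
leaf = Cert("www.example.com", "Example Intermediate CA")
intermediate = Cert("Example Intermediate CA", "Example Root CA")

complete = chain_problems([leaf, intermediate], roots)  # no problems
incomplete = chain_problems([leaf], roots)              # one problem reported
```

Running a check like this from a clean environment matters, because a client that has cached the intermediate will validate an incomplete chain and hide the problem.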
Large certificate chains increase the number of packets in the handshake. RSA chains of 4 to 6 KB exceed the typical 1500-byte MTU (Maximum Transmission Unit) and are split across roughly 3 to 5 packets. On lossy mobile networks, more packets mean more chances for loss and retransmission delay. ECDSA chains of 2 to 3 KB fit in 2 to 3 packets, which can reduce handshake failure rates by 5 to 15 percent on congested networks.
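The packet counts can be estimated with a rough calculation. This sketch assumes 40 bytes of IPv4 and TCP headers per packet with no options, and it ignores TLS record overhead and the other handshake messages that share those packets, so real counts may be slightly higher; `segments_needed` is a name introduced for the example.

```python
import math


def segments_needed(chain_bytes: int, mtu: int = 1500, header_bytes: int = 40) -> int:
    """Estimate how many packets are needed to carry a certificate chain."""
    payload_per_packet = mtu - header_bytes  # 1460 bytes with the defaults
    return math.ceil(chain_bytes / payload_per_packet)


# Compare a larger RSA chain with a smaller ECDSA chain (sizes are illustrative).
rsa_packets = segments_needed(5000)
ecdsa_packets = segments_needed(2500)
```

Each additional packet is another opportunity for loss during the handshake, which is why the smaller chain tends to perform better on lossy links.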