
Health Check Layers: Liveness, Readiness, and Capacity Signals

Definition
Health checks are probes that determine whether a server can accept and process requests. Mature systems distinguish three layers: liveness (is the process running), readiness (can it serve traffic), and capacity (how much traffic should it receive). Health is not binary; a process can accept TCP connections yet be unable to do useful work.

Liveness Checks

Liveness checks answer: is the process running at all? They should be narrow and conservative, checking only for fatal wedged states like a hung event loop or deadlocked thread pool. A typical liveness check verifies an internal watchdog tick or thread pool progress, nothing more. Critically, liveness checks must never verify external dependencies. If the database is down and liveness fails, the orchestrator restarts all application instances, making recovery harder. Liveness should only restart truly broken processes that cannot recover without a restart.
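A minimal sketch of such a check, assuming a watchdog that the event loop ticks periodically (the names and the staleness threshold here are illustrative, not from any specific framework):

```python
import time

# Liveness checks only internal progress -- never external dependencies.
STALE_AFTER_S = 15.0  # assumed threshold for declaring the loop wedged

class Watchdog:
    def __init__(self):
        self.last_tick = time.monotonic()

    def tick(self):
        # Called periodically from the event loop / worker pool.
        self.last_tick = time.monotonic()

    def alive(self):
        return time.monotonic() - self.last_tick < STALE_AFTER_S

def liveness_status(watchdog):
    # 200: process is making progress; 503: appears wedged, restart it.
    return 200 if watchdog.alive() else 503
```

Note the check never touches the database or cache: a dependency outage leaves liveness green, so the orchestrator does not restart healthy processes.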

Readiness Checks

Readiness checks determine: can this instance serve production traffic at the expected QoS (Quality of Service) right now? This is what load balancers use to include or exclude instances from rotation. Readiness should reflect dependency availability, queue depth, and tail latency, returning HTTP 503 when the instance is temporarily unable to meet SLOs (Service Level Objectives). Load balancers commonly probe every 5-30 seconds, marking targets unhealthy after 2-5 consecutive failures. At a 10-second interval with a 2-failure threshold, detection takes roughly 20-25 seconds: interval times threshold, plus up to one probe timeout.
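A hedged sketch of the readiness decision and the detection-time arithmetic; the function names, dependency flags, and the example SLO value are assumptions for illustration:

```python
# Readiness: in rotation only if dependencies are up and tail latency
# currently meets the SLO (250 ms is an assumed example, not a standard).
def readiness_status(db_ok, cache_ok, p99_ms, slo_p99_ms=250):
    if db_ok and cache_ok and p99_ms <= slo_p99_ms:
        return 200  # serve traffic
    return 503      # temporarily out; the LB routes elsewhere

# Detection time: interval x failure threshold, plus up to one probe
# timeout for the final failed probe.
def detection_time_s(interval_s, failure_threshold, probe_timeout_s):
    return interval_s * failure_threshold + probe_timeout_s
```

With a 10 s interval, 2-failure threshold, and 5 s probe timeout, this gives the ~25 s detection figure quoted above.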

Capacity Signals

Capacity signals communicate how much traffic an instance should receive, rather than just on/off binary status. Agent checks let applications advertise dynamic weights from 0-100% or maximum connection limits. During partial degradation, an instance can signal 75% weight instead of fully removing itself. This prevents the binary flapping problem where instances oscillate between fully in and fully out of rotation under load pressure. Gradual weight reduction under stress maintains higher aggregate throughput than abrupt removal.
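One way an agent check might compute a graded weight; the pressure formula and the 10% floor are illustrative assumptions, chosen to show gradual shedding rather than binary removal:

```python
# Advertise a dynamic weight (0-100) instead of a binary healthy flag.
def capacity_weight(cpu_util, queue_depth, max_queue=1000):
    # Pressure rises with whichever resource is more stressed.
    pressure = max(cpu_util, queue_depth / max_queue)
    if pressure >= 1.0:
        return 10  # keep a trickle of traffic to avoid flapping
    # Shed weight gradually, never dropping below an assumed 10% floor.
    return max(10, int(100 * (1.0 - pressure)))
```

An instance at 25% pressure advertises 75% weight, matching the partial-degradation example above, instead of toggling fully out of rotation.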

HTTP Response Codes

Return HTTP 200 for healthy, HTTP 503 (Service Unavailable) for temporary unavailability. The 503 tells load balancers to route traffic elsewhere while leaving the instance registered, so it can recover and rejoin rotation. Returning 200 during degraded states prevents automatic traffic shifting and violates the health check contract. Some systems also use HTTP 429 (Too Many Requests) to signal capacity limits without indicating unhealthiness.
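The response-code contract above can be sketched as a single decision function (the flag names are illustrative):

```python
# Map instance state to the health-endpoint status code.
def health_response(degraded, over_capacity):
    if over_capacity:
        return 429  # at a capacity limit, but not unhealthy
    if degraded:
        return 503  # temporarily unable to meet SLOs; shift traffic
    return 200      # healthy, keep in rotation
```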

Key Trade-off: Deep readiness checks that verify database and cache dependencies catch real problems but risk becoming a self-inflicted DDoS. Cache health results for 5-30 seconds and add jitter to check schedules when running thousands of instances.
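A sketch of the caching-with-jitter mitigation, assuming a deep probe function supplied by the application; the class and parameter names are illustrative:

```python
import random
import time

# Cache a deep health result for a jittered TTL so thousands of
# instances don't hammer shared dependencies in lockstep.
class CachedHealth:
    def __init__(self, probe, ttl_s=10.0, jitter_s=3.0):
        self.probe = probe      # expensive deep check (DB, cache, etc.)
        self.ttl_s = ttl_s      # within the 5-30 s window above
        self.jitter_s = jitter_s
        self.expires = 0.0
        self.value = None

    def status(self):
        now = time.monotonic()
        if now >= self.expires:
            self.value = self.probe()
            # Randomize expiry to desynchronize probe schedules.
            self.expires = now + self.ttl_s + random.uniform(0, self.jitter_s)
        return self.value
```

Probes landing between refreshes get the cached result, so dependency load from health traffic stays bounded regardless of how often the load balancer polls.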
💡 Key Takeaways
Liveness checks only for fatal process states (hung event loop, deadlocked threads); never check dependencies to avoid cascading restarts
Readiness reflects ability to meet SLOs now; load balancers probe every 5-30s with 2-5 failure threshold for 20-25s detection
Capacity signals with 0-100% weights enable gradual degradation, preventing binary flapping at load boundaries
Return HTTP 503 for temporary unavailability to trigger load balancer retry; 200 during degradation breaks the health contract
📌 Interview Tips
1. Explain the three-layer model: liveness (process running), readiness (can serve traffic), capacity (how much traffic)
2. Calculate detection time: interval times threshold plus timeout (10s interval, 2 failures = ~20-25s detection)
3. Mention that checking dependencies in liveness causes cascading restarts during outages, amplifying problems