Load Balancing • Health Checks & Failure Detection (Medium, ⏱️ ~3 min)
Health Check Layers: Liveness, Readiness, and Capacity Signals
Health is not binary. A process can accept TCP connections yet be unable to do useful work due to exhausted threads, saturated queues, or dependency timeouts. Mature systems distinguish three layers, each serving a different purpose in the architecture.
Liveness checks answer "Is the process running at all?" They should be narrow and conservative, checking only for fatal wedged states like a hung event loop or deadlocked thread pool. Google production systems scope liveness to detect truly broken processes that need restarts, not transient issues. A typical liveness check might verify an internal watchdog tick or thread pool progress, nothing more.
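A minimal sketch of such a narrow liveness endpoint in Go, assuming an internal watchdog goroutine that records a timestamp; the endpoint path, the 10-second staleness limit, and the `lastTick` name are illustrative, and the handler deliberately ignores dependencies:

```go
package main

import (
	"net/http"
	"sync/atomic"
	"time"
)

// lastTick is updated by an internal watchdog; the liveness handler only
// checks that this heartbeat is recent. Dependencies are deliberately ignored.
var lastTick atomic.Int64

func watchdog() {
	for {
		lastTick.Store(time.Now().UnixNano())
		time.Sleep(1 * time.Second) // a real watchdog would tick from the event loop or worker pool
	}
}

func livenessHandler(w http.ResponseWriter, r *http.Request) {
	age := time.Since(time.Unix(0, lastTick.Load()))
	if age > 10*time.Second { // process is wedged: let the orchestrator restart it
		http.Error(w, "wedged", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	go watchdog()
	http.HandleFunc("/livez", livenessHandler)
	http.ListenAndServe(":8080", nil)
}
```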
Readiness checks determine "Can this instance serve production traffic at expected Quality of Service (QoS) right now?" This is what load balancers use to include or exclude instances from rotation. AWS Elastic Load Balancers commonly probe every 5 to 30 seconds, marking targets unhealthy after 2 to 5 consecutive failures. At a 10-second interval with a failure threshold of 2, detection takes roughly 20 to 25 seconds. Readiness should reflect dependency availability, queue depth, and tail latency, returning HTTP 503 when the instance is temporarily unable to meet its Service Level Objectives (SLOs).
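A hedged sketch of a readiness handler along these lines; the 50 ms dependency budget, the `maxQueueDepth` threshold, and the hook names are illustrative assumptions, not any particular service's real limits:

```go
package main

import (
	"context"
	"net/http"
	"time"
)

// These hooks are illustrative; in a real service they would be wired to the
// actual database client and work queue at startup.
var (
	pingDatabase  = func(ctx context.Context) error { return nil }
	queueDepth    = func() int { return 0 }
	maxQueueDepth = 1000 // illustrative SLO-derived threshold
)

// readinessHandler answers "can this instance meet its SLO right now?" and
// returns 503 so the load balancer shifts traffic elsewhere once its
// consecutive-failure threshold is reached.
func readinessHandler(w http.ResponseWriter, r *http.Request) {
	ctx, cancel := context.WithTimeout(r.Context(), 50*time.Millisecond)
	defer cancel()

	if err := pingDatabase(ctx); err != nil {
		http.Error(w, "dependency unavailable", http.StatusServiceUnavailable)
		return
	}
	if queueDepth() > maxQueueDepth {
		http.Error(w, "queue saturated", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/readyz", readinessHandler)
	http.ListenAndServe(":8080", nil)
}
```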
Capacity signals communicate "How much traffic should this instance receive now?" rather than just on or off. HAProxy agent checks let applications advertise dynamic weights from 0 to 100% or max connection limits. Imgix reported sending "75% weight" during partial degradation to prevent overload. This prevents the binary flapping problem where instances oscillate between fully in and fully out of rotation under load pressure.
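A sketch of an HAProxy-style agent-check responder, assuming the backend is configured with something like `agent-check agent-port 9999`; the in-flight-based weight formula and the 200-request ceiling are illustrative, not a recommended policy:

```go
package main

import (
	"fmt"
	"net"
	"sync/atomic"
)

// inFlight would be incremented and decremented around real request handling.
var inFlight atomic.Int64

const maxInFlight = 200 // illustrative capacity ceiling

// currentWeight maps load to a 1-100% weight so the proxy sends proportionally
// less traffic as the instance approaches saturation, instead of the instance
// flapping between fully in and fully out of rotation.
func currentWeight() int {
	load := float64(inFlight.Load()) / maxInFlight
	w := int((1 - load) * 100)
	if w < 1 {
		w = 1 // never advertise 0: keep a trickle so recovery is observable
	}
	return w
}

func main() {
	ln, err := net.Listen("tcp", ":9999") // the port HAProxy's agent-check connects to
	if err != nil {
		panic(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			continue
		}
		// The agent replies with a single ASCII line, e.g. "75%\n", then closes.
		fmt.Fprintf(conn, "%d%%\n", currentWeight())
		conn.Close()
	}
}
```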
💡 Key Takeaways
• Liveness should be minimal and internal only. Checking dependencies in liveness causes cascading restarts during outages, amplifying the problem. Google production systems use liveness narrowly, for truly wedged processes.
• Readiness reflects the actual ability to meet SLOs right now. AWS load balancers at a 10-second interval with a failure threshold of 2 detect issues in 20 to 25 seconds. Tuning to 5-second intervals reduces this to 10 to 15 seconds for tighter SLOs.
• Deep readiness checks that verify database and cache dependencies can detect real problems, but risk becoming a self-inflicted distributed denial of service against those dependencies. Cache health results for 5 to 30 seconds and add jitter to check schedules across thousands of instances (see the sketch after this list).
• Capacity-aware signaling with dynamic weights from 0% to 100% prevents binary flapping at load boundaries. This allows gradual degradation instead of abrupt removal, maintaining higher aggregate throughput under stress.
• Return HTTP 503 for temporary unavailability to signal load balancers to retry elsewhere. Returning HTTP 200 during degraded states prevents automatic traffic shifting and violates the health check contract.
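A sketch of the caching-plus-jitter idea from the dependency-check takeaway above; the 10-second cache interval, the ±2-second jitter, and the `deepCheck` stub are illustrative assumptions, not recommendations for any specific fleet:

```go
package main

import (
	"math/rand"
	"net/http"
	"sync/atomic"
	"time"
)

// cachedHealthy holds the last deep-check result so the /readyz handler never
// hits the database or cache tier on every probe.
var cachedHealthy atomic.Bool

// deepCheck stands in for real dependency probes (DB ping, cache ping, etc.).
var deepCheck = func() bool { return true }

// refreshLoop re-runs the deep check roughly every `interval`, with random
// jitter so thousands of instances don't probe shared dependencies in lockstep.
func refreshLoop(interval, jitter time.Duration) {
	for {
		cachedHealthy.Store(deepCheck())
		sleep := interval + time.Duration(rand.Int63n(int64(2*jitter))) - jitter
		time.Sleep(sleep)
	}
}

func readyzHandler(w http.ResponseWriter, r *http.Request) {
	if !cachedHealthy.Load() {
		http.Error(w, "dependencies degraded", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	go refreshLoop(10*time.Second, 2*time.Second)
	http.HandleFunc("/readyz", readyzHandler)
	http.ListenAndServe(":8080", nil)
}
```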
📌 Examples
Google production systems use white-box health endpoints that check dependency latency budgets (e.g., a database ping must complete within 50 ms for a read service), combined with black-box probers in multiple regions to avoid localized blind spots.
Netflix Eureka instances send heartbeats every 30 seconds, with eviction after 90+ seconds of missed renewals. They pair this slow registry with fast client-side circuit breakers that trip at over 50% errors in a 10-second window, achieving sub-second load shedding.
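The fast client-side breaker described here can be approximated with a simple rolling error-rate gate. This sketch (fixed 10-second window, 50% threshold, minimum sample size) is a simplification of what Hystrix-style libraries do, not Netflix's actual implementation:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// windowBreaker trips when the error rate over the current window exceeds the
// threshold, so callers shed load long before the service registry
// (heartbeats every 30s, eviction after 90s) notices anything.
type windowBreaker struct {
	mu          sync.Mutex
	windowStart time.Time
	window      time.Duration
	total       int
	failures    int
	threshold   float64 // e.g. 0.5 for 50%
	minRequests int     // avoid tripping on tiny samples
}

func (b *windowBreaker) Record(err error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if time.Since(b.windowStart) > b.window {
		b.windowStart, b.total, b.failures = time.Now(), 0, 0
	}
	b.total++
	if err != nil {
		b.failures++
	}
}

func (b *windowBreaker) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.total < b.minRequests {
		return true
	}
	return float64(b.failures)/float64(b.total) < b.threshold
}

func main() {
	b := &windowBreaker{window: 10 * time.Second, threshold: 0.5, minRequests: 20, windowStart: time.Now()}
	for i := 0; i < 30; i++ {
		b.Record(fmt.Errorf("upstream timeout")) // simulate a failing dependency
	}
	fmt.Println("allow further calls?", b.Allow()) // false: the breaker is open
}
```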
HAProxy's agent check allows applications to report "weight 75%" or "maxconn:30" dynamically. Imgix uses this to advertise reduced capacity during garbage collection pauses or hot-shard situations instead of going fully unhealthy.