
Registry Consistency: AP vs CP Trade-offs in Service Discovery

The CAP Theorem Context

Service registries are distributed systems subject to CAP constraints: during a network partition, they must choose between consistency (every read reflects the most recent write, so all nodes agree) and availability (every request to a reachable node receives a response). This choice determines how discovery behaves during failures.

AP Registry Behavior

AP registries (Availability over Consistency) prioritize responding to queries even with stale data. During a partition, both sides continue serving requests independently. Clients might see different instance lists depending on which registry node they query. When the partition heals, registries reconcile differences. Benefit: services remain discoverable during outages. Risk: clients may route to instances that no longer exist.
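The partition and reconciliation behavior described above can be sketched with a toy model. This is a minimal illustration, not how any particular registry works: the `APRegistryNode` class, the per-record version numbers, and the "highest version wins" merge rule are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class APRegistryNode:
    """Hypothetical AP registry node that always answers from local state."""
    name: str
    # address -> (version, is_registered); a False flag acts as a tombstone
    records: dict = field(default_factory=dict)

    def register(self, address, version):
        self.records[address] = (version, True)

    def deregister(self, address, version):
        self.records[address] = (version, False)

    def lookup(self):
        # Never refuses a query, even if local data may be stale.
        return sorted(addr for addr, (_, up) in self.records.items() if up)

    def merge_from(self, other):
        # Assumed reconciliation rule: the record with the higher version wins.
        for address, record in other.records.items():
            mine = self.records.get(address)
            if mine is None or record[0] > mine[0]:
                self.records[address] = record


def reconcile(a, b):
    """Exchange state in both directions once the partition heals."""
    a.merge_from(b)
    b.merge_from(a)


# Both nodes start with the same view.
a, b = APRegistryNode("a"), APRegistryNode("b")
for node in (a, b):
    node.register("10.0.0.1:8080", version=1)

# Partition: each side accepts a different update.
a.deregister("10.0.0.1:8080", version=2)   # only node a learns the instance is gone
b.register("10.0.0.2:8080", version=2)     # only node b learns about a new instance

print(a.lookup())  # clients of a see no instances
print(b.lookup())  # clients of b still see the dead instance plus the new one

reconcile(a, b)
print(a.lookup() == b.lookup())  # views agree again after reconciliation
```

During the simulated partition, a client querying node b would still be handed `10.0.0.1:8080`, which is exactly the stale-route risk the text describes.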

CP Registry Behavior

CP registries (Consistency over Availability) refuse to serve queries if they cannot guarantee that the data is current. During a partition, nodes that cannot reach a quorum (a majority of nodes) stop accepting reads and writes. Benefit: clients never see conflicting views of the registry; every successful read reflects the latest committed state. Risk: discovery becomes unavailable on the minority side of a partition, potentially blocking all service communication there.
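The quorum rule can be sketched as follows. This is a simplified model under stated assumptions: `CPRegistryNode`, `RegistryUnavailable`, and the `reachable_nodes` counter are hypothetical names, and a real CP registry would determine quorum through a consensus protocol rather than a stored integer.

```python
class RegistryUnavailable(Exception):
    """Raised when a node cannot prove it is on the majority side."""


def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    # Strict majority, counting the node itself among the reachable nodes.
    return reachable_nodes > cluster_size // 2


class CPRegistryNode:
    """Hypothetical CP registry node that refuses requests without quorum."""

    def __init__(self, cluster_size: int, reachable_nodes: int):
        self.cluster_size = cluster_size
        self.reachable_nodes = reachable_nodes  # assumed to include this node
        self.instances = {}  # service name -> list of addresses

    def _require_quorum(self):
        if not has_quorum(self.reachable_nodes, self.cluster_size):
            raise RegistryUnavailable(
                f"only {self.reachable_nodes}/{self.cluster_size} nodes reachable"
            )

    def register(self, service: str, address: str):
        self._require_quorum()
        self.instances.setdefault(service, []).append(address)

    def lookup(self, service: str):
        self._require_quorum()
        return list(self.instances.get(service, []))


# A 5-node cluster split 3/2: only the 3-node side keeps serving.
majority_side = CPRegistryNode(cluster_size=5, reachable_nodes=3)
minority_side = CPRegistryNode(cluster_size=5, reachable_nodes=2)

majority_side.register("orders", "10.0.0.1:8080")
print(majority_side.lookup("orders"))

try:
    minority_side.lookup("orders")
except RegistryUnavailable as exc:
    print("minority side refused the read:", exc)
```

Note that an even split (for example 2/2 in a 4-node cluster) leaves neither side with a strict majority, so in this model both sides would refuse requests.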

⚠️ Key Trade-off: AP registries may return stale data but remain available. CP registries guarantee accuracy but may become unavailable. For service discovery, availability usually matters more: stale routes with client retries are better than no routes at all.

Eventual Consistency Windows

AP registries converge to a consistent state after a partition heals. The length of the convergence window depends on replication lag and reconciliation speed. During this window, different clients may see different views. Design clients to tolerate this: use retries, circuit breakers, and fallback to cached data. Serving from a cache that is 5 seconds stale is usually better than failing the request.
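One way to implement the cached-data fallback is sketched below. The names `CachingDiscoveryClient`, `RegistryUnavailable`, and the `fetch` callable are hypothetical, and the 5 second staleness bound is simply the figure used in the text, treated here as a configurable assumption.

```python
import time


class RegistryUnavailable(Exception):
    """Raised by the fetch function when the registry cannot be reached."""


class CachingDiscoveryClient:
    """Hypothetical client that falls back to recently cached results."""

    def __init__(self, fetch, max_stale_seconds=5.0, clock=time.monotonic):
        self._fetch = fetch                  # callable: service name -> list of addresses
        self._max_stale = max_stale_seconds  # assumed tolerance for stale data
        self._clock = clock                  # injectable so the behavior can be tested
        self._cache = {}                     # service name -> (addresses, fetched_at)

    def resolve(self, service):
        try:
            instances = list(self._fetch(service))
        except RegistryUnavailable:
            cached = self._cache.get(service)
            if cached is None:
                raise  # nothing to fall back to
            instances, fetched_at = cached
            if self._clock() - fetched_at > self._max_stale:
                raise  # cached data is older than we are willing to trust
            return list(instances)
        # Fresh answer: remember it as the last known good data.
        self._cache[service] = (instances, self._clock())
        return list(instances)
```

A caller would wrap its real registry query in `fetch` and raise `RegistryUnavailable` on timeouts or connection errors. Whether to bound staleness at all is a design choice: a stricter bound limits how long dead instances linger, while an unbounded cache keeps discovery working through longer outages.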

Practical Recommendations

Many production deployments favor AP registries because losing discovery can stall every service-to-service call. Client-side caching provides a safety net: if the registry is unreachable, use the last known good data. Client-side health checks detect failed instances that the registry still lists. The combination of an AP registry, client caching, and health checks provides resilient discovery.
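The client-side health check piece might look like the sketch below, which tracks consecutive failures per instance and fails over to the next candidate. `InstanceHealthTracker`, `call_with_failover`, and the threshold of 3 are illustrative assumptions; the sketch also omits the recovery probing (a "half-open" state) that a full circuit breaker would include.

```python
class InstanceHealthTracker:
    """Hypothetical passive health check: counts consecutive failures per instance."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self._failures = {}  # address -> consecutive failure count

    def record_success(self, address):
        self._failures[address] = 0

    def record_failure(self, address):
        self._failures[address] = self._failures.get(address, 0) + 1

    def is_healthy(self, address):
        return self._failures.get(address, 0) < self.failure_threshold


def call_with_failover(instances, send, tracker):
    """Try each healthy instance in order until one responds.

    `instances` is the (possibly stale) list from the registry or cache;
    `send` is a callable that takes an address and raises ConnectionError
    when the instance cannot be reached.
    """
    last_error = None
    for address in instances:
        if not tracker.is_healthy(address):
            continue  # skip instances we have already seen fail repeatedly
        try:
            result = send(address)
        except ConnectionError as exc:
            tracker.record_failure(address)
            last_error = exc
            continue  # retry on the next instance
        tracker.record_success(address)
        return result
    raise RuntimeError("no healthy instance responded") from last_error
```

With this in place, a stale registry entry costs one failed attempt per request until the threshold is reached, after which the client stops trying that instance.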

💡 Key Takeaways
AP registries stay available during partitions but may return stale data; clients might route to dead instances
CP registries guarantee accuracy but become unavailable during partitions, potentially blocking all communication
For discovery, AP plus client caching plus health checks is preferred: stale routes with retries beat no routes
📌 Interview Tips
1. Frame the choice clearly: AP means sometimes stale, CP means sometimes unavailable. Discovery usually needs AP.
2. Explain client side mitigation: cache last known good data, retry on failure, circuit break on repeated failures.
3. Note that CP registry unavailability is catastrophic: if discovery fails, no service can find any other service.