Registry Consistency: AP vs CP Trade-offs in Service Discovery
The CAP Theorem Context
Service registries are distributed systems subject to CAP constraints: during network partitions, they must choose between consistency (all nodes see the same data) and availability (all requests receive responses). This choice fundamentally affects discovery behavior during failures.
AP Registry Behavior
AP registries (Availability over Consistency) prioritize responding to queries even with stale data. During a partition, both sides continue serving requests independently. Clients might see different instance lists depending on which registry node they query. When the partition heals, registries reconcile differences. Benefit: services remain discoverable during outages. Risk: clients may route to instances that no longer exist.
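The reconciliation step can be illustrated with a minimal sketch. It assumes a last-writer-wins merge keyed on a per-instance version counter, which is only one possible strategy; the Record type, its fields, and the merge function are hypothetical names chosen for illustration, not the API of any particular registry.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    address: str
    version: int  # hypothetical per-instance counter, bumped on every update
    alive: bool   # False acts as a tombstone for a deregistered instance


def merge(a: Dict[str, Record], b: Dict[str, Record]) -> Dict[str, Record]:
    """Union of two replica views, keeping the higher version per instance."""
    merged = dict(a)
    for instance_id, rec in b.items():
        current = merged.get(instance_id)
        if current is None or rec.version > current.version:
            merged[instance_id] = rec
    return merged


def live_instances(view: Dict[str, Record]) -> List[str]:
    """Addresses a client would be handed from this view."""
    return sorted(r.address for r in view.values() if r.alive)


# During the partition, side A never learns that web-1 was deregistered,
# and never sees web-3, which registered on side B.
side_a = {
    "web-1": Record("10.0.0.1:80", 1, True),
    "web-2": Record("10.0.0.2:80", 1, True),
}
side_b = {
    "web-1": Record("10.0.0.1:80", 2, False),
    "web-3": Record("10.0.0.3:80", 1, True),
}

print(live_instances(side_a))  # ['10.0.0.1:80', '10.0.0.2:80'] (stale: web-1 is gone)
healed = merge(side_a, side_b)
print(live_instances(healed))  # ['10.0.0.2:80', '10.0.0.3:80']
```

Until the merge runs, a client querying side A can still be routed to web-1, which is the risk described above.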
CP Registry Behavior
CP registries (Consistency over Availability) refuse to serve queries when they cannot guarantee that the data is consistent. During a partition, nodes that cannot reach a quorum (a majority of nodes) stop accepting reads and writes, while any side that still holds a quorum keeps serving. Benefit: clients never receive conflicting views of the registry. Risk: discovery becomes unavailable to clients on the minority side of a partition, or to all clients if no side holds a quorum, potentially blocking all service communication.
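The quorum rule can be sketched as follows. This is a deliberate simplification: real CP systems typically rely on a consensus protocol rather than a bare peer count, and CPNode, RegistryUnavailable, and has_quorum are hypothetical names introduced here for illustration.

```python
from typing import Dict, List


class RegistryUnavailable(Exception):
    """Raised when a node cannot prove it belongs to the majority side."""


def has_quorum(reachable_peers: int, cluster_size: int) -> bool:
    """True if this node plus the peers it can reach form a strict majority."""
    return (reachable_peers + 1) > cluster_size // 2


class CPNode:
    def __init__(self, cluster_size: int) -> None:
        self.cluster_size = cluster_size
        self.reachable_peers = cluster_size - 1  # healthy cluster: all peers visible
        self.instances: Dict[str, List[str]] = {}

    def _require_quorum(self) -> None:
        if not has_quorum(self.reachable_peers, self.cluster_size):
            raise RegistryUnavailable("no quorum: refusing reads and writes")

    def register(self, service: str, address: str) -> None:
        self._require_quorum()
        self.instances.setdefault(service, []).append(address)

    def lookup(self, service: str) -> List[str]:
        self._require_quorum()
        return list(self.instances.get(service, []))


# A 5-node cluster splits 3/2: nodes on the 3-side still reach 2 peers,
# nodes on the 2-side reach only 1.
print(has_quorum(2, 5))  # True
print(has_quorum(1, 5))  # False
```

Under this rule an even split, such as 2/2 in a 4-node cluster, leaves neither side with a quorum, which is one reason odd cluster sizes are commonly used.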
Eventual Consistency Windows
AP registries converge to a consistent state after a partition heals. The convergence window depends on replication lag and reconciliation speed. During this window, different clients may see different views. Design clients to tolerate this: use retries, circuit breakers, and fallback to cached data. Serving from a cache that is 5 seconds stale is usually better than failing the request.
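A client that tolerates the convergence window might look like the sketch below. It assumes the registry is reached through a caller-supplied fetch function that raises ConnectionError on failure; CachingResolver and its parameters are hypothetical names, and backoff and circuit breaking are left out to keep the example short.

```python
import time
from typing import Callable, Dict, List, Tuple


class CachingResolver:
    """Retry the registry, then fall back to a cache of bounded staleness."""

    def __init__(
        self,
        fetch: Callable[[str], List[str]],  # hypothetical registry call
        max_stale_seconds: float = 5.0,
        retries: int = 2,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._max_stale = max_stale_seconds
        self._retries = retries
        self._clock = clock
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    def resolve(self, service: str) -> List[str]:
        # One initial attempt plus `retries` further attempts.
        for _ in range(self._retries + 1):
            try:
                instances = self._fetch(service)
            except ConnectionError:
                continue
            self._cache[service] = (self._clock(), instances)
            return instances

        # Registry unreachable: serve cached data if it is recent enough.
        cached = self._cache.get(service)
        if cached is not None and self._clock() - cached[0] <= self._max_stale:
            return cached[1]
        raise LookupError(f"no fresh or cached instances for {service!r}")
```

The max_stale_seconds bound is the design choice that matters: set too low, a short registry outage turns into failed requests; set too high, clients keep routing to instances that have since disappeared.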
Practical Recommendations
Many production deployments favor AP registries because losing discovery altogether can halt all service-to-service communication. Client-side caching provides a safety net: if the registry is unreachable, use the last known good data. Client-side health checks detect failed instances that a stale registry view or cache still lists. The combination of an AP registry, client caching, and health checks provides resilient discovery.
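The client-side health check can be sketched as a filter applied to whatever instance list the registry or cache returned. The TCP connect probe below is an assumption made for illustration (many services expose an HTTP health endpoint instead), and tcp_probe and healthy_instances are hypothetical helper names.

```python
import socket
from typing import Callable, Iterable, List


def tcp_probe(address: str, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to 'host:port' succeeds within timeout."""
    try:
        host, _, port = address.rpartition(":")
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # Connection refused, timed out, or the address was malformed.
        return False


def healthy_instances(
    candidates: Iterable[str],
    probe: Callable[[str], bool] = tcp_probe,
) -> List[str]:
    """Drop instances that fail the probe, preserving the original order."""
    return [address for address in candidates if probe(address)]
```

Passing the probe as a parameter keeps the filter testable without a network and lets a deployment substitute a check that matches its own definition of healthy.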