How Avro Schema Resolution Works
The Core Mechanism: Avro's power lies in dynamic schema resolution at read time. Unlike formats where schemas are compiled into code, Avro keeps writer and reader schemas separate and reconciles them during deserialization. This enables a critical capability: old code can read new data, and new code can read old data, as long as compatibility rules are followed.
1. Producer writes: Data is serialized using the writer schema known at production time. The schema ID is embedded in a 5-byte header (1 magic byte + 4-byte integer schema ID).
2. Consumer reads: The consumer extracts the schema ID, queries Schema Registry (caching the result locally), and retrieves the writer schema used to encode the message.
3. Resolution happens: Avro compares the writer schema to the reader schema field by field. Fields in the writer but not the reader are ignored. Fields in the reader but not the writer use default values. Types must match or have defined promotions.

Concrete Example: A producer using schema version 5 writes an event with fields user_id, email, and created_at. A consumer using schema version 7 expects user_id, email, and a new phone_number field with default null. Avro resolution maps the two shared fields directly and fills phone_number with null. Reading succeeds without redeploying the producer.
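The 5-byte framing from step 1 can be sketched with nothing but the standard library. This is a minimal sketch of the Confluent-style wire format, not a full client; the payload bytes below are a placeholder, not real Avro output:

```python
import struct

MAGIC_BYTE = 0  # Confluent wire format: one magic byte, then a 4-byte big-endian schema ID


def frame_message(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix the Avro-encoded payload with the 5-byte header."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload


def unframe_message(message: bytes) -> tuple[int, bytes]:
    """Split a framed message back into (schema_id, payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[5:]


framed = frame_message(5, b"\x02\x08user")  # placeholder payload, not real Avro encoding
schema_id, payload = unframe_message(framed)
```

The consumer never needs the schema itself on the wire: the 4-byte ID is enough to look up the writer schema in the registry.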
Performance Characteristics: Schema Registry lookups for new schema IDs typically complete in 5 to 15 milliseconds at p99. However, clients cache schemas aggressively. After the first lookup, subsequent deserializations use the cached schema, reducing overhead to microseconds per message. For steady state workloads at 50,000 messages per second, registry queries might only occur dozens of times per second during deployments or restarts.
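The caching behavior described above can be sketched as a thin wrapper around a lookup function. This is an illustrative sketch only; `fetch_fn` is a hypothetical stand-in for the real Schema Registry HTTP call:

```python
class CachingSchemaClient:
    """Sketch of client-side schema caching. `fetch_fn` stands in for a real
    Schema Registry HTTP call (hypothetical, for illustration)."""

    def __init__(self, fetch_fn):
        self._fetch = fetch_fn
        self._cache: dict[int, str] = {}
        self.registry_calls = 0

    def get_schema(self, schema_id: int) -> str:
        if schema_id not in self._cache:  # only the first access pays the 5-15 ms lookup
            self._cache[schema_id] = self._fetch(schema_id)
            self.registry_calls += 1
        return self._cache[schema_id]


# Simulate 50,000 messages that all reference schema ID 7:
client = CachingSchemaClient(lambda sid: f'{{"type": "record", "id": {sid}}}')
for _ in range(50_000):
    client.get_schema(7)
```

After the single cold lookup, every subsequent deserialization resolves the schema ID from the local dict, which is why steady-state registry traffic stays near zero.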
✓ In Practice: LinkedIn's Kafka clusters handle trillions of messages daily using this pattern. Hundreds of teams evolve schemas independently. The registry ensures no team can break downstream consumers by enforcing compatibility checks at registration time.
The critical insight: this decoupling lets producers and consumers evolve at their own pace. A batch job reading three-year-old data uses the same resolution mechanism as a real-time service reading fresh events.

💡 Key Takeaways
✓Schema resolution happens at read time by comparing writer schema (from message) to reader schema (from consumer code)
✓Registry lookups are cached client side, so p99 latency of 5 to 15 ms only affects first access to a new schema ID
✓Fields present in writer but missing in reader are silently ignored, enabling forward compatibility
✓Fields present in reader but missing in writer are filled with default values, enabling backward compatibility
✓This decoupling allows independent evolution: new code can read old data and old code can read new data within compatibility bounds
📌 Examples
1. A consumer deployed in 2021 with schema v10 can still read events written in 2018 using schema v3, as long as all intermediate schema changes were backward compatible.
2. During a schema rollout, 40% of producers use schema v8 while 60% still use v7. Consumers with cached schemas for both versions handle the mixed traffic seamlessly without registry queries.
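The mixed-version scenario in example 2 can be sketched with a toy resolver. This is illustrative only: real Avro resolution also handles type promotions, aliases, and unions, and the field names here mirror the ones used in this lesson:

```python
def resolve(writer_record: dict, reader_fields: dict) -> dict:
    """Toy sketch of Avro's record resolution rules: writer-only fields are
    dropped, reader-only fields take their defaults (no type promotion here)."""
    return {
        name: writer_record.get(name, default)
        for name, default in reader_fields.items()
    }


# Reader expects these fields; phone_number defaults to None (Avro null).
reader_fields = {"user_id": None, "email": None, "phone_number": None}

v7_event = {"user_id": 1, "email": "a@x.com", "created_at": 1700000000}  # writer v7
v8_event = {"user_id": 2, "email": "b@x.com", "phone_number": "+1555"}   # writer v8

old_result = resolve(v7_event, reader_fields)  # created_at ignored, phone_number defaulted
new_result = resolve(v8_event, reader_fields)  # all three reader fields filled from the writer
```

One reader handles both writer versions: the v7 event's extra `created_at` is dropped and `phone_number` falls back to its default, while the v8 event maps straight through.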