I want to use Avro to serialize the data for my Kafka messages and would like to use it with an Avro schema repository so I don't have to include the schema with every message.
The schema id is actually encoded in the Avro message itself. Take a look at this to see how the encoders/decoders are implemented.
In general, here's what happens when you send an Avro message to Kafka:

1. The encoder gets the schema for the object being encoded.
2. The encoder asks the schema repository for the id of that schema (if the schema isn't registered yet, the repository registers it and returns a new id).
3. The object is written out as `[magic byte][schema id][avro message]`, where the magic byte is just a `0x0` byte which is used to distinguish that kind of message, the schema id is a 4-byte integer value, and the rest is the actual encoded message (there's a code sketch of this layout after the decoding steps below).

When you decode the message back, here's what happens:

1. The decoder reads the first byte and makes sure it is `0x0`.
2. The decoder reads the next 4 bytes and converts them to an integer value - that's the schema id.
3. With the schema id in hand, the decoder asks the schema repository for the actual schema and uses it to decode the rest of the message.

If your key is Avro encoded then your key will have the format described above. The same applies to the value. This way your key and value may both be Avro values and use different schemas.
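To make the layout concrete, here's a rough Java sketch of that framing using the plain Avro APIs. `SchemaRepositoryClient` is just a hypothetical stand-in for whatever repository client you actually use, and the real KafkaAvroEncoder/Decoder may differ in details:

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class WireFormatSketch {

    static final byte MAGIC_BYTE = 0x0;

    // Hypothetical stand-in for whatever schema repository client you use.
    interface SchemaRepositoryClient {
        Schema getById(int id);
    }

    // Encode: [magic byte][4-byte schema id][Avro binary payload]
    static byte[] encode(int schemaId, Schema schema, GenericRecord record) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(MAGIC_BYTE);                                        // 1. magic byte marks this message format
        out.write(ByteBuffer.allocate(4).putInt(schemaId).array());   // 2. schema id as a 4-byte integer
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder); // 3. the Avro-encoded message itself
        encoder.flush();
        return out.toByteArray();
    }

    // Decode: read the framing back and ask the repository for the schema.
    static GenericRecord decode(byte[] payload, SchemaRepositoryClient repo) throws Exception {
        ByteBuffer buffer = ByteBuffer.wrap(payload);
        if (buffer.get() != MAGIC_BYTE) {                             // 1. check the magic byte
            throw new IllegalArgumentException("Unknown magic byte");
        }
        int schemaId = buffer.getInt();                               // 2. next 4 bytes are the schema id
        Schema schema = repo.getById(schemaId);                       // 3. fetch the schema for that id
        BinaryDecoder decoder = DecoderFactory.get()
                .binaryDecoder(payload, 5, payload.length - 5, null); // skip 1 magic + 4 id bytes
        return new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
    }
}
```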
Edit to answer the question in comment:
The actual schema is stored in the schema repository (that is the whole point of a schema repository, actually - to store schemas :)). The Avro Object Container Files format has nothing to do with the format described above. KafkaAvroEncoder/Decoder use a slightly different message format (but the actual messages are encoded in exactly the same way, of course).
The main difference between these formats is that Object Container Files carry the actual schema and may contain multiple messages corresponding to that schema, whereas the format described above carries only the schema id and exactly one message corresponding to that schema.
Passing object-container-file-encoded messages around would probably not be obvious to follow/maintain, because one Kafka message would then contain multiple Avro messages. Alternatively you could ensure that each Kafka message contains only one Avro message, but that would mean carrying the schema with every message.
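For contrast, here's a small sketch of the Object Container File format using the standard Avro Java API (the `User` schema is made up for the example): the schema goes into the file once, any number of records follow, and the reader gets the schema back from the file itself, no repository involved:

```java
import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class ContainerFileSketch {
    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"User\",\"fields\":"
                + "[{\"name\":\"name\",\"type\":\"string\"}]}");

        File file = new File("users.avro");

        // The container file stores the schema once in its header,
        // followed by as many records as you append.
        try (DataFileWriter<GenericRecord> writer =
                     new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, file);
            GenericRecord alice = new GenericData.Record(schema);
            alice.put("name", "alice");
            writer.append(alice);
            GenericRecord bob = new GenericData.Record(schema);
            bob.put("name", "bob");
            writer.append(bob);
        }

        // The reader recovers the schema from the file itself.
        try (DataFileReader<GenericRecord> reader =
                     new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            System.out.println("Schema stored in the file: " + reader.getSchema());
            for (GenericRecord record : reader) {
                System.out.println(record);
            }
        }
    }
}
```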
Avro schemas can be quite large (I've seen schemas of 600 KB and more), and carrying the schema with each message would be really costly and wasteful. That is where the schema repository kicks in: the schema is fetched only once and cached locally, so all subsequent lookups are just fast map lookups.
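A minimal sketch of that caching, reusing the hypothetical `SchemaRepositoryClient` from the earlier sketch:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.avro.Schema;

// Hypothetical caching wrapper: each schema id hits the remote repository at most
// once; every later lookup for that id is just an in-memory map lookup.
public class CachingSchemaRepository {

    public interface SchemaRepositoryClient {
        Schema getById(int id);
    }

    private final Map<Integer, Schema> cache = new ConcurrentHashMap<>();
    private final SchemaRepositoryClient client;

    public CachingSchemaRepository(SchemaRepositoryClient client) {
        this.client = client;
    }

    public Schema getById(int schemaId) {
        // computeIfAbsent only calls the remote repository on a cache miss
        return cache.computeIfAbsent(schemaId, client::getById);
    }
}
```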