Question
Kafka schema management with Avro gives us flexibility and backward compatibility, but how do we handle breaking changes in the schema?
Assume Producer A publishes message M to Consumer C.
Assume message M has a breaking change in its schema (e.g. the name field is now split into first_name and last_name), so we have a new schema M-New.
Now we deploy Producer A-New and Consumer C-New.
The problem is that until our deployment process finishes, Producer A-New can publish message M-New while Consumer C (the old one) is still running; Consumer C receives M-New and we can lose messages because of that.
So the only way to handle this seems to be synchronizing the deployment of new producers and consumers, which adds a lot of overhead.
Any suggestions on how to handle that?
Answer 1:
An easy way would be to have a long retention period for your topics. Then you just create a new topic for the breaking changes. All consumers can move to the new topic within the retention period without losing messages.
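A minimal sketch of this approach using the confluent-kafka AdminClient: create a separate topic for the breaking schema and give it a long retention period so every consumer has time to migrate. The topic name, partition/replication counts, and the 30-day retention window are illustrative assumptions, not values from the answer.

```python
# Sketch: create a new topic for the breaking schema with a long retention period.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

# 30 days in milliseconds; pick whatever migration window your teams need.
thirty_days_ms = 30 * 24 * 60 * 60 * 1000

new_topic = NewTopic(
    "users-v2",                      # hypothetical topic carrying the new schema
    num_partitions=6,
    replication_factor=3,
    config={"retention.ms": str(thirty_days_ms)},
)

# create_topics is asynchronous and returns a dict of topic -> future.
for topic, future in admin.create_topics([new_topic]).items():
    try:
        future.result()
        print(f"created {topic}")
    except Exception as exc:
        print(f"failed to create {topic}: {exc}")
```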
Answer 2:
"e.g. the name field is now split into first_name and last_name"
The Avro definition of a "backward compatible" schema does not allow you to add these new fields without 1) keeping the old name field and 2) giving the new fields defaults - https://docs.confluent.io/current/schema-registry/avro.html
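A sketch (not from the answer) of what "keep the old field and give the new fields defaults" looks like, using the fastavro library to read a record written with the old schema through the new, evolved schema; the record name and field values are assumptions for illustration.

```python
# Sketch: backward-compatible schema evolution with defaults, verified via fastavro.
import io
from fastavro import parse_schema, schemaless_writer, schemaless_reader

old_schema = parse_schema({
    "type": "record", "name": "User",
    "fields": [{"name": "name", "type": "string"}],
})

# Evolved schema: the old `name` field is kept and the new fields get defaults,
# as the Confluent backward-compatibility rules require.
new_schema = parse_schema({
    "type": "record", "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "first_name", "type": "string", "default": ""},
        {"name": "last_name", "type": "string", "default": ""},
    ],
})

# An old producer writes with the old schema ...
buf = io.BytesIO()
schemaless_writer(buf, old_schema, {"name": "Ada Lovelace"})

# ... and an upgraded consumer reads with the new schema, seeing the defaults.
buf.seek(0)
print(schemaless_reader(buf, old_schema, new_schema))
# -> {'name': 'Ada Lovelace', 'first_name': '', 'last_name': ''}
```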
If your consumers upgrade their schema first, they still see the old name field, which old producers continue to send, and they interpret the defaults for the new fields until the producers upgrade and start sending them.
If the producers upgrade first, then old consumers will never see the new fields, so the producers should either keep sending the name field, or opt to send a garbage value that intentionally starts breaking consumers (e.g. make the field nullable to begin with but never actually send a null, then start sending nulls while consumers assume it cannot be null).
In either case, I feel like your record-processing logic has to detect which fields are available and whether they are null or just their default values.
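A hypothetical sketch of that detection logic: prefer the new split fields when they carry real values, otherwise fall back to the legacy name field. The function name and splitting rule are assumptions for illustration.

```python
# Sketch: consumer-side logic that copes with both old- and new-schema records.
def full_name(record: dict) -> tuple[str, str]:
    first = record.get("first_name") or ""
    last = record.get("last_name") or ""
    if first or last:
        # New-style producer: the split fields are actually populated.
        return first, last
    # Old-style producer (or defaults only): fall back to the legacy field.
    parts = (record.get("name") or "").split(" ", 1)
    return parts[0], parts[1] if len(parts) > 1 else ""

print(full_name({"name": "Ada Lovelace", "first_name": "", "last_name": ""}))
print(full_name({"name": "Ada Lovelace", "first_name": "Ada", "last_name": "Lovelace"}))
```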
But compare that to JSON or any plain string format (like CSV): you have no guarantees about which fields should be there, whether they're nullable, or what types they are (is a date a string or a long?), so you can't guarantee what objects your clients will internally map messages into for processing. I find that to be a larger advantage of Avro than the compatibility rules.
Personally, I find that enforcing FULL_TRANSITIVE compatibility on the registry works best when you have little to no communication between your Kafka users.
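A minimal sketch of enforcing FULL_TRANSITIVE compatibility for one subject through the Schema Registry REST API (PUT /config/<subject>); the registry URL and subject name are assumptions for illustration.

```python
# Sketch: set the compatibility level for a subject via the Schema Registry REST API.
import requests

REGISTRY_URL = "http://localhost:8081"
subject = "users-value"

resp = requests.put(
    f"{REGISTRY_URL}/config/{subject}",
    json={"compatibility": "FULL_TRANSITIVE"},
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
)
resp.raise_for_status()
print(resp.json())   # e.g. {"compatibility": "FULL_TRANSITIVE"}
```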
Source: https://stackoverflow.com/questions/56432184/how-to-use-kafka-schema-management-and-avro-for-breaking-changes