Question
Kafka schema management with Avro gives us flexibility and backward compatibility, but how do we handle breaking changes in the schema?
Assume Producer A publishes message M to Consumer C.
Assume message M has a breaking change in its schema (e.g. the name field is now split into first_name and last_name), so we have a new schema M-New.
Now we deploy Producer A-New and Consumer C-New.
The problem is that until our deployment process finishes, Producer A-New can publish message M-New while Consumer C (the old one) is still running; Consumer C receives M-New and we can lose messages because of that.
So the only way to handle this seems to be synchronizing the deployment of new producers and consumers, which adds a lot of overhead.
Any suggestions on how to handle that?
Answer 1:
An easy way would be to have a long retention period for your topics. Then you just create a new topic for the breaking changes. All consumers can move to the new topic within the retention period without losing messages.
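A minimal sketch of this approach using the confluent-kafka AdminClient: create a separate topic for the breaking schema and give it a long retention period so every consumer has time to migrate. The topic name, partition/replication counts, and the 30-day retention window are illustrative assumptions, not values from the answer.

```python
# Sketch: create a new topic for the breaking schema with a long retention period.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

# 30 days in milliseconds; pick whatever migration window your teams need.
thirty_days_ms = 30 * 24 * 60 * 60 * 1000

new_topic = NewTopic(
    "users-v2",                      # hypothetical topic carrying the new schema
    num_partitions=6,
    replication_factor=3,
    config={"retention.ms": str(thirty_days_ms)},
)

# create_topics is asynchronous and returns a dict of topic -> future.
for topic, future in admin.create_topics([new_topic]).items():
    try:
        future.result()
        print(f"created {topic}")
    except Exception as exc:
        print(f"failed to create {topic}: {exc}")
```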
Answer 2:
"e.g. the name field is now split into first_name and last_name"
The Avro definition of a "backward compatible" schema does not allow you to add these new fields without 1) keeping the old name field and 2) giving the new fields defaults - https://docs.confluent.io/current/schema-registry/avro.html
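A sketch (not from the answer) of what "keep the old field and give the new fields defaults" looks like, using the fastavro library to read a record written with the old schema through the new, evolved schema; the record name and field values are assumptions for illustration.

```python
# Sketch: backward-compatible schema evolution with defaults, verified via fastavro.
import io
from fastavro import parse_schema, schemaless_writer, schemaless_reader

old_schema = parse_schema({
    "type": "record", "name": "User",
    "fields": [{"name": "name", "type": "string"}],
})

# Evolved schema: the old `name` field is kept and the new fields get defaults,
# as the Confluent backward-compatibility rules require.
new_schema = parse_schema({
    "type": "record", "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "first_name", "type": "string", "default": ""},
        {"name": "last_name", "type": "string", "default": ""},
    ],
})

# An old producer writes with the old schema ...
buf = io.BytesIO()
schemaless_writer(buf, old_schema, {"name": "Ada Lovelace"})

# ... and an upgraded consumer reads with the new schema, seeing the defaults.
buf.seek(0)
print(schemaless_reader(buf, old_schema, new_schema))
# -> {'name': 'Ada Lovelace', 'first_name': '', 'last_name': ''}
```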
If your consumers upgrade their schema first, they still see the old name field, which old producers continue to send, and they interpret the defaults for the new fields until the producers upgrade and start sending them.
If the producers upgrade first, then old consumers will never see the new fields, so the producers should either keep sending the name field, or opt to send a garbage value that intentionally starts breaking consumers (e.g. make the field nullable to begin with but never actually send a null, then start sending nulls while consumers assume it cannot be null).
In either case, I feel like your record-processing logic has to detect which fields are available and whether they are null or just their default values.
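A hypothetical sketch of that detection logic: prefer the new split fields when they carry real values, otherwise fall back to the legacy name field. The function name and splitting rule are assumptions for illustration.

```python
# Sketch: consumer-side logic that copes with both old- and new-schema records.
def full_name(record: dict) -> tuple[str, str]:
    first = record.get("first_name") or ""
    last = record.get("last_name") or ""
    if first or last:
        # New-style producer: the split fields are actually populated.
        return first, last
    # Old-style producer (or defaults only): fall back to the legacy field.
    parts = (record.get("name") or "").split(" ", 1)
    return parts[0], parts[1] if len(parts) > 1 else ""

print(full_name({"name": "Ada Lovelace", "first_name": "", "last_name": ""}))
print(full_name({"name": "Ada Lovelace", "first_name": "Ada", "last_name": "Lovelace"}))
```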
But compare that to JSON or any plain string format (like CSV): you have no guarantees about which fields should be there, whether they're nullable, or what types they are (is a date a string or a long?), so you can't guarantee what objects your clients will internally map messages into for processing. I find that to be a larger advantage of Avro than the compatibility rules.
Personally, I find that enforcing FULL_TRANSITIVE compatibility on the registry works best when you have little to no communication between your Kafka users.
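A minimal sketch of enforcing FULL_TRANSITIVE compatibility for one subject through the Schema Registry REST API (PUT /config/<subject>); the registry URL and subject name are assumptions for illustration.

```python
# Sketch: set the compatibility level for a subject via the Schema Registry REST API.
import requests

REGISTRY_URL = "http://localhost:8081"
subject = "users-value"

resp = requests.put(
    f"{REGISTRY_URL}/config/{subject}",
    json={"compatibility": "FULL_TRANSITIVE"},
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
)
resp.raise_for_status()
print(resp.json())   # e.g. {"compatibility": "FULL_TRANSITIVE"}
```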
Source: https://stackoverflow.com/questions/56432184/how-to-use-kafka-schema-management-and-avro-for-breaking-changes