I have a spring application that is my kafka producer and I was wondering why avro is the best way to go. I read about it and all it has to offer, but why can\'t I just seriali
It is a matter of speed and storage. When serializing data, you often need to transmit the actual schema and therefore, this cause an increase of payload size.
Total Payload Size
+-----------------+--------------------------------------------------+
| Schema | Serialised Data |
+-----------------+--------------------------------------------------+
Schema Registry provides a centralized repository for schemas and metadata so that all schemas are registered in a central system. This centralized system enables producers to only include the ID of the schema instead of the full schema itself (in text format).
Total Payload Size
+----+--------------------------------------------------+
| ID | Serialised Data |
+----+--------------------------------------------------+
Therefore, the serialisation becomes faster.
Furthermore, schema registry versioning enables the enforcement of data policies that might help to prevent newer schemas to break compatibility with existing versions that could potentially cause downtime or any other significant issues in your pipeline.
Some more benefits of Schema Registry are thoroughly explained in this article by Confluent.