Why isn't the AvroCoder deterministic?

Submitted by 折月煮酒 on 2019-12-24 10:48:29

Question


AvroCoder.isDeterministic returns false.

Why isn't the AvroCoder deterministic? Wouldn't Avro records always be encoded into the same byte stream?

Since AvroCoder isn't deterministic, an Avro record can't be used as a key for a group-by operation. What's the best way to turn an Avro record into a key? Should we just use the JSON representation of the Avro record?


Answer 1:


Based on the Avro specification, it looks like only arrays and maps have a non-deterministic binary encoding.

Maps appear to be non-deterministically encoded for two reasons:

  • The order of the entries isn't specified
  • Each block can be encoded in two different ways: its header gives either the number of entries alone, or the negated number of entries followed by the size of the block in bytes
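To make the first point concrete, here is a minimal sketch of the Avro binary encoding of a map<string, long>, hand-rolled from the specification (zigzag varint longs, length-prefixed UTF-8 strings, one block of entries followed by a 0 count) rather than taken from the Avro library. The class and method names are hypothetical. Writing the same two entries in two iteration orders gives two different byte strings, and a conforming reader would decode both to the same map.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hand-rolled Avro binary encoding of a map<string, long>, for illustration only.
class AvroMapOrderDemo {

    // Avro "long": zigzag-encode, then write as a variable-length integer.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro "string": byte length as a long, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    // One block holding every entry in the map's iteration order, then the 0 terminator.
    static byte[] encodeMap(Map<String, Long> map) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!map.isEmpty()) {
            writeLong(out, map.size());
            for (Map.Entry<String, Long> e : map.entrySet()) {
                writeString(out, e.getKey());
                writeLong(out, e.getValue());
            }
        }
        writeLong(out, 0); // end-of-map marker
        return out.toByteArray();
    }

    // Builds a two-entry map that iterates in exactly the insertion order given.
    static Map<String, Long> ordered(String k1, long v1, String k2, long v2) {
        Map<String, Long> m = new LinkedHashMap<>();
        m.put(k1, v1);
        m.put(k2, v2);
        return m;
    }

    public static void main(String[] args) {
        byte[] ab = encodeMap(ordered("a", 1, "b", 2));
        byte[] ba = encodeMap(ordered("b", 2, "a", 1));
        System.out.println(Arrays.toString(ab));
        System.out.println(Arrays.toString(ba));
        System.out.println("same bytes? " + Arrays.equals(ab, ba));
    }
}
```

For {"a": 1, "b": 2} the two orders produce [4, 2, 97, 2, 2, 98, 4, 0] and [4, 2, 98, 4, 2, 97, 2, 0]: same entries, different bytes.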

Arrays appear to be non-deterministically encoded because:

  • Each block can be encoded in two different ways: its header gives either the number of items alone, or the negated number of items followed by the size of the block in bytes
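A similar sketch for the block-header point, again hand-rolled from the Avro specification with hypothetical names: the same array<long> is written once with a plain item count and once with a negated count followed by the block's byte size. The small decoder accepts both forms and recovers the same items even though the bytes differ.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Two valid Avro binary encodings of the same array<long>, hand-rolled for illustration.
class AvroArrayBlockDemo {

    // Avro "long": zigzag-encode, then write as a variable-length integer.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Reads one zigzag varint starting at pos[0] and advances pos[0] past it.
    static long readLong(byte[] bytes, int[] pos) {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = bytes[pos[0]++] & 0xFF;
            z |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return (z >>> 1) ^ -(z & 1);
    }

    // Form 1: the block header is the item count.
    static byte[] encodeWithCount(long[] items) {
        return encode(items, false);
    }

    // Form 2: the header is the negated item count followed by the block size in bytes.
    static byte[] encodeWithCountAndSize(long[] items) {
        return encode(items, true);
    }

    private static byte[] encode(long[] items, boolean withSize) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        for (long item : items) {
            writeLong(body, item);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (items.length > 0) {
            if (withSize) {
                writeLong(out, -items.length);
                writeLong(out, body.size());
            } else {
                writeLong(out, items.length);
            }
            out.write(body.toByteArray(), 0, body.size());
        }
        writeLong(out, 0); // end-of-array marker
        return out.toByteArray();
    }

    // Decoder that accepts either block form.
    static long[] decode(byte[] bytes) {
        int[] pos = {0};
        List<Long> items = new ArrayList<>();
        long count;
        while ((count = readLong(bytes, pos)) != 0) {
            if (count < 0) {
                count = -count;
                readLong(bytes, pos); // block size; only needed when skipping a block
            }
            for (long i = 0; i < count; i++) {
                items.add(readLong(bytes, pos));
            }
        }
        return items.stream().mapToLong(Long::longValue).toArray();
    }

    public static void main(String[] args) {
        long[] xs = {1, 2, 3};
        byte[] a = encodeWithCount(xs);
        byte[] b = encodeWithCountAndSize(xs);
        System.out.println(Arrays.toString(a) + " vs " + Arrays.toString(b));
        System.out.println(Arrays.toString(decode(a)) + " == " + Arrays.toString(decode(b)));
    }
}
```

For [1, 2, 3] the two forms are [6, 2, 4, 6, 0] and [5, 6, 2, 4, 6, 0], and both decode back to [1, 2, 3].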

So for any schema without an array or a map, I think the binary encoding should be deterministic, and we could create a deterministic coder simply by subclassing AvroCoder and overriding AvroCoder.isDeterministic to return true.

AvroDeterministicCoder is my first attempt at creating such a coder.
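The condition above, a schema with no array or map anywhere in it, amounts to a recursive walk over the schema. The sketch below uses a hypothetical, dependency-free MiniSchema class instead of org.apache.avro.Schema so it runs on its own; it checks only that condition and does not subclass AvroCoder. A version over real Avro schemas would walk record fields, union branches, and array/map element types, and would need a visited set because Avro records can be recursive.

```java
import java.util.List;

// Hypothetical, dependency-free stand-in for an Avro schema; not org.apache.avro.Schema.
class MiniSchema {
    enum Type { NULL, BOOLEAN, INT, LONG, FLOAT, DOUBLE, BYTES, STRING,
                RECORD, ENUM, ARRAY, MAP, UNION, FIXED }

    final Type type;
    // Record field schemas, union branches, or the single array item / map value schema.
    final List<MiniSchema> children;

    MiniSchema(Type type, List<MiniSchema> children) {
        this.type = type;
        this.children = children;
    }

    static MiniSchema of(Type type, MiniSchema... children) {
        return new MiniSchema(type, List.of(children));
    }
}

class SchemaDeterminism {
    // True when no ARRAY or MAP appears anywhere in the schema tree.
    // This model cannot express recursive records, so no visited set is needed here.
    static boolean hasNoArrayOrMap(MiniSchema schema) {
        if (schema.type == MiniSchema.Type.ARRAY || schema.type == MiniSchema.Type.MAP) {
            return false;
        }
        for (MiniSchema child : schema.children) {
            if (!hasNoArrayOrMap(child)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        MiniSchema flat = MiniSchema.of(MiniSchema.Type.RECORD,
                MiniSchema.of(MiniSchema.Type.STRING),
                MiniSchema.of(MiniSchema.Type.LONG));
        MiniSchema nested = MiniSchema.of(MiniSchema.Type.RECORD,
                MiniSchema.of(MiniSchema.Type.UNION,
                        MiniSchema.of(MiniSchema.Type.NULL),
                        MiniSchema.of(MiniSchema.Type.MAP, MiniSchema.of(MiniSchema.Type.LONG))));
        System.out.println(hasNoArrayOrMap(flat) + " " + hasNoArrayOrMap(nested));
    }
}
```

Note that a map hidden inside a union inside a record still makes the check fail, which is why the walk has to be recursive rather than looking only at top-level fields.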




Answer 2:


AvroCoder can inspect the schema and the type being coded and decide whether it is deterministic. This was added in GitHub commit #a806df.

It includes support for deterministically encoding arrays and maps when the underlying collection is deterministically ordered.
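As an illustration of why a deterministic ordering is enough, the sketch below (hand-rolled from the Avro binary format, hypothetical names, not Beam's or Avro's code) encodes a map<string, long> only from a SortedMap, and canonicalizes any other map by copying it into a TreeMap first. Two maps with the same entries then produce the same bytes regardless of insertion order, which is the property a key coder needs.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Avro-style encoding of map<string, long> that only accepts sorted maps,
// so equal contents always give equal bytes. Illustration only.
class SortedMapEncoding {

    // Avro "long": zigzag-encode, then write as a variable-length integer.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro "string": byte length as a long, then the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    // Entries are written in the SortedMap's key order, then the 0 terminator.
    static byte[] encode(SortedMap<String, Long> map) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!map.isEmpty()) {
            writeLong(out, map.size());
            for (Map.Entry<String, Long> e : map.entrySet()) {
                writeString(out, e.getKey());
                writeLong(out, e.getValue());
            }
        }
        writeLong(out, 0); // end-of-map marker
        return out.toByteArray();
    }

    // Canonicalizes any map by copying it into a TreeMap (natural String order).
    static byte[] encodeCanonical(Map<String, Long> map) {
        return encode(new TreeMap<>(map));
    }

    public static void main(String[] args) {
        Map<String, Long> ab = new LinkedHashMap<>();
        ab.put("a", 1L);
        ab.put("b", 2L);
        Map<String, Long> ba = new LinkedHashMap<>();
        ba.put("b", 2L);
        ba.put("a", 1L);
        System.out.println(Arrays.equals(encodeCanonical(ab), encodeCanonical(ba)));
    }
}
```

Natural String ordering is used here because it is the same on every worker; any comparator would do as long as it is fixed, since the point is only that the iteration order no longer depends on how the map was built.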



Source: https://stackoverflow.com/questions/28129664/why-isnt-the-avrocoder-deterministic
