Using kafka-streams with custom partitioner

柔情痞子 submitted on 2020-01-30 11:31:27

Question


I want to join a KStream with a KTable. The two have different keys but are co-partitioned using a custom partitioner. However, the join does not produce any results.

The KStream has the following structure
- key: House - Group
- value: User
The KTable has the following structure
- key: User - Group
- value: Address

To make sure that all inserts into both topics are processed in insertion order, I'm using a custom Partitioner that partitions both topics by the Group part of each key.

I want to end up with a stream of the following structure:
- key: House - Group
- value: User - Address

For this I'm doing the following:

val streamsBuilder = streamBuilderHolder.streamsBuilder
val houseToUser = streamsBuilder.stream<HouseGroup, User>("houseToUser")
val userToAddress = streamsBuilder.table<UserGroup, Address>("userToAddress")
val result: KStream<HouseGroup, UserWithAddress> = houseToUser
        .map { k: HouseGroup, v: User ->
            val newKey = UserGroup(v, k.group)
            val newVal = UserHouse(v, k.house)
            KeyValue(newKey, newVal)
        }
        .join(userToAddress) { v1: UserHouse, v2: Address ->
            UserHouseWithAddress(v1, v2)
        }
        .map{k: UserGroup, v: UserHouseWithAddress ->
            val newKey = HouseGroup(v.house, k.group)
            val newVal = UserWithAddress(k.user, v.address)
            KeyValue(newKey, newVal)
        }

I expected the join to produce matches, but it did not.

I guess the obvious solution is to join with a global table and let go of the custom partitioner. However, I still don't understand why the above would not work.


Answer 1:


I think the join finds no matches because different partitioners are used.

Your input topics are written with your CustomPartitioner. Kafka Streams, by default, uses org.apache.kafka.clients.producer.internals.DefaultPartitioner.

In your code, you call KStream::map just before KStream::join. Because map can change the key, it forces a repartition before the join. During repartitioning, the records are written to an internal Kafka topic ($AppName-KSTREAM-MAP-000000000X-repartition). To distribute those records across partitions, Kafka Streams uses the configured producer partitioner (property: ProducerConfig.PARTITIONER_CLASS_CONFIG). In summary: records with the same key may end up in different partitions in the repartition topic and in the KTable topic.
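The mismatch can be illustrated with a small dependency-free sketch. The UserGroup class and both partition functions below are hypothetical stand-ins, not the asker's code or Kafka's actual implementation: Kafka's default partitioner hashes the serialized key bytes with murmur2, while this sketch uses hashCode() only to stay self-contained.

```kotlin
// Hypothetical key type mirroring the question's "User - Group" key.
data class UserGroup(val user: String, val group: String)

// Stand-in for the custom partitioner: only the group decides the partition.
fun partitionByGroup(key: UserGroup, numPartitions: Int): Int =
    Math.floorMod(key.group.hashCode(), numPartitions)

// Stand-in for a default-style partitioner: the whole key decides the partition.
// (Assumption: hashCode() replaces Kafka's murmur2 over the serialized key bytes.)
fun partitionByWholeKey(key: UserGroup, numPartitions: Int): Int =
    Math.floorMod(key.hashCode(), numPartitions)

fun main() {
    val numPartitions = 4
    val keys = listOf("alice", "bob", "carol", "dave").map { UserGroup(it, "group-1") }
    for (key in keys) {
        // byGroup is identical for every key of the same group;
        // byWholeKey depends on the user as well, so it is free to differ.
        println(
            "$key byGroup=${partitionByGroup(key, numPartitions)} " +
                "byWholeKey=${partitionByWholeKey(key, numPartitions)}"
        )
    }
}
```

If the KTable topic was written with the group-based rule and the repartition topic with the whole-key rule, a stream record and its matching table record can sit in different partitions, and the join task that processes one never sees the other.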

The solution in your case is to set your custom partitioner in the properties of your Kafka Streams application: props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, "com.example.CustomPartitioner").
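As a rough sketch of that configuration, the properties could be assembled as follows. The string keys are the literal values of the Kafka constants ("partitioner.class" is the value of ProducerConfig.PARTITIONER_CLASS_CONFIG), used here so the snippet needs no Kafka imports. The application id and broker address are placeholders assumed for illustration.

```kotlin
import java.util.Properties

// Builds the configuration that would be passed to the KafkaStreams constructor.
fun streamsProperties(): Properties {
    val props = Properties()
    // Assumed placeholder values; replace with your own.
    props.setProperty("application.id", "house-user-address-join")
    props.setProperty("bootstrap.servers", "localhost:9092")
    // Same key as ProducerConfig.PARTITIONER_CLASS_CONFIG: the internal producer
    // then uses the custom partitioner when writing to the repartition topic.
    props.setProperty("partitioner.class", "com.example.CustomPartitioner")
    return props
}

fun main() {
    streamsProperties().forEach { (k, v) -> println("$k=$v") }
}
```

Note that a partitioner configured this way applies to every topic the application's producer writes to, so it has to cope with the key types of all of those topics, not only the repartition topic.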

For debugging, you can inspect the repartition topic ($AppName-KSTREAM-MAP-000000000X-repartition). Records there may sit in a different partition (a different partition number) than records with the same key in the input topic.

See also the Kafka Streams documentation on join co-partitioning requirements.



Source: https://stackoverflow.com/questions/57930242/using-kafka-streams-with-custom-partitioner
