Question
Let's say I have a Kafka topic named SensorData to which two sensors S1 and S2 are sending data (timestamp and value) to two different partitions, e.g. S1 -> P1 and S2 -> P2. Now I need to aggregate the values for these two sensors separately, let's say calculating the average sensor value over a time window of 1 hour and writing it into a new topic SensorData1Hour. With this scenario:
- How can I select a specific topic partition using the KStreamBuilder#stream method?
- Is it possible to apply an aggregation function over two (multiple) different partitions of the same topic?
Answer 1:
You cannot (directly) access single partitions and you cannot (directly) apply an aggregation function over multiple partitions.
Aggregations are always done per key (see http://docs.confluent.io/current/streams/developer-guide.html#stateful-transformations). Thus, you could use a different key for each partition and then aggregate by key; for the 1-hour average, see the windowing documentation: http://docs.confluent.io/current/streams/developer-guide.html#windowing-a-stream
The simplest way is to let each of your producers apply a key to each message right away, as sketched below.
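As an illustration, here is a minimal sketch of such a per-key 1-hour average using the newer StreamsBuilder API (the successor of KStreamBuilder) against a recent Kafka Streams release; the serdes, the broker address, and the "count,sum" string encoding of the running aggregate are just assumptions made for the sketch:

```java
import java.time.Duration;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

public class SensorAverageExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sensor-1h-average");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // Assumes each producer keys its records by sensor id ("S1", "S2")
        // and the value is the raw reading as a double.
        KStream<String, Double> readings =
                builder.stream("SensorData", Consumed.with(Serdes.String(), Serdes.Double()));

        readings
                .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
                .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofHours(1)))
                // Keep a running "count,sum" per sensor per 1-hour window; a real
                // application would use a small POJO with its own serde instead of
                // an encoded String.
                .aggregate(
                        () -> "0,0.0",
                        (sensorId, value, agg) -> {
                            String[] parts = agg.split(",");
                            long count = Long.parseLong(parts[0]) + 1;
                            double sum = Double.parseDouble(parts[1]) + value;
                            return count + "," + sum;
                        },
                        Materialized.with(Serdes.String(), Serdes.String()))
                .toStream()
                // Turn "count,sum" into the average and unwrap the windowed key.
                .map((windowedSensorId, agg) -> {
                    String[] parts = agg.split(",");
                    double avg = Double.parseDouble(parts[1]) / Long.parseLong(parts[0]);
                    return KeyValue.pair(windowedSensorId.key(), avg);
                })
                .to("SensorData1Hour", Produced.with(Serdes.String(), Serdes.Double()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}
```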
If you want to aggregate over multiple partitions, you first need to set a new key (e.g., using selectKey()) and assign the same key to all data you want to aggregate together. If you want to aggregate all partitions, you would use a single key value; however, keep in mind that this might quickly become a bottleneck! A sketch of this re-keying step follows.
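For aggregating across all partitions, a hedged sketch of that re-keying step, continuing from the readings stream above; the constant key "ALL_SENSORS" and the count() aggregation are placeholders chosen for the example:

```java
// Give every record the same (arbitrary) key so the downstream groupByKey()
// combines data from all partitions of SensorData. The re-keyed stream is
// repartitioned before the aggregation, and a single key value routes all
// records through one stream task, which can quickly become a bottleneck.
KStream<String, Double> allSensors =
        readings.selectKey((sensorId, value) -> "ALL_SENSORS");

allSensors
        .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
        .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofHours(1)))
        .count();   // or the same count/sum aggregate as above to compute an average
```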
Source: https://stackoverflow.com/questions/38990218/aggregation-over-a-specific-partition-in-apache-kafka-streams