Question
Let's say I have a Kafka topic named SensorData to which two sensors S1 and S2 are sending data (timestamp and value) to two different partitions, e.g. S1 -> P1 and S2 -> P2. Now I need to aggregate the values for these two sensors separately, let's say calculating the average sensor value over a time window of 1 hour and writing it into a new topic SensorData1Hour. With this scenario:
- How can I select a specific topic partition using the KStreamBuilder#stream method?
- Is it possible to apply an aggregation function over two (multiple) different partitions of the same topic?
Answer 1:
You cannot (directly) access single partitions and you cannot (directly) apply an aggregation function over multiple partitions.
Aggregations are always done per key (see http://docs.confluent.io/current/streams/developer-guide.html#stateful-transformations). Thus, you could use a different key for each partition and then aggregate by key; for the 1-hour average, see the windowing documentation: http://docs.confluent.io/current/streams/developer-guide.html#windowing-a-stream
The simplest way is to let each of your producers apply a key to each message right away, as sketched below.
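As an illustration, here is a minimal sketch of such a per-key 1-hour average using the newer StreamsBuilder API (the successor of KStreamBuilder) against a recent Kafka Streams release; the serdes, the broker address, and the "count,sum" string encoding of the running aggregate are just assumptions made for the sketch:

```java
import java.time.Duration;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

public class SensorAverageExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sensor-1h-average");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // Assumes each producer keys its records by sensor id ("S1", "S2")
        // and the value is the raw reading as a double.
        KStream<String, Double> readings =
                builder.stream("SensorData", Consumed.with(Serdes.String(), Serdes.Double()));

        readings
                .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
                .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofHours(1)))
                // Keep a running "count,sum" per sensor per 1-hour window; a real
                // application would use a small POJO with its own serde instead of
                // an encoded String.
                .aggregate(
                        () -> "0,0.0",
                        (sensorId, value, agg) -> {
                            String[] parts = agg.split(",");
                            long count = Long.parseLong(parts[0]) + 1;
                            double sum = Double.parseDouble(parts[1]) + value;
                            return count + "," + sum;
                        },
                        Materialized.with(Serdes.String(), Serdes.String()))
                .toStream()
                // Turn "count,sum" into the average and unwrap the windowed key.
                .map((windowedSensorId, agg) -> {
                    String[] parts = agg.split(",");
                    double avg = Double.parseDouble(parts[1]) / Long.parseLong(parts[0]);
                    return KeyValue.pair(windowedSensorId.key(), avg);
                })
                .to("SensorData1Hour", Produced.with(Serdes.String(), Serdes.Double()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
    }
}
```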
If you want to aggregate over multiple partitions, you first need to set a new key (e.g., using selectKey()) and assign the same key to all data you want to aggregate together. If you want to aggregate all partitions, you would use a single key value; however, keep in mind that this might quickly become a bottleneck! A sketch of this re-keying step follows.
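For aggregating across all partitions, a hedged sketch of that re-keying step, continuing from the readings stream above; the constant key "ALL_SENSORS" and the count() aggregation are placeholders chosen for the example:

```java
// Give every record the same (arbitrary) key so the downstream groupByKey()
// combines data from all partitions of SensorData. The re-keyed stream is
// repartitioned before the aggregation, and a single key value routes all
// records through one stream task, which can quickly become a bottleneck.
KStream<String, Double> allSensors =
        readings.selectKey((sensorId, value) -> "ALL_SENSORS");

allSensors
        .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
        .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofHours(1)))
        .count();   // or the same count/sum aggregate as above to compute an average
```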
Source: https://stackoverflow.com/questions/38990218/aggregation-over-a-specific-partition-in-apache-kafka-streams