Kafka KStream - using AbstractProcessor with a Window

猫巷女王i 2021-01-16 08:27

I'm hoping to group together windowed batches of output from a KStream and write them to a secondary store.

I was expecting to see .punctuate() get call

3 answers
  •  栀梦 (original poster)
    2021-01-16 09:27

    Update: this part of the answer is for Kafka version 0.11 or earlier (for Kafka 1.0 and later see below)

    In Kafka Streams, punctuations are based on stream-time and not system time (aka processing-time).

    By default, stream-time is event-time, i.e., the timestamp embedded in the Kafka records themselves. As you do not set a non-default TimestampExtractor (see timestamp.extractor in http://docs.confluent.io/current/streams/developer-guide.html#optional-configuration-parameters), the calls to punctuate depend only on the progress of event time in the records you process. Thus, if you need multiple minutes to process "30 seconds" (event time) of records, punctuate will be called less frequently than every 30 seconds (wall-clock time)...

    This can also explain your irregular calling patterns (i.e., bursts and long delays). If the event time of your data "jumps" and the data to be processed is already completely available in your topic, Kafka Streams also "jumps" with regard to its internally maintained stream-time.

    I would assume that you can resolve your issue by using WallclockTimestampExtractor (see http://docs.confluent.io/current/streams/developer-guide.html#timestamp-extractor).
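
For illustration, here is a minimal sketch of how the extractor could be set through the timestamp.extractor setting mentioned above. It uses plain string keys so it compiles without the Kafka jars. The application id and broker address are hypothetical placeholders, and the class name WallclockExtractorConfig is made up for this example. Later Kafka releases rename the key to default.timestamp.extractor, so check the docs for your version.

```java
import java.util.Properties;

public class WallclockExtractorConfig {

    /** Builds the properties that would be passed to Kafka Streams. */
    public static Properties streamsProperties() {
        Properties props = new Properties();
        // Hypothetical placeholders; replace with your own values.
        props.put("application.id", "windowed-batch-writer");
        props.put("bootstrap.servers", "localhost:9092");
        // Use wall-clock time instead of the timestamp embedded in each record.
        props.put("timestamp.extractor",
                "org.apache.kafka.streams.processor.WallclockTimestampExtractor");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(streamsProperties().getProperty("timestamp.extractor"));
    }
}
```

With this setting every record gets the time at which it is read as its timestamp, so stream-time follows the wall clock for as long as records keep arriving.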

    One more thing to mention: stream-time is only advanced if data is processed -- if your application reaches the end of the input topics and waits for data, punctuate will not be called. This applies even if you use WallclockTimestampExtractor.
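
To make the stream-time behavior concrete, here is a toy model in plain Java. It is not Kafka Streams code and makes simplifying assumptions (a single partition, stream-time taken as the largest record timestamp seen so far, and one punctuation fired for every interval boundary crossed). The class and method names are hypothetical. It only shows why punctuations come in bursts when event time jumps, and why none fire when no records are processed.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of stream-time punctuation; NOT the Kafka Streams implementation. */
public class StreamTimePunctuationModel {

    /**
     * Returns the stream-time values at which a punctuation scheduled every
     * intervalMs would fire, given record timestamps in processing order.
     */
    public static List<Long> punctuationTimes(long[] recordTimestamps, long intervalMs) {
        List<Long> fired = new ArrayList<>();
        if (recordTimestamps.length == 0) {
            return fired; // no records processed, so stream-time never advances
        }
        long streamTime = recordTimestamps[0];
        long nextPunctuation = streamTime + intervalMs;
        for (long ts : recordTimestamps) {
            // In this model stream-time only moves forward, and only when a record arrives.
            streamTime = Math.max(streamTime, ts);
            // A jump in event time can cross several boundaries at once (a burst).
            while (streamTime >= nextPunctuation) {
                fired.add(nextPunctuation);
                nextPunctuation += intervalMs;
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        // Three records close together, then a jump of 75 seconds in event time.
        long[] timestamps = {0L, 10_000L, 20_000L, 95_000L};
        System.out.println(punctuationTimes(timestamps, 30_000L)); // prints [30000, 60000, 90000]
    }
}
```

In this model the first three records trigger nothing, and the fourth triggers three punctuations back to back, regardless of how much wall-clock time passed between them.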

    Btw: there is currently a discussion about the punctuation behavior of Streams: https://github.com/apache/kafka/pull/1689

    Answer for Kafka 1.0 and later

    Since Kafka 1.0 it is possible to register punctuations based on wall-clock time or event-time: https://kafka.apache.org/10/documentation/streams/developer-guide/processor-api.html#id2
