How to output result of windowed aggregation only when window is finished? [duplicate]

Posted by 我们两清 on 2019-12-19 02:20:36

Question


I have a KStream in which I want to count some dimension of the events. I do it as follows:

KTable<Windowed<Long>, Counter> ret = input.groupByKey()
  .windowedBy(TimeWindows.of(Duration.of(10, SECONDS)))
  .aggregate(Counter::new, (k, v, c) -> new Counter(c.count + v.getDimension()));

I want to have a new KStream with those aggregations as events. I can do it easily like this:

ret.toStream().to("output");

The problem is that every event in the "input" topic produces an event in the "output" topic. I would like to publish an event to the output topic only when a window is finished. For example, if the window is one minute long, a single event should be sent per key per minute.

I think I can do it like this:

ret.toStream().foreach((k, v) -> sendToKafkaTopic("output"));

But I wonder if there's a better / more elegant way of doing this?


Answer 1:


You can use KTable.suppress, a feature added to KTable in version 2.1.

This method lets you get exactly one final result per window and key for windowed computations.

More about suppress in KIP-328.

You can update your implementation with suppress like this:

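import org.apache.kafka.streams.kstream.Suppressed.BufferConfig;
import static org.apache.kafka.streams.kstream.Suppressed.untilWindowCloses;
import static java.time.temporal.ChronoUnit.SECONDS;

KTable<Windowed<Long>, Counter> ret = input.groupByKey()
        .windowedBy(TimeWindows.of(Duration.of(10, SECONDS)))
        .aggregate(Counter::new, (k, v, c) -> new Counter(c.count + v.getDimension()))
        // hold back intermediate updates; forward only the final result of each window
        .suppress(untilWindowCloses(BufferConfig.unbounded()));

ret.toStream().to("output"); // the stream should now flush events to the output topic only when the window closes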
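
It may help to spell out what "when the window closes" means here. As far as I understand it, a window counts as closed once stream time (the highest record timestamp seen so far) reaches the window end plus the grace period, so a final result only appears after a newer record has moved stream time forward. The sketch below is a toy model of that rule in plain Java, not Kafka Streams code: the class name FinalResultModel, the summing of values, and the "key@windowStart=sum" output format are all made up for illustration, and it assumes non-negative timestamps.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of "one final result per key and window" (hypothetical, not the Kafka API). */
class FinalResultModel {
    private final long sizeMs;   // tumbling window size
    private final long graceMs;  // how long late records are still accepted
    private long streamTime = Long.MIN_VALUE; // highest timestamp seen so far
    // "key@windowStart" -> running sum, kept in insertion order
    private final Map<String, Long> buffer = new LinkedHashMap<>();

    FinalResultModel(long sizeMs, long graceMs) {
        this.sizeMs = sizeMs;
        this.graceMs = graceMs;
    }

    /** Adds one record; returns the results of windows that closed because of it. */
    List<String> process(String key, long timestampMs, long value) {
        streamTime = Math.max(streamTime, timestampMs);
        long windowStart = timestampMs - (timestampMs % sizeMs);

        // Records for a window that is already closed are dropped.
        if (windowStart + sizeMs + graceMs > streamTime) {
            buffer.merge(key + "@" + windowStart, value, Long::sum);
        }

        // Emit and forget every buffered window whose end + grace has been reached.
        List<String> emitted = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = buffer.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            String k = e.getKey();
            long start = Long.parseLong(k.substring(k.lastIndexOf('@') + 1));
            if (start + sizeMs + graceMs <= streamTime) {
                emitted.add(k + "=" + e.getValue());
                it.remove();
            }
        }
        return emitted;
    }
}

class FinalResultDemo {
    public static void main(String[] args) {
        FinalResultModel model = new FinalResultModel(10_000L, 0L);
        System.out.println(model.process("a", 1_000L, 2L));  // prints []
        System.out.println(model.process("a", 4_000L, 3L));  // prints []
        // A record in the next window advances stream time and closes window 0.
        System.out.println(model.process("a", 12_000L, 1L)); // prints [a@0=5]
        // A late record for the closed window is ignored.
        System.out.println(model.process("a", 5_000L, 7L));  // prints []
    }
}
```

Two practical consequences follow from this model. First, if no new records arrive, stream time does not advance and the last window's result stays buffered. Second, the grace period delays the output: in the 2.x API TimeWindows has a grace(Duration) method, and I believe the default grace period there is 24 hours, so it is probably worth setting it explicitly, for example TimeWindows.of(Duration.of(10, SECONDS)).grace(Duration.ZERO), if late records do not need to be accepted.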


Source: https://stackoverflow.com/questions/53946013/how-to-output-result-of-windowed-aggregation-only-when-window-is-finished
