KSQL Hopping Window: accessing only oldest subwindow

不打扰是莪最后的温柔 submitted on 2019-12-08 05:55:21

Question


I am tracking the rolling sum of a particular field using a query that looks something like this:

SELECT id, SUM(quantity) AS quantity from stream \
WINDOW HOPPING (SIZE 1 MINUTE, ADVANCE BY 10 SECONDS) \
GROUP BY id;

Now, for every input tick, it seems to return 6 different aggregated values, which I guess are for the following time periods:

[start, start+60] seconds
[start+10, start+60] seconds
[start+20, start+60] seconds
[start+30, start+60] seconds
[start+40, start+60] seconds
[start+50, start+60] seconds

What if I am only interested in getting the [start, start+60] seconds result for every tick that comes in? Is there any way to get ONLY that?


Answer 1:


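Because you specify a hopping window, each record falls into multiple windows, and all of them need to be updated when a record is processed. With SIZE 1 MINUTE and ADVANCE BY 10 SECONDS, a record generally belongs to six overlapping one-minute windows whose start times are 10 seconds apart. Updating only one of them would leave the other windows with wrong results.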

Compare the Kafka Streams docs about hopping windows (Kafka Streams is KSQL's internal runtime engine): https://docs.confluent.io/current/streams/developer-guide/dsl-api.html#hopping-time-windows
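To make the overlap concrete, here is a minimal sketch that lists the windows containing a given record timestamp. It assumes windows are aligned to the epoch and never start before 0, which is how the linked Kafka Streams docs describe hopping windows. The class and method names (HoppingWindows, windowStarts) are hypothetical and not part of KSQL or Kafka Streams.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a KSQL or Kafka Streams API.
class HoppingWindows {

    // Returns the start timestamps (in ms) of every hopping window that contains tsMs.
    // Assumption: windows are aligned to the epoch and never start before 0.
    static List<Long> windowStarts(long tsMs, long sizeMs, long advanceMs) {
        List<Long> starts = new ArrayList<>();
        // Earliest aligned start whose window [start, start + size) still contains tsMs.
        long start = (Math.max(0L, tsMs - sizeMs + advanceMs) / advanceMs) * advanceMs;
        while (start <= tsMs) {
            starts.add(start);
            start += advanceMs;
        }
        return starts;
    }

    public static void main(String[] args) {
        // A record at t = 65 s with SIZE 1 MINUTE, ADVANCE BY 10 SECONDS.
        for (long start : windowStarts(65_000L, 60_000L, 10_000L)) {
            System.out.println("[" + start + ", " + (start + 60_000L) + ") ms");
        }
    }
}
```

Every window in the returned list is a full 60 seconds long; the windows differ in where they start, not in how long they are. The first entry is the oldest window containing the record.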




Answer 2:


I was in a similar situation. Creating a user-defined function (UDF) that keeps only the window where collect_list(column).size() equals the window duration (counted in periods) appears to be a promising approach.

In the UDF, use a List parameter to receive the collected values of one of the columns your aggregate is based on. Then check whether the size of that list equals the number of periods in the hopping window; if it does not, return null.

From this, create a table that selects the data and transforms it with the UDF.

Then create a second table from that one, filtering out the rows where the transformed column is null.
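The check inside such a UDF might look roughly like the sketch below. This is plain Java showing only the core logic, under several assumptions: the class and method names (FullWindowOnly, sumIfFull) are hypothetical, the value returned for a full window is assumed to be the sum of the collected values, and the expected count is assumed to be known in advance (for example 6 if each id ticks once every 10 seconds). A real KSQL UDF would additionally need the @UdfDescription and @Udf annotations from io.confluent.ksql.function.udf, which are left out here so the example stays self-contained.

```java
import java.util.List;

// Hypothetical sketch of the UDF's core logic; not a complete KSQL UDF.
class FullWindowOnly {

    // Returns the sum of the collected values when the window holds exactly
    // expectedCount values, and null otherwise so the row can be filtered out later.
    // Assumption: a "full" window is one containing expectedCount records.
    static Long sumIfFull(List<Long> collected, int expectedCount) {
        if (collected == null || collected.size() != expectedCount) {
            return null;
        }
        long sum = 0L;
        for (Long value : collected) {
            if (value != null) {
                sum += value;
            }
        }
        return sum;
    }
}
```

Note that this only identifies full windows correctly when records arrive at a regular rate; with irregular ticks the list size no longer corresponds to the window's age.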



Source: https://stackoverflow.com/questions/51794596/ksql-hopping-window-accessing-only-oldest-subwindow
