Apache Flink Tumbling Window delayed result

Submitted by 风流意气都作罢 on 2021-02-10 18:30:46

Question


I ran into an issue with an Apache Flink app that uses a tumbling window. The window size is 10 seconds, and I expect the resultSet DataStream to emit every 10 seconds. However, the resultSet of the latest window is always delayed unless I push further data to the source stream.

For example, if I push several records to the source stream between '01:33:40.0' and '01:34:00.0' and then stop and watch the log, nothing happens.

When I push some data again at '01:37:XX', I then get the resultSet of the window between '01:33:40.0' and '01:34:00.0'. This is not what I expect, because the downstream sink logic needs the resultSet on time.

Any hints to improve this will be very much appreciated. Thanks.

Below is the log:

"log timestamp": "2019-11-15 01:37:45",
"message": "resultSet output: CLASS: 13 CNT: 1 from: 2019-11-15 01:33:40.0 to: 2019-11-15 01:34:00.0\n",

Below is the code snippet:

Table resultTable = tableEnv.sqlQuery(""+
    "SELECT " +
    "  CAST (N02_001 AS VARCHAR(10)) AS RAILWAY_CLASS, " +
    "  COUNT(*) RAILWAY_CLASS_COUNT, " +
    "  TUMBLE_START(rowtime, INTERVAL '20' SECOND) as WINDOW_START, " +
    "  TUMBLE_END(rowtime, INTERVAL '20' SECOND) as WINDOW_END " +
    " FROM Inputs " +
    " GROUP BY TUMBLE(rowtime, INTERVAL '20' SECOND), CAST (N02_001 AS VARCHAR(10))");


TupleTypeInfo<Tuple4<String, Long, Timestamp, Timestamp>> tupleType = new TupleTypeInfo<>(
    Types.STRING,
    Types.LONG,
    Types.SQL_TIMESTAMP,
    Types.SQL_TIMESTAMP);

DataStream<Tuple4<String, Long, Timestamp, Timestamp>> resultSet = tableEnv.toAppendStream(resultTable, tupleType);

resultSet
.map((Tuple4<String, Long, Timestamp, Timestamp> value) -> {
    String output = "CLASS: " + value.f0 + " CNT: " + value.f1 + " from: " + value.f2 + " to: " + value.f3 + "\n";
    log.warn("resultSet output: " + output);
    return value;
})
.returns(Types.TUPLE(Types.STRING, Types.LONG, Types.SQL_TIMESTAMP, Types.SQL_TIMESTAMP));

Answer 1:


This is the expected behavior. You are using event time, which means that the watermarks used to close windows and to track the flow of time in the application are derived from event timestamps. If there are no events, time does not advance, and thus no windows are emitted. That is what you are observing.
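To make the mechanism concrete, here is a minimal sketch in plain Java (no Flink dependency) of when a 20-second tumbling event-time window becomes eligible to fire. The class and method names are hypothetical, the event time 01:33:45 is an assumed example inside the window from the question, and the watermark is assumed to equal the largest event timestamp seen so far (no out-of-orderness allowance).

```java
import java.time.LocalTime;

// Hypothetical illustration only; this is not Flink code.
final class WindowFiringDemo {

    // Start of the tumbling window containing the timestamp (offset 0, non-negative timestamps).
    static long windowStart(long timestampMillis, long sizeMillis) {
        return timestampMillis - (timestampMillis % sizeMillis);
    }

    // An event-time window may fire once the watermark reaches its last included millisecond.
    static boolean canFire(long watermarkMillis, long windowEndMillis) {
        return watermarkMillis >= windowEndMillis - 1;
    }

    // Converts "HH:mm:ss" to milliseconds since midnight, to keep the numbers readable.
    static long millisOfDay(String time) {
        return LocalTime.parse(time).toNanoOfDay() / 1_000_000L;
    }

    public static void main(String[] args) {
        long size = 20_000L;                               // INTERVAL '20' SECOND
        long event = millisOfDay("01:33:45");              // assumed event inside the window
        long windowEnd = windowStart(event, size) + size;  // 01:34:00

        // Assumption: the watermark simply follows the largest timestamp seen.
        long watermark = event;
        System.out.println("can fire while idle: " + canFire(watermark, windowEnd));

        // Only a later event moves the watermark past the window end.
        watermark = Math.max(watermark, millisOfDay("01:37:45"));
        System.out.println("can fire after 01:37:45 event: " + canFire(watermark, windowEnd));
    }
}
```

Under these assumptions the window ending at 01:34:00 stays open for as long as the source is silent, which matches the log entry being written only at 01:37:45.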

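The behavior you are seeing most probably comes from using AssignerWithPunctuatedWatermarks, which emits timestamps and watermarks only in response to incoming events. If you switch to AssignerWithPeriodicWatermarks, Flink asks the assigner for a watermark at a fixed interval even when no data is present, which lets the window close and emit its result. Note that this only helps if the assigner's getCurrentWatermark() actually returns a larger value while the stream is idle, for example by taking the wall clock into account; an assigner that derives the watermark purely from the largest timestamp seen will still stall.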
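As a rough sketch of the logic a periodic assigner would need in order to keep time moving during idle periods, the plain-Java class below advances the watermark with the wall clock once no event has arrived for a grace period. It is not Flink code: the class name, method names, and the `maxIdleMillis` parameter are hypothetical. In a real job, `onEvent` would correspond to `extractTimestamp` and `currentWatermark` to `getCurrentWatermark` of an `AssignerWithPeriodicWatermarks` implementation, with the wall clock taken from `System.currentTimeMillis()`.

```java
// Hypothetical illustration of idle-aware watermark logic; no Flink dependency.
final class IdleAwareWatermark {
    private final long maxIdleMillis;           // grace period before the wall clock takes over
    private boolean seenEvent = false;
    private long maxEventTimestamp = Long.MIN_VALUE;
    private long lastEventWallClock = 0L;
    private long lastEmitted = Long.MIN_VALUE;  // keeps the watermark monotonic

    IdleAwareWatermark(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    // Called for every record: remember the largest event timestamp and when it arrived.
    void onEvent(long eventTimestampMillis, long wallClockMillis) {
        seenEvent = true;
        maxEventTimestamp = Math.max(maxEventTimestamp, eventTimestampMillis);
        lastEventWallClock = wallClockMillis;
    }

    // Called periodically, whether or not records arrive.
    long currentWatermark(long wallClockMillis) {
        if (!seenEvent) {
            return lastEmitted;
        }
        long idleFor = wallClockMillis - lastEventWallClock;
        long candidate = maxEventTimestamp;
        if (idleFor > maxIdleMillis) {
            // Source is idle: let event time advance by the idle time beyond the grace period.
            candidate = maxEventTimestamp + (idleFor - maxIdleMillis);
        }
        lastEmitted = Math.max(lastEmitted, candidate);
        return lastEmitted;
    }

    public static void main(String[] args) {
        IdleAwareWatermark wm = new IdleAwareWatermark(5_000L);
        wm.onEvent(5_625_000L, 1_000L);                   // event at 01:33:45 (millis of day)
        System.out.println(wm.currentWatermark(4_000L));  // still within the grace period
        System.out.println(wm.currentWatermark(26_000L)); // idle long enough to pass 01:34:00
    }
}
```

The trade-off is that records arriving after an idle gap with timestamps older than the advanced watermark would be treated as late, so this approach assumes event timestamps roughly track wall-clock time.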



Source: https://stackoverflow.com/questions/58907070/apache-flink-tumbling-window-delayed-result
