How to sort an out-of-order event time stream using Flink

猫巷女王i 2021-01-14 15:02

This question covers how to sort an out-of-order stream using Flink SQL, but I would rather use the DataStream API. One solution is to do this with a ProcessFunction that us

1 answer
  • 2021-01-14 15:37

    A better approach (which is more or less what Flink's SQL and CEP libraries do internally) is to buffer the out-of-order stream in MapState, as follows:

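    If you are sorting each key independently, first key the stream by that key. Otherwise, for a global sort, key the stream by a constant so that you can still use a KeyedProcessFunction to implement the sorting. Note that a constant key sends every event to a single parallel instance of the function.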
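
    To make the effect of the key choice concrete, here is a minimal plain-Java model of keying (no Flink dependency; `KeyRouter` and its fields are hypothetical names, not Flink API). Each distinct key gets its own independent state, which is why a per-key sort needs no coordination between keys. A constant key funnels every event into one state instance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of keyBy: events are routed to per-key state by a key selector.
// In a real Flink job this routing is done by stream.keyBy(keySelector).
class KeyRouter<E, K> {
    private final Function<E, K> keySelector;
    // One independent "state" (here just the list of routed events) per distinct key.
    final Map<K, List<E>> perKeyState = new HashMap<>();

    KeyRouter(Function<E, K> keySelector) {
        this.keySelector = keySelector;
    }

    void route(E event) {
        K key = keySelector.apply(event);
        perKeyState.computeIfAbsent(key, k -> new ArrayList<>()).add(event);
    }
}
```

    For example, routing `"a1"`, `"b1"`, `"a2"` with `e -> e.charAt(0)` yields two independent partitions, whereas `e -> 0` puts all three events under the single key `0`.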

    In the open method of that process function, obtain a MapState from the runtime context (using a MapStateDescriptor), where the keys are timestamps and the values are lists of all stream elements that share that timestamp.

    In the processElement method:

    • If an event is late (its timestamp is at or before the current watermark), either drop it or send it to a side output
    • Otherwise, append the event to the map entry for its timestamp
    • Register an event-time timer for this event's timestamp
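
    The processElement steps above can be modelled in plain Java for a single key, which keeps the sketch runnable without a Flink dependency. The names are hypothetical stand-ins, not Flink API: the `HashMap` plays the role of `MapState<Long, List<E>>`, the `TreeSet` plays the role of the registered event-time timers, `watermark` plays the role of `ctx.timerService().currentWatermark()`, and `lateOutput` plays the role of a side output.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical plain-Java model of processElement for ONE key (no Flink dependency).
//   buffer     ~ MapState<Long, List<E>>
//   timers     ~ event-time timers registered via the timer service
//   watermark  ~ the current watermark (Flink also starts at Long.MIN_VALUE)
//   lateOutput ~ a side output for late events
class SortBufferModel<E> {
    final Map<Long, List<E>> buffer = new HashMap<>();
    final TreeSet<Long> timers = new TreeSet<>();
    final List<E> lateOutput = new ArrayList<>();
    long watermark = Long.MIN_VALUE;

    void processElement(E event, long timestamp) {
        if (timestamp <= watermark) {
            // Late: events up to the watermark may already have been emitted,
            // so buffering this one could break the ordering. Divert it instead.
            lateOutput.add(event);
            return;
        }
        // Append the event to the list stored under its timestamp.
        buffer.computeIfAbsent(timestamp, ts -> new ArrayList<>()).add(event);
        // Register a timer for this timestamp. The set keeps at most one timer
        // per timestamp, mirroring how Flink deduplicates timers per key.
        timers.add(timestamp);
    }
}
```

    In an actual `KeyedProcessFunction`, the timestamp would come from `ctx.timestamp()`, and the updated list has to be written back with `MapState.put`, because mutating a list obtained from `MapState.get` does not update the stored state with every backend (for example, with RocksDB it is a deserialized copy).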

    When onTimer is called for a timestamp, the map entry for that timestamp is ready to be released as part of the sorted stream, because the current watermark has reached that timestamp, which indicates that no further events with that timestamp or any earlier one are expected. Don't forget to clear the entry from the map after sending its events downstream.
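
    The release side can be modelled the same way, again in plain Java for a single key, with hypothetical names standing in for Flink's state, timers, and `Collector`. Advancing the watermark fires every pending timer whose timestamp is at or before it, in ascending order, and each firing emits and removes the buffered list for that timestamp, so the output comes out sorted by timestamp.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical plain-Java model of the onTimer step for ONE key (no Flink dependency).
//   buffer ~ MapState<Long, List<E>>, timers ~ registered event-time timers,
//   out    ~ the Collector that emits records downstream.
class TimerReleaseModel<E> {
    final Map<Long, List<E>> buffer = new HashMap<>();
    final TreeSet<Long> timers = new TreeSet<>();
    final List<E> out = new ArrayList<>();
    long watermark = Long.MIN_VALUE;

    // Fire every timer whose timestamp is <= the new watermark, lowest first,
    // which is the order in which Flink fires event-time timers.
    void advanceWatermark(long newWatermark) {
        watermark = Math.max(watermark, newWatermark);
        while (!timers.isEmpty() && timers.first() <= watermark) {
            onTimer(timers.pollFirst());
        }
    }

    void onTimer(long timestamp) {
        // remove() returns the buffered events and clears the entry in one step.
        List<E> ready = buffer.remove(timestamp);
        if (ready != null) {
            out.addAll(ready); // emit downstream in arrival order within the timestamp
        }
    }
}
```

    In Flink the equivalent of `buffer.remove` is reading the list with `MapState.get` and then calling `MapState.remove` for that timestamp.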
