This question covers how to sort an out-of-order stream using Flink SQL, but I would rather use the DataStream API. One solution is to do this with a ProcessFunction that us
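A better approach (which is more or less what Flink's SQL and CEP libraries do internally) is to buffer the out-of-order stream in MapState. With the RocksDB state backend, each map entry is stored as a separate key/value pair, so reading or updating the list for one timestamp does not require deserializing the whole buffer. The steps are as follows: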
If you are sorting each key independently, then first key the stream. Otherwise, for a global sort, key the stream by a constant so that you can use a KeyedProcessFunction to implement the sorting.
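The effect of the keying choice can be illustrated in plain Java. This is a simulation, not Flink code: partitioning a list by a key selector stands in for keyBy, and an ordinary list sort stands in for the MapState-based sorting described in the next steps. The `Event` record and the `sortWithinKeys` method are hypothetical names used only for illustration (records need Java 16+).

```java
import java.util.*;
import java.util.function.Function;

public class KeyingSketch {
    // Hypothetical event type, for illustration only.
    record Event(String user, long timestamp) {}

    // Mimics keyBy followed by a per-key sort: events are partitioned by the
    // key selector, and each partition is ordered independently of the others.
    static <K> Map<K, List<Event>> sortWithinKeys(List<Event> events, Function<Event, K> keySelector) {
        Map<K, List<Event>> partitions = new LinkedHashMap<>();
        for (Event e : events) {
            partitions.computeIfAbsent(keySelector.apply(e), k -> new ArrayList<>()).add(e);
        }
        partitions.values().forEach(p -> p.sort(Comparator.comparingLong(Event::timestamp)));
        return partitions;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("a", 3), new Event("b", 1), new Event("a", 1), new Event("b", 2));
        // Per-key sort: two partitions, each ordered only among its own events.
        System.out.println(sortWithinKeys(events, Event::user).keySet()); // [a, b]
        // Global sort: a constant key sends every event to the same partition.
        System.out.println(sortWithinKeys(events, e -> 0).get(0));
    }
}
```

A consequence of the constant key is that every element is handled by a single key, and therefore by a single parallel instance of the sorting operator.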
In the open() method of that process function, obtain a MapState handle from the runtime context (via a MapStateDescriptor), where the keys are timestamps and the values are lists of stream elements that all have the same timestamp.
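In the processElement() method, first check whether the element's timestamp is at or behind the current watermark. If it is, the element is late and can no longer be sorted into place, so drop it (or send it to a side output). Otherwise, append the element to the list stored in the map under its timestamp (creating the entry if it doesn't exist yet), and register an event-time timer for that timestamp. Registering the same timer more than once is harmless, since Flink keeps at most one timer per key and timestamp.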
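The buffering step can be sketched in plain Java. This is a simulation, not Flink code: a `HashMap` stands in for `MapState<Long, List<T>>`, a `TreeSet` of timestamps stands in for the timer service (a set, so duplicate registrations collapse into one), and a field stands in for the current watermark. The class, field, and method names are hypothetical; in a real KeyedProcessFunction the timestamp would come from the context rather than a parameter.

```java
import java.util.*;

public class BufferSketch {
    // Stand-in for MapState: timestamp -> all elements carrying that timestamp.
    final Map<Long, List<String>> buffer = new HashMap<>();
    // Stand-in for registered event-time timers; a set, so duplicates collapse.
    final TreeSet<Long> timers = new TreeSet<>();
    // Stand-in for the current watermark (Flink starts at Long.MIN_VALUE).
    long currentWatermark = Long.MIN_VALUE;
    // Elements that arrived behind the watermark (a side output in a real job).
    final List<String> late = new ArrayList<>();

    void processElement(String element, long timestamp) {
        if (timestamp <= currentWatermark) {
            // Too late: everything up to the watermark may already have been emitted.
            late.add(element);
            return;
        }
        // Append to the list for this timestamp, creating the entry if needed.
        buffer.computeIfAbsent(timestamp, ts -> new ArrayList<>()).add(element);
        // Request a callback once the watermark reaches this timestamp.
        timers.add(timestamp);
    }

    public static void main(String[] args) {
        BufferSketch op = new BufferSketch();
        op.currentWatermark = 5;
        op.processElement("c", 30);
        op.processElement("a", 10);
        op.processElement("a2", 10);
        op.processElement("late", 3);
        System.out.println(op.buffer.get(10L)); // [a, a2]
        System.out.println(op.timers);          // [10, 30]
        System.out.println(op.late);            // [late]
    }
}
```

Nothing is emitted at this stage; the map only grows until the corresponding timers fire.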
When onTimer() is called, the entries in the map for the timer's timestamp are ready to be released as part of the sorted stream, because the watermark has now reached that timestamp, which indicates that no more events with that timestamp or any earlier one are expected. Don't forget to remove that entry from the map after sending its events downstream.
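The release step can be simulated in plain Java as well. This is not Flink code: the buffer is pre-filled by a hypothetical `seed` helper as if processElement had already run, a `TreeSet` stands in for the timer service, and `advanceWatermark` mimics the framework by firing every pending timer at or below the new watermark, oldest first. All names here are illustrative assumptions.

```java
import java.util.*;

public class ReleaseSketch {
    // Stand-in for MapState: timestamp -> buffered elements with that timestamp.
    final Map<Long, List<String>> buffer = new HashMap<>();
    // Stand-in for pending event-time timers.
    final TreeSet<Long> timers = new TreeSet<>();
    // Stand-in for the downstream collector.
    final List<String> output = new ArrayList<>();

    // Hypothetical helper: puts state in place as if processElement had run.
    void seed(String element, long timestamp) {
        buffer.computeIfAbsent(timestamp, ts -> new ArrayList<>()).add(element);
        timers.add(timestamp);
    }

    // The timer callback: emit everything buffered for this timestamp, then clear it.
    void onTimer(long timestamp) {
        List<String> ready = buffer.get(timestamp);
        if (ready != null) {
            output.addAll(ready);     // ties are emitted in arrival order
            buffer.remove(timestamp); // clear the entry so state does not grow forever
        }
    }

    // Mimics the framework: fire pending timers up to the watermark, in timestamp order.
    void advanceWatermark(long watermark) {
        while (!timers.isEmpty() && timers.first() <= watermark) {
            onTimer(timers.pollFirst());
        }
    }

    public static void main(String[] args) {
        ReleaseSketch op = new ReleaseSketch();
        op.seed("c", 30);
        op.seed("a", 10);
        op.seed("b", 20);
        op.seed("a2", 10);
        op.advanceWatermark(20);
        System.out.println(op.output);          // [a, a2, b]
        System.out.println(op.buffer.keySet()); // [30]
        op.advanceWatermark(40);
        System.out.println(op.output);          // [a, a2, b, c]
    }
}
```

Because each timer only ever touches the entry for its own timestamp, the sort never needs to read the whole buffer at once.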