What's the difference between a watermark and a trigger in Flink?

问题

I read that, "..The ordering operator has to buffer all elements it receives. Then, when it receives a watermark it can sort all elements that have a timestamp that is lower than the watermark and emit them in the sorted order. This is correct because the watermark signals that not more elements can arrive that would be intermixed with the sorted elements..." - https://cwiki.apache.org/confluence/display/FLINK/Time+and+Order+in+Streams

Hence, it seems that the watermark serves as a signal to the following operator, for beginning processing. I guess, that's what also a Trigger does. What's the difference between the two?

回答1:

You can think of watermarks as special records that tell an operator what (event-) time it is. When an operator receives a watermark, it compares the watermark with its current time and other watermarks it received from different stream partitions. Depending on the comparison, the operator advances its own clock.

Some operators register timers (windows, time-based joins, custom implementations). An operator triggers a timer when the clock of the operator passes the time for which the timer was registered.

So, watermarks and timers are two different things. Watermarks tell an operator what time it is and the operator triggers a timer at the right point in time.

回答2:

A Watermark can be thought of as an assertion that an event time stream is now complete up to a particular timestamp. When a Watermark is processed by an operator it will cause the firing of any relevant event time timers. The operators that use EventTimeTimers are EventTimeWindows and ProcessFunctions.

Triggers are part of the window API and define when Windows will produce results. An EventTimeTrigger wraps around an event time timer that is called when an suitably large Watermark is processed, indicating that the window is now complete.

来源：https://stackoverflow.com/questions/55297390/whats-the-difference-between-a-watermark-and-a-trigger-in-flink

标签

stream

bigdata

real-time

apache-flink

flink-streaming