flink keyBy adding delay; how can I reduce this latency?

后端 未结 1 1771
别跟我提以往
别跟我提以往 2020-12-12 06:23

When I ran a simple flink application with KeyedStream, I observed the time latency of an event varies from 0 to 100ms. Below is the program

StreamExecutionE         


        
相关标签:
1条回答
  • 2020-12-12 07:21

    The reason for this delay is that by adding that keyBy you are forcing a network shuffle along with serialization/deserialization. The reason the delay is so variable is because of the network buffering.

    You'll want to read the section of the documentation called Controlling Latency. The tl;dr is that you want to set the network buffer timeout to something small:

    env.setBufferTimeout(timeoutMillis);
    

    You can set the buffer timeout to zero if you want, but that will impact throughput more than setting it to something small (like 1ms, or 5ms). The default is 100ms. For details on how the network stack in Flink is organized, see A Deep-Dive into Flink's Network Stack on the Flink project blog.

    While we're on the subject, other sources of latency can include checkpoint barrier alignment and garbage collection.

    env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);
    

    will disable barrier alignment, at the cost of giving up exactly once processing semantics.

    Using the RocksDB state backend will reduce the number of objects to garbage collect (since RocksDB keeps its state off-heap), in some cases improving worst case latency at the cost of worse average latency. However, with modern garbage collectors using RocksDB to improve worst-case latency can be a mistake.

    Also,

    env.getConfig().enableObjectReuse();
    

    will instruct the runtime to reuse user objects for better performance. Keep in mind that this can lead to bugs when user-code functions are not aware of this behavior.

    If you are using watermarks, the watermark delay affects the latency with which event time timers will be triggered (including windows), and the autoWatermarkInterval also has an impact on latency.

    Finally, the use of transactional sinks adds end-to-end latency, since downstream consumers of those sinks won't see committed results until the transactions complete. The expected delay is roughly half the checkpoint interval.

    If you are interested in measuring latency, take a look at Latency Tracking and the section on latency in Monitoring Apache Flink Applications 101.

    0 讨论(0)
提交回复
热议问题