flink keyBy adding delay; how can I reduce this latency?

醉酒当歌 提交于 2019-11-28 12:46:00

问题


When I ran a simple flink application with KeyedStream, I observed the time latency of an event varies from 0 to 100ms. Below is the program

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    DataStream<Long> source = env.addSource(new SourceFunction<Long>() {
        public void run(SourceContext<Long> sourceContext) throws Exception {
            while(true) {
                synchronized (sourceContext.getCheckpointLock()) {
                    sourceContext.collect(System.currentTimeMillis());
                    Thread.sleep(1000);
                }
            }
        }

        public void cancel() {}
    }).keyBy(new KeySelector<Long, Long>() {
        @Override
        public Long getKey(Long l) throws Exception {
            return l;
        }
    }).addSink(new SinkFunction<Long>() {
        @Override
        public void invoke(Long l) throws Exception {
            long diff = System.currentTimeMillis() - l;
            System.out.println("in Sink: diff=" + diff);
        }
    });
    env.execute();

Output is:

in Sink: diff=0
in Sink: diff=2
in Sink: diff=4
in Sink: diff=4
in Sink: diff=5
in Sink: diff=7
in Sink: diff=9
in Sink: diff=9
in Sink: diff=11
in Sink: diff=12
in Sink: diff=14
in Sink: diff=14
in Sink: diff=16
in Sink: diff=17
in Sink: diff=18
in Sink: diff=19
in Sink: diff=21
in Sink: diff=22
in Sink: diff=24
in Sink: diff=24
in Sink: diff=26
in Sink: diff=27
in Sink: diff=29
in Sink: diff=29
in Sink: diff=31
in Sink: diff=32
in Sink: diff=34
in Sink: diff=34
in Sink: diff=36
in Sink: diff=37
in Sink: diff=39
in Sink: diff=40
in Sink: diff=41
in Sink: diff=43
in Sink: diff=45
in Sink: diff=45
in Sink: diff=47
in Sink: diff=48
in Sink: diff=50
in Sink: diff=50
in Sink: diff=52
in Sink: diff=53
in Sink: diff=55
in Sink: diff=57
in Sink: diff=57
in Sink: diff=59
in Sink: diff=60
in Sink: diff=61
in Sink: diff=62
in Sink: diff=63
in Sink: diff=65
in Sink: diff=66
in Sink: diff=67
in Sink: diff=69
in Sink: diff=70
in Sink: diff=72
in Sink: diff=72
in Sink: diff=74
in Sink: diff=76
in Sink: diff=77
in Sink: diff=78
in Sink: diff=79
in Sink: diff=81
in Sink: diff=82
in Sink: diff=83
in Sink: diff=84
in Sink: diff=86
in Sink: diff=87
in Sink: diff=88
in Sink: diff=89
in Sink: diff=91
in Sink: diff=92
in Sink: diff=94
in Sink: diff=94
in Sink: diff=96
in Sink: diff=97
in Sink: diff=99
in Sink: diff=99
in Sink: diff=0
in Sink: diff=2
in Sink: diff=3
in Sink: diff=4
in Sink: diff=4
in Sink: diff=5
in Sink: diff=7
in Sink: diff=9
in Sink: diff=9
in Sink: diff=11
in Sink: diff=12
in Sink: diff=14
in Sink: diff=14
in Sink: diff=16
in Sink: diff=17
in Sink: diff=18
in Sink: diff=19
in Sink: diff=21
in Sink: diff=22
in Sink: diff=24
in Sink: diff=24
in Sink: diff=26
in Sink: diff=46
in Sink: diff=48
in Sink: diff=50
in Sink: diff=52
in Sink: diff=53
in Sink: diff=54
in Sink: diff=56
in Sink: diff=58
in Sink: diff=59
in Sink: diff=60
in Sink: diff=62
in Sink: diff=64
in Sink: diff=65
in Sink: diff=66
in Sink: diff=68
in Sink: diff=70
in Sink: diff=71
in Sink: diff=73
in Sink: diff=74
in Sink: diff=76
in Sink: diff=77
in Sink: diff=79
in Sink: diff=81
in Sink: diff=82
in Sink: diff=83
in Sink: diff=85
in Sink: diff=86
in Sink: diff=88
in Sink: diff=88
in Sink: diff=90
in Sink: diff=92
in Sink: diff=92
in Sink: diff=94
in Sink: diff=95
in Sink: diff=97
in Sink: diff=98
in Sink: diff=99
in Sink: diff=0
in Sink: diff=2
in Sink: diff=4
in Sink: diff=4
in Sink: diff=5
in Sink: diff=7
in Sink: diff=9

As you can see the latency is a pattern gradually increases to 100 and the drops and starts from 0 and the cycle repeats. I need the latency to be as low as possible. This example is a simplified version of my real application. Can someone explain me the reason for latency and how to reduce it to as low as possible.


回答1:


The reason for this delay is that by adding that keyBy you are forcing a network shuffle along with serialization/deserialization. The reason the delay is so variable is because of the network buffering.

You'll want to read the section of the documentation called Controlling Latency. The tl;dr is that you want to set the network buffer timeout to something small, but non-zero (e.g., 5 or 10 ms):

env.setBufferTimeout(timeoutMillis);

For details on how the network stack in Flink is organized, see A Deep-Dive into Flink's Network Stack on the Flink project blog.

While we're on the subject, other sources of latency can include checkpoint barrier alignment and garbage collection.

env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.AT_LEAST_ONCE);

will disable barrier alignment, at the cost of giving up exactly once processing semantics.

Using the RocksDB state backend will reduce the number of objects to garbage collect (since it keeps its state on the heap), thereby improving worst case latency.

Also,

env.getConfig().enableObjectReuse();

will instruct the runtime to reuse user objects for better performance. Keep in mind that this can lead to bugs when user-code functions are not aware of this behavior.

If you are interested in measuring latency, take a look at Latency Tracking and the section on latency in Monitoring Apache Flink Applications 101.



来源:https://stackoverflow.com/questions/56819799/flink-keyby-adding-delay-how-can-i-reduce-this-latency

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!