Apache Flink: How to count the total number of events in a DataStream

Posted by 可紊 on 2019-12-01 10:18:36

Question


I have two raw streams that I join, and I want to count how many events have been joined and how many have not. I am doing this with a map on joinedEventDataStream, as shown below:

joinedEventDataStream.map(new RichMapFunction<JoinedEvent, Object>() {
    @Override
    public Object map(JoinedEvent joinedEvent) throws Exception {
        number_of_joined_events += 1;
        return null;
    }
});

Question # 1: Is this the appropriate way to count the number of events in the stream?

Question # 2: I have noticed a weird behavior, which some of you might not believe. When I run my Flink program in the IntelliJ IDE, it shows the correct value for number_of_joined_events, but when I submit the program as a JAR it shows 0. In other words, with the JAR I get the initial value of number_of_joined_events instead of the actual count. Why does this happen only when the program is submitted as a JAR file and not in the IDE?


Answer 1:


Your approach does not work. The behavior you noticed when executing the program via a JAR file is expected.

I don't know how number_of_joined_events is defined, but I assume it is a static variable in your program. When you run the program in your IDE, everything runs in a single JVM, so all operators have access to the same static variable. When you submit a JAR file to a remote process, the program is executed in a different JVM (possibly multiple JVMs), and the static variable in your client process is never updated.

You can use Flink's metrics or a ReduceFunction that sums 1s to count the number of processed records.
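
The "sum 1s" idea can be sketched without any Flink dependency. The following is only a plain-Java model of it, not Flink code: the class name CountBySummingOnes, the helper countEvents, and the sample events are hypothetical and exist purely for illustration. The binary function SUM has the same shape as the body of a ReduceFunction<Long>, which in Flink would be applied to a keyed or windowed stream after mapping each event to 1L.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

public class CountBySummingOnes {

    // Same shape as a reduce step: combine two partial counts into one.
    static final BinaryOperator<Long> SUM = (a, b) -> a + b;

    // Maps every event to 1L and folds the 1s with SUM.
    // The event type does not matter, only the number of events.
    static <T> long countEvents(List<T> events) {
        return events.stream()
                .map(e -> 1L)
                .reduce(0L, SUM);
    }

    public static void main(String[] args) {
        // Hypothetical stand-ins for joined events.
        List<String> joined = Arrays.asList("a-1", "b-2", "c-3");
        System.out.println(countEvents(joined)); // prints 3
    }
}
```

The point of this design is that the count is a value flowing through the computation rather than a static variable, so it does not depend on which JVM runs the operator.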



Source: https://stackoverflow.com/questions/47246721/apache-flink-how-to-count-the-total-number-of-events-in-a-datastream
