Differences between working with states and windows(time) in Flink streaming

Posted by 为君一笑 on 2019-12-06 16:09:04

Question


Let's say we want to compute the sum and average of the items, and we can work either with state or with (time) windows.

Example working with windows - https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#example-program

Example working with states - https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/java/com/dataartisans/flinktraining/exercises/datastream_java/ride_speed/RideSpeed.java

What should drive the decision between the two? Can I infer that if the data arrives very irregularly (50% arrives within the defined window length and the other 50% does not), the result of the window approach is more biased (because that 50% of the events is dropped)?

On the other hand, do we spend more time checking and updating the state when working with state?


Answer 1:


First, it depends on your semantics. The two examples use different semantics and are thus not directly comparable. Furthermore, windows work with state internally, too. It is hard to say in general which approach is the better one.

Since Flink's window semantics are very rich, I would suggest using windows. If you cannot express your semantics with windows, using state can be a good alternative. Using windows has the additional advantage that state handling, which is hard to get right, is done automatically for you.
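To make "working with state" concrete, here is a minimal plain-Java sketch (it deliberately uses no Flink API) of a per-key running sum and average. The class `ManualStateSketch` and its methods are hypothetical names chosen for illustration. The point is only that the code itself has to create, update, and clear the state, which is the bookkeeping a window operator would otherwise do for you.

```java
import java.util.HashMap;
import java.util.Map;

public class ManualStateSketch {
    // Hypothetical per-key state: index 0 holds the running sum, index 1 the count.
    private final Map<String, double[]> state = new HashMap<>();

    /** Updates the state for the key and returns {sum, average} seen so far. */
    public double[] update(String key, double value) {
        double[] s = state.computeIfAbsent(key, k -> new double[2]);
        s[0] += value; // running sum
        s[1] += 1;     // running count
        return new double[] {s[0], s[0] / s[1]};
    }

    /** The state never expires on its own; the caller must clear it explicitly. */
    public void clear(String key) {
        state.remove(key);
    }

    public static void main(String[] args) {
        ManualStateSketch sketch = new ManualStateSketch();
        sketch.update("a", 2.0);
        double[] r = sketch.update("a", 4.0);
        System.out.println("sum=" + r[0] + " avg=" + r[1]);
    }
}
```

In this sketch nothing ever calls `clear` automatically, so deciding when a key's state may be discarded is left entirely to the application.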

The decision is definitely independent of your data arrival rate. Flink does not drop any data. If you work with event time (rather than processing time), your result will be the same regardless of the data arrival rate.
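The event-time point can be illustrated with a small plain-Java simulation (again not Flink API; `EventTimeWindowSketch`, `Event`, and `aggregate` are hypothetical names). Each event is assigned to a tumbling window purely by its own timestamp, so the per-window sum and average do not depend on the order in which events arrive. The sketch assumes non-negative timestamps and that every event is eventually assigned to its window; watermarks and late-data handling are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class EventTimeWindowSketch {
    /** Hypothetical event: an event-time timestamp in milliseconds plus a value. */
    public static final class Event {
        final long timestamp;
        final double value;

        public Event(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Returns window start -> {sum, average} for tumbling windows of the given size. */
    public static Map<Long, double[]> aggregate(List<Event> events, long windowSizeMs) {
        Map<Long, double[]> acc = new TreeMap<>(); // window start -> {sum, count}
        for (Event e : events) {
            // The window is chosen from the event's timestamp, not from its arrival position.
            long start = e.timestamp - (e.timestamp % windowSizeMs);
            double[] a = acc.computeIfAbsent(start, k -> new double[2]);
            a[0] += e.value;
            a[1] += 1;
        }
        Map<Long, double[]> result = new TreeMap<>();
        for (Map.Entry<Long, double[]> entry : acc.entrySet()) {
            double[] a = entry.getValue();
            result.put(entry.getKey(), new double[] {a[0], a[0] / a[1]});
        }
        return result;
    }

    public static void main(String[] args) {
        List<Event> inOrder = Arrays.asList(
                new Event(100, 1.0), new Event(400, 2.0),
                new Event(900, 3.0), new Event(1200, 10.0));
        List<Event> reversed = new ArrayList<>(inOrder);
        Collections.reverse(reversed); // simulate a different arrival order

        for (List<Event> arrival : Arrays.asList(inOrder, reversed)) {
            for (Map.Entry<Long, double[]> w : aggregate(arrival, 1000).entrySet()) {
                System.out.println(w.getKey() + " -> " + Arrays.toString(w.getValue()));
            }
        }
    }
}
```

Both arrival orders print the same windows, because the window assignment only looks at `timestamp`.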



Source: https://stackoverflow.com/questions/34596230/differences-between-working-with-states-and-windowstime-in-flink-streaming
