问题
Let's say we want to compute the sum and average of the items,
and can either working with states
or windows
(time).
Example working with windows
-
https://ci.apache.org/projects/flink/flink-docs-release-0.10/apis/streaming_guide.html#example-program
Example working with states
-
https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/java/com/dataartisans/flinktraining/exercises/datastream_java/ride_speed/RideSpeed.java
Can I ask what would be the reasons to make decision? Can I infer that if the data arrives very irregularly (50% comes in the defined window length and the other 50% don't), the result of the window approach is more biased (because the 50% events are dropped)?
On the other hand, do we spend more time checking and updating the states when working with states?
回答1:
First, it depends on your semantics... The two examples use different semantics and are thus not comparable directly. Furthermore, windows work with state internally, too. It is hard to say in general with approach is the better one.
As Flink's window semantics are very rich, I would suggest to use windows. If you cannot express your semantics with windows, using state can be a good alternative. Using windows, has the additional advantage that state handling---which is hard to get done right---is done automatically for you.
The decision is definitely independent from your data arrival rate. Flink does not drop any data. If you work with event time (rather than with processing time) your result will be the same independently of the data arrival rate after all.
来源:https://stackoverflow.com/questions/34596230/differences-between-working-with-states-and-windowstime-in-flink-streaming