Ordering of Records in Stream

失恋的感觉 2021-01-15 05:53

Here are some of my questions:

I have two different streams, stream1 and stream2, in which the elements are in order.

1) Now

2 answers
  • 2021-01-15 06:40

    1) Yes and no. Flink uses so-called watermarks to track progress in event time. This ensures that records are assigned to the correct windows and that a window is not closed until the watermark indicates that its data has arrived. However, a strict order is not guaranteed within a group, because the data arrives from parallel senders. Between groups, there is no ordering guarantee at all.

    2) Basically the same answer as for (1).

    3) You do not need to use keyBy again. The map/flatMap will be chained by default.

    4) See https://ci.apache.org/projects/flink/flink-docs-release-1.0/internals/general_arch.html#the-processes
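
To illustrate point 1), here is a minimal sketch in plain Python (not Flink code) of how a watermark can decide when a tumbling window fires. The function name `run_tumbling_windows` and the parameter `max_delay` are hypothetical, and the watermark rule used here (largest timestamp seen minus a fixed delay) is a simplifying assumption, as is dropping records that arrive after their window has fired.

```python
def run_tumbling_windows(events, window_size, max_delay):
    """Group (timestamp, value) records into tumbling windows.

    events is given in ARRIVAL order, which may differ from timestamp order.
    Returns a list of (window_start, values) in the order the windows fire.
    """
    open_windows = {}              # window_start -> values in arrival order
    fired = []
    watermark = float("-inf")      # assumption: no event time has passed yet

    for ts, value in events:
        start = ts - ts % window_size
        if start + window_size <= watermark:
            continue               # late record: its window already fired (dropped in this sketch)
        open_windows.setdefault(start, []).append(value)

        # Assumed watermark rule: largest timestamp seen so far minus a fixed delay.
        watermark = max(watermark, ts - max_delay)

        # Fire every window whose end the watermark has reached.
        for s in sorted(open_windows):
            if s + window_size <= watermark:
                fired.append((s, open_windows.pop(s)))

    # End of input: flush whatever is still open.
    for s in sorted(open_windows):
        fired.append((s, open_windows.pop(s)))
    return fired


# Out-of-order arrival: timestamp 2 arrives after 4, and 9 arrives after 11.
events = [(1, "a"), (4, "b"), (2, "c"), (11, "d"), (9, "e"), (13, "f"), (3, "g")]
result = run_tumbling_windows(events, window_size=10, max_delay=2)
print(result)
```

In this run the record with timestamp 9 still lands in the first window although it arrives after timestamp 11, because the watermark has not yet passed the window end. Inside each window the values stay in arrival order, not timestamp order, which is the "no strict order per group" caveat from the answer.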

  • 2021-01-15 06:57

    Concerning Ordering Guarantees

    This page gives a good overview, including an explanation of ordering guarantees: https://ci.apache.org/projects/flink/flink-docs-release-1.0/concepts/concepts.html#parallel-dataflows

    The gist is:

    Order is maintained within each parallel stream partition. For an explanation of stream partitions, see here: https://ci.apache.org/projects/flink/flink-docs-release-1.0/concepts/concepts.html#parallel-dataflows

    For operations like "keyBy()" or "rebalance()" that change the partitioning, the order is maintained for each pair of source and target stream partitions, meaning for each pair of sending and receiving operators.
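
The per-pair guarantee can be sketched with a small simulation in plain Python (not Flink code). Everything here is an assumption made for illustration: the function name `shuffle_by_key`, the use of `key % num_receivers` as a stand-in for hash partitioning, and a seeded random choice as a stand-in for network timing. Each (sender, receiver) pair gets its own FIFO channel, and the receiver reads from whichever channel happens to deliver next.

```python
import random


def shuffle_by_key(sender_partitions, num_receivers, seed):
    """Route (key, value) records from parallel senders to receivers.

    Returns one list per receiver containing (sender, key, value) tuples
    in the order that receiver would observe them.
    """
    rng = random.Random(seed)

    # One FIFO channel per (sender, receiver) pair.
    channels = {}
    for sender, records in enumerate(sender_partitions):
        for key, value in records:
            receiver = key % num_receivers  # stand-in for hash partitioning
            channels.setdefault((sender, receiver), []).append((sender, key, value))

    received = [[] for _ in range(num_receivers)]
    position = {pair: 0 for pair in channels}
    pending = list(channels)

    # Deliver one record at a time from a randomly chosen channel.
    # Within a channel, records are always taken front to back.
    while pending:
        pair = rng.choice(pending)
        _, receiver = pair
        received[receiver].append(channels[pair][position[pair]])
        position[pair] += 1
        if position[pair] == len(channels[pair]):
            pending.remove(pair)
    return received


senders = [
    [(0, "a1"), (1, "a2"), (0, "a3")],  # sender 0, in its own order
    [(0, "b1"), (0, "b2"), (1, "b3")],  # sender 1, in its own order
]
for seed in range(3):
    print(seed, shuffle_by_key(senders, num_receivers=2, seed=seed))
```

Whatever the seed, "a1" reaches receiver 0 before "a3" and "b1" before "b2", but how the "a" and "b" records interleave at that receiver is not fixed.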

    As Matthias mentioned, if a group (defined by a key, running on one receiving target operator) gets elements from multiple senders, there is no well-defined strict ordering of elements. Using concepts like event time, you can impose a meaningful ordering based on the data (the attached timestamps).
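
One way to impose such a timestamp-based ordering is to buffer records and release them in timestamp order once a watermark says no earlier record is expected. The following is a minimal sketch in plain Python, not Flink's API. The class name `EventTimeSorter` and its methods are hypothetical, and it assumes that no record arrives with a timestamp at or below a watermark that has already been passed.

```python
import heapq


class EventTimeSorter:
    """Buffers records and emits them sorted by timestamp up to the watermark."""

    def __init__(self):
        self._heap = []   # entries are (timestamp, arrival_seq, value)
        self._seq = 0     # tie-breaker: equal timestamps keep arrival order

    def add(self, timestamp, value):
        heapq.heappush(self._heap, (timestamp, self._seq, value))
        self._seq += 1

    def advance_watermark(self, watermark):
        """Return all buffered records with timestamp <= watermark, in order."""
        out = []
        while self._heap and self._heap[0][0] <= watermark:
            ts, _, value = heapq.heappop(self._heap)
            out.append((ts, value))
        return out


sorter = EventTimeSorter()
for ts, value in [(5, "x"), (2, "y"), (7, "z"), (2, "w")]:
    sorter.add(ts, value)   # arrival order differs from timestamp order

first_batch = sorter.advance_watermark(5)
second_batch = sorter.advance_watermark(10)
print(first_batch, second_batch)
```

The cost of this approach is latency: a record is held back until the watermark passes its timestamp, so a larger allowed delay gives a more reliable order but slower output.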
