Question

Here are my questions:

I have two different streams, stream1 and stream2, in which the elements are in order.

1) When I do a keyBy on each of these streams, will the order be maintained? (Every group here will be sent to one task manager only.) My understanding is that the records will be in order within a group; correct me if I am wrong.

2) After the keyBy on both streams, I do a co-group to get the matching and non-matching records. Will the order be maintained here as well, since this also works on a KeyedStream? I am using EventTime, with AscendingTimestampExtractor to generate timestamps and watermarks.

3) Next, I want to perform a sequence check on the matching_nonMatchingStream that I get from 2), using map/flatMap. Do I need to perform the keyBy again, or, if I keep the operators chained, will the matching_nonMatchingStream run in the same TaskManager? My understanding is that chaining will work here, but I am getting confused; correct me if I am wrong.

4) slotSharingGroup: can you please explain this in more detail? According to the docs: "Sets the slot sharing group of this operation. Parallel instances of operations that are in the same slot sharing group will be co-located in the same TaskManager slot, if possible."

Answer 1:

1) Yes and no. Flink uses so-called watermarks to track progress in event time. This ensures that records can be assigned to the correct windows and that windows are not closed until all of their data is available. However, a strict order is not guaranteed within a group, because data arrives from parallel senders. Between groups, there is no ordering guarantee at all.

2) Basically the same answer as for (1).

3) You do not need to use keyBy again. The map/flatMap will be chained to the preceding operator by default, provided it runs with the same parallelism.

4) See https://ci.apache.org/projects/flink/flink-docs-release-1.0/internals/general_arch.html#the-processes

Answer 2:

Concerning Ordering Guarantees

This page gives a good overview and also explains the ordering guarantees: https://ci.apache.org/projects/flink/flink-docs-release-1.0/concepts/concepts.html#parallel-dataflows

The gist is:

Order is maintained within each parallel stream partition. For an explanation of stream partitions, see here: https://ci.apache.org/projects/flink/flink-docs-release-1.0/concepts/concepts.html#parallel-dataflows

For operations like "keyBy()" or "rebalance()" that change the partitioning, the order is maintained per pair of source and target stream partition, meaning per pair of sending and receiving operator.
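The per-pair guarantee can be illustrated with a small simulation in plain Java. This is only a sketch and does not use Flink: the class name `PairwiseOrderSketch`, the `Tagged` type, and the random interleaving are all assumptions made for illustration. Each sender has its own FIFO channel to the receiver, and the receiver reads from the channels in an arbitrary order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Random;

/** Hypothetical model of one receiving operator fed by several senders. */
public class PairwiseOrderSketch {

    /** A received element, tagged with the sender it came from. */
    public static final class Tagged {
        public final int sender;
        public final long timestamp;

        public Tagged(int sender, long timestamp) {
            this.sender = sender;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return "s" + sender + ":" + timestamp;
        }
    }

    /**
     * Drains one FIFO channel per sender into a single receiver.
     * Within a channel, elements are read front to back; which channel
     * is read next is picked at random to model network interleaving.
     */
    public static List<Tagged> receive(List<List<Long>> senders, long seed) {
        Random random = new Random(seed);
        List<Deque<Long>> channels = new ArrayList<>();
        int remaining = 0;
        for (List<Long> sender : senders) {
            channels.add(new ArrayDeque<>(sender));
            remaining += sender.size();
        }
        List<Tagged> received = new ArrayList<>();
        while (remaining > 0) {
            int pick = random.nextInt(channels.size());
            Deque<Long> channel = channels.get(pick);
            if (channel.isEmpty()) {
                continue; // nothing left on this channel, pick again
            }
            received.add(new Tagged(pick, channel.pollFirst()));
            remaining--;
        }
        return received;
    }

    /** Returns true if the elements of one sender arrive in ascending order. */
    public static boolean inOrderForSender(List<Tagged> received, int sender) {
        long last = Long.MIN_VALUE;
        for (Tagged element : received) {
            if (element.sender != sender) {
                continue;
            }
            if (element.timestamp < last) {
                return false;
            }
            last = element.timestamp;
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Long>> senders = Arrays.asList(
                Arrays.asList(1L, 3L, 5L),
                Arrays.asList(2L, 4L, 6L));
        List<Tagged> received = receive(senders, 42L);
        System.out.println("received: " + received);
        System.out.println("sender 0 in order: " + inOrderForSender(received, 0));
        System.out.println("sender 1 in order: " + inOrderForSender(received, 1));
    }
}
```

In this model, the elements of each sender always stay in their original order, while the combined sequence seen by the receiver may or may not be sorted, depending on the interleaving.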

As Matthias mentioned, if a group (defined by a key and running on one receiving target operator) gets elements from multiple senders, there is no well-defined strict ordering of the elements. Using concepts like event time, you can impose a meaningful ordering based on the data (the attached timestamps).
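One common way to impose such an ordering is to buffer elements and release them in timestamp order once a watermark says no earlier element is expected. The following is a minimal sketch in plain Java, not Flink code: `EventTimeSorterSketch`, `add`, and `onWatermark` are hypothetical names, and it assumes that no element arrives with a timestamp at or below a watermark that has already been processed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical buffer that re-orders elements by their event timestamps. */
public class EventTimeSorterSketch {

    // Min-heap: the smallest buffered timestamp is always at the head.
    private final PriorityQueue<Long> buffer = new PriorityQueue<>();

    /** Buffers an element; arrival order does not matter. */
    public void add(long timestamp) {
        buffer.add(timestamp);
    }

    /**
     * Releases, in ascending order, every buffered timestamp that is
     * less than or equal to the watermark. Later elements stay buffered.
     */
    public List<Long> onWatermark(long watermark) {
        List<Long> emitted = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek() <= watermark) {
            emitted.add(buffer.poll());
        }
        return emitted;
    }

    public static void main(String[] args) {
        EventTimeSorterSketch sorter = new EventTimeSorterSketch();
        // Elements arrive out of order, e.g. interleaved from two senders.
        sorter.add(3L);
        sorter.add(1L);
        sorter.add(2L);
        sorter.add(5L);
        System.out.println(sorter.onWatermark(3L));
        sorter.add(4L);
        System.out.println(sorter.onWatermark(10L));
    }
}
```

The trade-off is latency: an element is held back until the watermark passes its timestamp, so the sequence check in question 3) would see sorted input only after that delay.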

Source: https://stackoverflow.com/questions/38354713/ordering-of-records-in-stream

Tags

apache-flink

flink-streaming

Ordering of Records in Stream
