Consume from two Flink DataStreams by priority or in round-robin order

怎甘沉沦 submitted on 2020-06-28 08:39:49

Question


I have two Flink DataStreams, for example dataStream1 and dataStream2. I want to union both streams into one so that I can process them with the same process functions, since the DAG for both streams is the same.

For now, I need messages from both streams to be consumed with equal priority. The producer of dataStream2 produces 10 messages per minute, while the producer of dataStream1 produces 1000 messages per second. The data types are the same for both streams. dataStream2 is more of a high-priority queue that should be consumed as soon as possible. There is no relation between the messages of dataStream1 and dataStream2.

Will dataStream1.union(dataStream2) produce a stream that contains the elements of both streams?


Answer 1:


Probably the simplest solution to this problem, though not necessarily the most efficient one depending on the exact specification of your data sources, is to connect the two streams. You could then use a CoProcessFunction, which invokes a separate method for each of the connected streams.

With this approach, you could simply buffer the elements of one stream until they can be emitted (for example, in a round-robin manner). Keep in mind, though, that this may be quite inefficient if the sources produce events at very different rates.
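
The buffering idea can be sketched in plain Java, independent of Flink. The class RoundRobinBuffer and its methods onFirst, onSecond, and pending are hypothetical names invented for this illustration. In a real job, the two methods would roughly correspond to the two per-input callbacks of a CoProcessFunction, and the queues would need to be kept in Flink state to survive failures. This is only a sketch of the alternation logic under those assumptions, not a complete operator.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical helper: strictly alternates two inputs, buffering whichever side runs ahead. */
class RoundRobinBuffer<T> {
    private final Queue<T> first = new ArrayDeque<>();
    private final Queue<T> second = new ArrayDeque<>();
    private boolean firstIsNext = true; // which input is allowed to emit next

    /** Called for an element of the first input; returns what may be emitted now. */
    List<T> onFirst(T value) {
        first.add(value);
        return drain();
    }

    /** Called for an element of the second input; returns what may be emitted now. */
    List<T> onSecond(T value) {
        second.add(value);
        return drain();
    }

    /** Number of elements still waiting for a partner from the other input. */
    int pending() {
        return first.size() + second.size();
    }

    /** Emits elements in strict alternation until the expected side has nothing buffered. */
    private List<T> drain() {
        List<T> out = new ArrayList<>();
        while (true) {
            Queue<T> wanted = firstIsNext ? first : second;
            if (wanted.isEmpty()) {
                return out;
            }
            out.add(wanted.poll());
            firstIsNext = !firstIsNext;
        }
    }
}
```

With rates as different as those in the question (1000 messages per second versus 10 per minute), pending() would grow almost without bound on the fast side, which is the inefficiency mentioned above.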




Answer 2:


It sounds like the two DataStreams have different types of elements, though you didn't specify that explicitly. If that's the case, then create an Either<stream1 type, stream2 type> via a MapFunction on each stream, then union() the two streams. You won't get exact intermingling of the two, as Flink will alternate consuming from each stream's network buffer.

If you really want nicely mixed streams, then (as others have noted) you'll need to buffer incoming elements via state, and also apply some heuristics to avoid over-buffering if for any reason (e.g. differing network latency, or more likely different performance between the two sources) you have very different data rates between the two streams.
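
One possible heuristic for avoiding over-buffering is to cap how many elements each side may hold and to emit out of turn once the cap is exceeded. The sketch below is plain Java rather than a Flink operator. BoundedInterleaver, maxBuffered, onFirst, and onSecond are hypothetical names chosen for illustration, and the cap-based rule is just one assumption about what a reasonable heuristic might look like.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical helper: alternates two inputs but holds at most maxBuffered elements per side. */
class BoundedInterleaver<T> {
    private final Queue<T> first = new ArrayDeque<>();
    private final Queue<T> second = new ArrayDeque<>();
    private final int maxBuffered;      // assumed tuning knob, not a Flink setting
    private boolean firstIsNext = true; // which input is expected to emit next

    BoundedInterleaver(int maxBuffered) {
        this.maxBuffered = maxBuffered;
    }

    List<T> onFirst(T value) {
        first.add(value);
        return drain();
    }

    List<T> onSecond(T value) {
        second.add(value);
        return drain();
    }

    private List<T> drain() {
        List<T> out = new ArrayList<>();
        while (true) {
            Queue<T> wanted = firstIsNext ? first : second;
            Queue<T> other = firstIsNext ? second : first;
            if (!wanted.isEmpty()) {
                // Normal case: the expected side has data, so alternate.
                out.add(wanted.poll());
                firstIsNext = !firstIsNext;
            } else if (other.size() > maxBuffered) {
                // Heuristic: the expected side is idle and the other side overflowed,
                // so emit its oldest element out of turn instead of buffering more.
                out.add(other.poll());
            } else {
                return out;
            }
        }
    }
}
```

A small cap keeps memory bounded at the cost of less even mixing; a large cap does the opposite. Which trade-off is appropriate depends on how different the two data rates actually are.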




Answer 3:


You may want to use a custom operator that implements the InputSelectable interface in order to reduce the amount of buffering needed. I've included an example below that implements interleaving without any buffering, but be sure to read the caveat in the docs which explains that

... the operator may receive some data that it does not currently want to process ...

In other words, this simple example can't be relied upon to really work as is.

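import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
import org.apache.flink.streaming.api.operators.InputSelectable;
import org.apache.flink.streaming.api.operators.InputSelection;
import org.apache.flink.streaming.api.operators.TwoInputStreamOperator;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

public class Alternate {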
    public static void main(String[] args) throws Exception {

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        DataStream<Long> positive = env.generateSequence(1L, 100L);
        DataStream<Long> negative = env.generateSequence(-100L, -1L);

        AlternatingTwoInputStreamOperator op = new AlternatingTwoInputStreamOperator();

        positive
            .connect(negative)
            .transform("Hack that needs buffering", Types.LONG, op)
            .print();

        env.execute();
    }
}

class AlternatingTwoInputStreamOperator extends AbstractStreamOperator<Long>
        implements TwoInputStreamOperator<Long, Long, Long>, InputSelectable {

    private InputSelection nextSelection = InputSelection.FIRST;

    @Override
    public void processElement1(StreamRecord<Long> element) throws Exception {
        output.collect(element);
        nextSelection = InputSelection.SECOND;
    }

    @Override
    public void processElement2(StreamRecord<Long> element) throws Exception {
        output.collect(element);
        nextSelection = InputSelection.FIRST;
    }

    @Override
    public InputSelection nextSelection() {
        return this.nextSelection;
    }
}

Note also that InputSelectable was added in Flink 1.9.0.



Source: https://stackoverflow.com/questions/59742165/consume-from-two-flink-datastream-based-on-priority-or-round-robin-way
