How to stop a Flink streaming job from a program

Submitted by 让人想犯罪 on 2019-12-12 08:36:41

Question


I am trying to create a JUnit test for a Flink streaming job that writes data to a Kafka topic and reads data from the same topic, using FlinkKafkaProducer09 and FlinkKafkaConsumer09 respectively. I pass test data to the producer:

DataStream<String> stream = env.fromElements("tom", "jerry", "bill");

And check whether the same data comes back from the consumer:

List<String> expected = Arrays.asList("tom", "jerry", "bill");
List<String> result =  resultSink.getResult();
assertEquals(expected, result);

using a TestListResultSink.

I am able to see the data coming from the consumer as expected by printing the stream. But I could not get the JUnit test result, because the consumer keeps running even after the messages are finished, so the test never reaches the assertion.

Is there any way in Flink or FlinkKafkaConsumer09 to stop the process, or to run it for a specific time?


Answer 1:


The underlying problem is that streaming programs are usually not finite and run indefinitely.

The best way, at least for the moment, is to insert a special control message into your stream which lets the source terminate properly (it simply stops reading more data by leaving its reading loop). That way Flink tells all downstream operators that they can stop once they have consumed all data.
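A minimal sketch of this idea, deliberately free of any Flink dependency: a reading loop that exits cleanly when it sees a sentinel record. In a real Flink job this loop body would live in a SourceFunction's run method and emit via the SourceContext; the END_OF_STREAM marker here is an assumption for the sketch, and any value your real records can never contain would work.

```java
import java.util.ArrayList;
import java.util.List;

public class ControlMessageSource {
    // Sentinel that marks the end of the test data (an assumption for
    // this sketch; pick any marker that cannot occur in real records).
    static final String END_OF_STREAM = "#END#";

    private volatile boolean running = true;

    // Simplified stand-in for a source's reading loop.
    public List<String> readUntilControlMessage(List<String> input) {
        List<String> emitted = new ArrayList<>();
        for (String record : input) {
            if (!running) {
                break;
            }
            if (END_OF_STREAM.equals(record)) {
                // Control message seen: leave the loop so the source
                // finishes cleanly and downstream operators can drain.
                running = false;
                break;
            }
            emitted.add(record);
        }
        return emitted;
    }
}
```

Because the source returns from its loop instead of being killed, the job ends as a normal, successful run.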

Alternatively, you can throw a special exception in your source (e.g. after some time) so that you can distinguish a "proper" termination from a failure case (by checking the error cause). Throwing an exception in the source will fail the program.
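A sketch of the "check the error cause" part, using a hypothetical exception name (StopJobException is not a Flink class, just an assumed marker type): walk the cause chain of the failure to tell a planned stop apart from a genuine error.

```java
// Hypothetical marker exception for a deliberate shutdown; the name is
// an assumption for this sketch, not part of any Flink API.
class StopJobException extends RuntimeException {
    StopJobException(String message) {
        super(message);
    }
}

public class TerminationClassifier {
    // Walks the cause chain to distinguish a planned stop from a real
    // failure, mirroring the "checking the error cause" advice above.
    public static boolean isPlannedStop(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof StopJobException) {
                return true;
            }
        }
        return false;
    }
}
```

In a test you would catch the exception the job execution surfaces, pass it to a check like this, and rethrow if it was not the planned stop.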




Answer 2:


Can you not use the isEndOfStream override in the Deserializer to stop fetching from Kafka? If I read correctly, Flink's Kafka09Fetcher has the following code in its run method, which breaks the event loop:

    if (deserializer.isEndOfStream(value)) {
        // end of stream signaled
        running = false;
        break;
    }

My thought was to use Till Rohrmann's idea of a control message in conjunction with this isEndOfStream method to tell the Kafka consumer to stop reading.

Any reason that will not work? Or maybe some corner cases I'm overlooking?

https://github.com/apache/flink/blob/07de86559d64f375d4a2df46d320fc0f5791b562/flink-connectors/flink-connector-kafka-0.9/src/main/java/org/apache/flink/streaming/connectors/kafka/internal/Kafka09Fetcher.java#L146
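A sketch of what such a deserializer could look like. The interface below only mirrors the shape of Flink's deserialization-schema hook so the example stays self-contained; it is not the real Flink interface, and the END_MARKER sentinel is an assumption.

```java
import java.nio.charset.StandardCharsets;

// Minimal stand-in for a Flink deserialization schema, showing the
// isEndOfStream hook (no Flink dependency; shape only).
interface SimpleDeserializationSchema<T> {
    T deserialize(byte[] message);

    boolean isEndOfStream(T nextElement);
}

public class StringSchemaWithEnd implements SimpleDeserializationSchema<String> {
    // Assumed sentinel value that the test producer sends last.
    static final String END_MARKER = "#END#";

    @Override
    public String deserialize(byte[] message) {
        return new String(message, StandardCharsets.UTF_8);
    }

    // Returning true here is what would make the fetcher's loop set
    // running = false and break, as in the snippet quoted above.
    @Override
    public boolean isEndOfStream(String nextElement) {
        return END_MARKER.equals(nextElement);
    }
}
```

The test would then append the marker record to its Kafka topic after the real test data, and the consumer would shut down on its own.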




Answer 3:


Following @TillRohrman

You can combine the special-exception method with an EmbeddedKafka instance: handle the exception in your unit test, then read the EmbeddedKafka topic back and assert on the consumed values.

I found https://github.com/asmaier/mini-kafka/blob/master/src/test/java/de/am/KafkaProducerIT.java to be extremely useful in this regard.

The only problem is that you lose the element that triggers the exception, but you can always adjust your test data to account for that.
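A hedged sketch of the test-side pattern this answer describes, with all names hypothetical (this is not the EmbeddedKafka or Flink API): run the pipeline, treat the deliberate exception as normal completion, and assert on everything collected before the trigger element, which, as noted above, is itself lost.

```java
import java.util.ArrayList;
import java.util.List;

public class StopExceptionTest {
    // Hypothetical marker exception thrown when the control element
    // is seen.
    static class PlannedStop extends RuntimeException {
    }

    // Simplified stand-in for executing the job: consumes records
    // until the control element triggers the planned exception.
    static List<String> runPipeline(List<String> input) {
        List<String> collected = new ArrayList<>();
        try {
            for (String record : input) {
                if ("#STOP#".equals(record)) {
                    throw new PlannedStop();
                }
                collected.add(record);
            }
        } catch (PlannedStop expected) {
            // Planned termination. The trigger element itself is lost,
            // which is why the test data must account for it.
        }
        return collected;
    }
}
```

The JUnit test would call the equivalent of runPipeline and compare the collected list against the expected values, exactly as in the question's assertEquals.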



Source: https://stackoverflow.com/questions/44441153/how-to-stop-a-flink-streaming-job-from-program
