Question
I tried to upgrade the Flink version in my cluster to 1.3.1 (and 1.3.2 as well), and I got the following exception in my task managers:
2018-02-28 12:57:27,120 ERROR org.apache.flink.streaming.runtime.tasks.StreamTask - Error during disposal of stream operator.
org.apache.kafka.common.KafkaException: java.lang.InterruptedException
at org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:424)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase.close(FlinkKafkaProducerBase.java:317)
at org.apache.flink.api.common.functions.util.FunctionUtils.closeFunction(FunctionUtils.java:43)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.dispose(AbstractUdfStreamOperator.java:126)
at org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:429)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:334)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1252)
at java.lang.Thread.join(Thread.java:1326)
at org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:422)
... 7 more
The job manager showed that it failed to connect to the task managers.
I am using FlinkKafkaProducer08, wired into the job roughly as in the sketch below.
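For context, here is a minimal sketch of how such a sink is typically wired up. The broker list, topic name, source, and schema are placeholders chosen for illustration, not values from the actual job:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer08;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class KafkaSinkJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> stream = env.fromElements("a", "b", "c"); // placeholder source

        // FlinkKafkaProducer08 extends FlinkKafkaProducerBase, whose close()
        // appears in the stack trace above.
        stream.addSink(new FlinkKafkaProducer08<>(
                "localhost:9092",        // Kafka 0.8 broker list (placeholder)
                "my-topic",              // target topic (placeholder)
                new SimpleStringSchema()));

        env.execute("kafka-sink-job");
    }
}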
Any ideas?
Answer 1:
First of all, from the stack trace above: the exception was thrown during operator cleanup after a non-graceful termination (this code path is not executed otherwise). It should therefore be accompanied in the log by the real exception that caused the initial problem. Can you provide some more parts of the log?
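To see why the InterruptedException is only a symptom: KafkaProducer.close() waits on its internal I/O thread via Thread.join() (visible in the trace), and when Flink forcibly cancels a task it interrupts the task thread, so the pending join() throws. The following standalone sketch, plain Java with no Flink or Kafka involved, reproduces just that mechanism:

public class InterruptedJoinDemo {
    public static void main(String[] args) throws Exception {
        // Stands in for Kafka's background sender/I/O thread.
        Thread ioThread = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
            }
        });
        ioThread.start();

        // Stands in for the task thread running KafkaProducer.close().
        Thread closer = new Thread(() -> {
            try {
                ioThread.join(); // analogous to close() waiting for the I/O thread
            } catch (InterruptedException e) {
                // Task cancellation interrupts the task thread; the pending join()
                // ends up here, and Kafka wraps it in a KafkaException.
                System.out.println("join interrupted: " + e);
            }
        });
        closer.start();

        Thread.sleep(200);    // let the join begin
        closer.interrupt();   // simulates the forced cancellation
        closer.join();
        ioThread.interrupt(); // clean up the demo thread
    }
}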
If the JobManager fails to connect to any TaskManager that should run your job, the whole job is cancelled (and retried according to your restart policy). The same may happen on the TaskManager side. That may be the root cause and needs further investigation.
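As a hedged example, the restart policy mentioned above can be configured through Flink's restart-strategy API (available in 1.3.x); the attempt count and delay below are illustrative values, not a recommendation:

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RestartPolicyExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Restart the job up to 3 times, waiting 10 seconds between attempts.
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, 10_000L));

        // ... define sources, the Kafka sink, and call env.execute() as usual ...
    }
}

The same behaviour can also be set cluster-wide via restart-strategy: fixed-delay in flink-conf.yaml.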
Source: https://stackoverflow.com/questions/49030543/exception-when-trying-to-upgrade-to-flink-1-3-1