Question
I am trying to perform a map operation on a KeyedStream in Flink:
stream.map(new JsonToMessageObjectMapper())
      .keyBy("keyfield")
      .map(new MessageAdProcessorStateful());
The output of the JsonToMessageObjectMapper operator is a POJO of class MessageObject, which has a String field 'keyfield'. The stream is then keyed on this field.
MessageAdProcessorStateful is a RichMapFunction like this:
public class MessageAdProcessorStateful extends RichMapFunction<MessageObject, Tuple2<String, String>> {
    private transient MapState<String, Tuple2<Tuple3<String, String, String>, Tuple2<Double, Long>>> state;
    ...
    @Override
    public void open(Configuration config) throws Exception {
        MapStateDescriptor<String, Tuple2<Tuple3<String, String, String>, Tuple2<Double, Long>>> descriptor =
            new MapStateDescriptor<>(
                "state", // the state name
                TypeInformation.of(new TypeHint<String>() {}),
                TypeInformation.of(new TypeHint<Tuple2<Tuple3<String, String, String>, Tuple2<Double, Long>>>() {})); // type information
        state = getRuntimeContext().getMapState(descriptor);
        state.put(...); // Insert a key, value here. Exception here!
    }
}
The code throws a NullPointerException:
Caused by: java.lang.NullPointerException: No key set. This method should not be called outside of a keyed context.
at org.apache.flink.util.Preconditions.checkNotNull(Preconditions.java:75)
at org.apache.flink.runtime.state.heap.CopyOnWriteStateTable.checkKeyNamespacePreconditions(CopyOnWriteStateTable.java:528)
at org.apache.flink.runtime.state.heap.CopyOnWriteStateTable.computeHashForOperationAndDoIncrementalRehash(CopyOnWriteStateTable.java:722)
at org.apache.flink.runtime.state.heap.CopyOnWriteStateTable.get(CopyOnWriteStateTable.java:265)
at org.apache.flink.runtime.state.heap.CopyOnWriteStateTable.get(CopyOnWriteStateTable.java:306)
at org.apache.flink.runtime.state.heap.HeapMapState.put(HeapMapState.java:75)
at org.apache.flink.runtime.state.UserFacingMapState.put(UserFacingMapState.java:52)
at org.myorg.quickstart.MessageStreamProcessor$MessageAdProcessorStateful.open(MessageStreamProcessor.java:226)
at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:393)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:254)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
at java.lang.Thread.run(Thread.java:748)
It seems the key in the keyed state of the KeyedStream is null, although I have verified that 'keyfield' is always a valid string. Everything else appears to match the Flink documentation. Any idea what is going on?
Answer 1:
The problem is that you try to access the keyed state in the open() method.
Keyed state maintains a state instance for each key. In your example you are using MapState, so you have one MapState instance for each key. When you access the state, you always get the instance that corresponds to the key of the record currently being processed. In a MapFunction (as in your example), this is the record that is passed to the map() method.
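To make the per-key behaviour concrete, here is a simplified, hypothetical model in plain Java. It is not Flink's implementation, and the class names KeyedMapStateModel and KeyedMapStateDemo are invented for illustration. Each key owns its own map, and every put/get is routed to the map of the current key, which in Flink the runtime sets before a record reaches map().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified model of keyed MapState (NOT Flink's implementation).
// Each key owns an independent map; every access goes to the map of the current key.
class KeyedMapStateModel<K, UK, UV> {
    private final Map<K, Map<UK, UV>> perKeyMaps = new HashMap<>();
    private K currentKey; // null until a key is set for a record

    // In Flink the operator sets the key before each record; user code does not.
    void setCurrentKey(K key) {
        this.currentKey = key;
    }

    // Resolves the map of the current key, failing with the message seen in the question.
    private Map<UK, UV> currentMap() {
        Objects.requireNonNull(currentKey,
                "No key set. This method should not be called outside of a keyed context.");
        return perKeyMaps.computeIfAbsent(currentKey, k -> new HashMap<>());
    }

    void put(UK userKey, UV value) {
        currentMap().put(userKey, value);
    }

    UV get(UK userKey) {
        return currentMap().get(userKey);
    }
}

class KeyedMapStateDemo {
    public static void main(String[] args) {
        KeyedMapStateModel<String, String, Long> state = new KeyedMapStateModel<>();
        state.setCurrentKey("a");
        state.put("count", 3L);
        state.setCurrentKey("b");
        System.out.println(state.get("count")); // null: key "b" has its own, empty map
        state.setCurrentKey("a");
        System.out.println(state.get("count")); // 3
    }
}
```

The same state handle therefore returns different contents depending on which key is current, and with no current key there is no map to return at all.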
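Since open() is not called with a record, the current key in open() is null, and it is not possible to access the keyed state there. Obtaining the state handle with getRuntimeContext().getMapState(descriptor) in open() is fine; it is the state.put(...) call that fails, as the stack trace shows, so reads and writes of keyed state belong in map().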
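The following plain-Java sketch models that lifecycle, assuming a simplified driver that calls open() once and then sets the current key before each map() call. All names (KeyedCounterState, ModelFunction, WritesInOpen, WritesInMap, Driver) are hypothetical, and the state is a single counter per key rather than the question's MapState. WritesInOpen reproduces the failure; WritesInMap initializes its per-key state lazily inside map(), where a key is available.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical keyed state: one counter per key, always accessed for the
// current key (NOT Flink's implementation).
class KeyedCounterState {
    private final Map<String, Long> perKey = new HashMap<>();
    private String currentKey; // null until the driver sets a key

    void setCurrentKey(String key) { currentKey = key; }

    private String requireKey() {
        return Objects.requireNonNull(currentKey,
                "No key set. This method should not be called outside of a keyed context.");
    }

    Long value() { return perKey.get(requireKey()); }

    void update(Long v) { perKey.put(requireKey(), v); }
}

// Mimics the RichMapFunction lifecycle: open() once, then map() per record.
abstract class ModelFunction {
    protected KeyedCounterState state;

    // Obtaining the state handle needs no key, like getMapState(...) in open().
    void open(KeyedCounterState handle) { state = handle; }

    abstract String map(String record);
}

// Broken variant: writes state in open(), where no key is set yet.
class WritesInOpen extends ModelFunction {
    @Override
    void open(KeyedCounterState handle) {
        super.open(handle);
        state.update(0L); // throws NullPointerException: no current key
    }

    @Override
    String map(String record) { return record; }
}

// Fixed variant: initializes per-key state lazily inside map().
class WritesInMap extends ModelFunction {
    @Override
    String map(String record) {
        Long count = state.value();
        long next = (count == null ? 0L : count) + 1; // null means first record for this key
        state.update(next);
        return record + "=" + next;
    }
}

// Drives a function the way a keyed operator would (each record doubles as its key).
class Driver {
    static List<String> run(ModelFunction fn, List<String> records) {
        KeyedCounterState state = new KeyedCounterState();
        fn.open(state); // no current key during open()
        List<String> out = new ArrayList<>();
        for (String record : records) {
            state.setCurrentKey(record); // key set before each map() call
            out.add(fn.map(record));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(new WritesInMap(), Arrays.asList("a", "b", "a"))); // [a=1, b=1, a=2]
    }
}
```

Running Driver.run(new WritesInOpen(), ...) instead fails inside open() with the same "No key set" message as the question's stack trace.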
Source: https://stackoverflow.com/questions/48555816/flink-keyed-stream-key-is-null