Flink: Sharing state in CoFlatMapFunction

Posted by 狂风中的少年 on 2019-12-22 03:44:35

Question


I'm a bit stuck with CoFlatMapFunction. It seems to work fine if I place it on the DataStream before the window, but it fails if placed after the window's "apply" function.

I was testing two streams: the main "Features" stream on flatMap1, constantly ingesting data, and the control stream "Model" on flatMap2, changing the model on request.

I can set b0/b1 in flatMap2 and see the new values there, but flatMap1 always sees b0 and b1 as 0, the values set at initialization.

Am I missing something obvious here?

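import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

// Nested inside the job class; the imports above belong at the top of the enclosing file.
public static class applyModel implements CoFlatMapFunction<Features, Model, EnrichedFeatures> {
    private static final long serialVersionUID = 1L;

    // Model coefficients, held as plain fields of this function instance.
    Double b0;
    Double b1;

    public applyModel() {
        b0 = 0.0;
        b1 = 0.0;
    }

    // Called for every record of the main "Features" stream.
    @Override
    public void flatMap1(Features value, Collector<EnrichedFeatures> out) {
        System.out.println("Main: " + this);
    }

    // Called for every record of the control "Model" stream.
    @Override
    public void flatMap2(Model value, Collector<EnrichedFeatures> out) {
        System.out.println("Old Model: " + this);
        b0 = value.getB0();
        b1 = value.getB1();
        System.out.println("New Model: " + this);
    }

    @Override
    public String toString() {
        return "CoFlatMapFunction: {b0: " + b0 + ", b1: " + b1 + "}";
    }
}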

Answer 1:


Here is the answer from the mailing list...

Is the CoFlatMapFunction intended to be executed in parallel?

If yes, you need some way to deterministically assign which record goes to which parallel instance. In a sense, the CoFlatMapFunction performs a parallel (partitioned) join between the model and the result of the session windows, so you need some form of key that selects which partition each element goes to. Does that make sense?

If not, try setting its parallelism to 1 explicitly.

Greetings, Stephan
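
To make the point about parallel instances concrete, here is a minimal plain-Java simulation (not Flink code). It assumes that each parallel instance holds its own copy of the function's fields, that a single model record reaches only one instance, and that feature records are spread round-robin. The class names, the linear formula b0 + b1 * x, and the coefficient values are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation; it does not use the Flink API.
class ParallelCopiesDemo {

    // Hypothetical stand-in for one parallel copy of applyModel.
    static class ModelHolder {
        double b0 = 0.0;
        double b1 = 0.0;

        // Control input: overwrite the coefficients of THIS copy only.
        void flatMap2(double newB0, double newB1) {
            b0 = newB0;
            b1 = newB1;
        }

        // Main input: score a feature value with an assumed linear model.
        double flatMap1(double x) {
            return b0 + b1 * x;
        }
    }

    // Delivers one model update to instance 0, then spreads the
    // feature records round-robin over all instances.
    static List<Double> run(int parallelism, double[] features) {
        ModelHolder[] instances = new ModelHolder[parallelism];
        for (int i = 0; i < parallelism; i++) {
            instances[i] = new ModelHolder();
        }

        instances[0].flatMap2(1.0, 2.0); // only one copy sees the new model

        List<Double> out = new ArrayList<>();
        for (int i = 0; i < features.length; i++) {
            out.add(instances[i % parallelism].flatMap1(features[i]));
        }
        return out;
    }

    public static void main(String[] args) {
        double[] features = {1.0, 1.0, 1.0, 1.0};
        System.out.println("parallelism 4: " + run(4, features)); // [3.0, 0.0, 0.0, 0.0]
        System.out.println("parallelism 1: " + run(1, features)); // [3.0, 3.0, 3.0, 3.0]
    }
}
```

With four copies, three of them never receive the update and keep scoring with b0 = b1 = 0, which matches the symptom in the question. With a single copy, every record sees the updated coefficients.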


A global state that all parallel instances can read, but not update, is doable via broadcast().
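
The effect of broadcasting the control stream can be sketched in plain Java (again a simulation, not Flink code). It assumes that a broadcast record is delivered to every parallel instance, so each copy updates its own fields. The class names, the linear formula, and the coefficient values are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation of broadcasting a control record; not Flink API code.
class BroadcastDemo {

    // Hypothetical stand-in for one parallel copy of applyModel.
    static class ModelHolder {
        double b0 = 0.0;
        double b1 = 0.0;

        void flatMap2(double newB0, double newB1) {
            b0 = newB0;
            b1 = newB1;
        }

        double flatMap1(double x) {
            return b0 + b1 * x; // assumed linear model
        }
    }

    static List<Double> run(int parallelism, double[] features) {
        ModelHolder[] instances = new ModelHolder[parallelism];
        for (int i = 0; i < parallelism; i++) {
            instances[i] = new ModelHolder();
        }

        // Broadcast: every instance receives its own copy of the model record.
        for (ModelHolder instance : instances) {
            instance.flatMap2(1.0, 2.0);
        }

        // Feature records are still distributed round-robin.
        List<Double> out = new ArrayList<>();
        for (int i = 0; i < features.length; i++) {
            out.add(instances[i % parallelism].flatMap1(features[i]));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(4, new double[] {1.0, 2.0, 3.0, 4.0})); // [3.0, 5.0, 7.0, 9.0]
    }
}
```

In this sketch the update is applied before any feature record is processed. In a real job the two inputs are not ordered relative to each other, so feature records that arrive before the model update would still be scored with the old coefficients.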

A global state that every instance can both read and update is currently not available. Consistent operations on it would be quite costly, since they require some form of distributed communication/consensus.

Instead, I would encourage you to go with the following:

1) If you can partition the state, use keyBy().mapWithState(). That localizes state operations and makes them very fast.

2) If your state is not organized by key, your state is probably very small, and you may be able to use a non-parallel operation.

3) If some operation updates the state and another one accesses it, you can often implement that with iterations and a CoFlatMapFunction (one side is the original input, the other the feedback input).

In the end, all of these approaches localize state access and modification, which is a good pattern to follow where possible.

Greetings, Stephan
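
Option 1, partitioning the state by key, can be illustrated with a small plain-Java simulation (not Flink code). It assumes that both model updates and feature records carry a key, that records with the same key are always routed to the same instance, and that each instance keeps a separate model per key. The simple hash-modulo routing, the class names, and the key "sensor-a" are hypothetical simplifications.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java simulation of key-partitioned state; not Flink API code.
class KeyedStateDemo {

    // One parallel instance with its own local, per-key state.
    static class Instance {
        private final Map<String, double[]> modelByKey = new HashMap<>();

        // Control input: store the coefficients {b0, b1} for this key.
        void flatMap2(String key, double b0, double b1) {
            modelByKey.put(key, new double[] {b0, b1});
        }

        // Main input: score with the model stored for this key (default 0, 0).
        double flatMap1(String key, double x) {
            double[] m = modelByKey.getOrDefault(key, new double[] {0.0, 0.0});
            return m[0] + m[1] * x; // assumed linear model
        }
    }

    private final Instance[] instances;

    KeyedStateDemo(int parallelism) {
        instances = new Instance[parallelism];
        for (int i = 0; i < parallelism; i++) {
            instances[i] = new Instance();
        }
    }

    // Deterministic routing: the same key always maps to the same instance.
    private Instance route(String key) {
        return instances[Math.floorMod(key.hashCode(), instances.length)];
    }

    void updateModel(String key, double b0, double b1) {
        route(key).flatMap2(key, b0, b1);
    }

    double score(String key, double x) {
        return route(key).flatMap1(key, x);
    }

    public static void main(String[] args) {
        KeyedStateDemo job = new KeyedStateDemo(4);
        job.updateModel("sensor-a", 1.0, 2.0);
        System.out.println(job.score("sensor-a", 3.0)); // 7.0
        System.out.println(job.score("sensor-b", 3.0)); // 0.0, no model for this key yet
    }
}
```

Because the model update and the feature records for a given key always meet in the same instance, no state has to be shared across instances.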



Source: https://stackoverflow.com/questions/33755697/flink-sharing-state-in-coflatmapfunction
