Is there an alternative to IntraBundleParallelization in Dataflow 2.1.0?

Submitted by 痴心易碎 on 2019-12-02 17:04:07

Question


According to the release notes for Dataflow 2.x, IntraBundleParallelization has been removed. Is there a way to control or increase the parallelism of DoFns on Dataflow 2.1.0?

I was getting better performance when I used IntraBundleParallelization on version 1.9.0 of Dataflow.


Answer 1:


It was removed because its implementation keeps a handle on the ProcessContext of a ProcessElement call after the call completes, and this is unsafe and not guaranteed to work.

However, I agree that it was a useful abstraction, and it is unfortunate that we don't have a replacement yet.

As a workaround, you can try the following:

  • In your DoFn's @Setup, create an Executor with the needed number of threads
  • In your DoFn's @StartBundle, create an ExecutorCompletionService wrapping the executor
  • In @ProcessElement, submit a task to it that processes the element; the returned Future represents the result
  • In @ProcessElement, also poll() the CompletionService for completed futures and output their results
  • In @FinishBundle, wait for all remaining futures to complete and output their results. The CompletionService itself has no shutdown method, so shut down the underlying Executor in @Teardown.

Remember not to use the ProcessContext inside your futures. ProcessContext can only be used from the current thread and from within the current ProcessElement call.
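
The steps above can be sketched roughly as follows. This is a minimal illustration using only java.util.concurrent, not a real Beam DoFn: the class name ParallelBundleSketch, its method names, and the Consumer used for output are all hypothetical stand-ins, and each method is only labeled with the lifecycle annotation it would correspond to. The worker tasks receive just the element, never a ProcessContext, and results are emitted only from the calling thread.

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical plain-Java stand-in for a DoFn (assumption: not a Beam class).
class ParallelBundleSketch<I, O> {
    private final int threads;
    private final Function<I, O> work;      // per-element work; must not touch any ProcessContext
    private ExecutorService executor;
    private CompletionService<O> completion;
    private int pending;                    // tasks submitted but not yet emitted

    ParallelBundleSketch(int threads, Function<I, O> work) {
        this.threads = threads;
        this.work = work;
    }

    // Would correspond to @Setup: create the thread pool once per DoFn instance.
    void setup() {
        executor = Executors.newFixedThreadPool(threads);
    }

    // Would correspond to @StartBundle: wrap the executor for this bundle.
    void startBundle() {
        completion = new ExecutorCompletionService<>(executor);
        pending = 0;
    }

    // Would correspond to @ProcessElement: submit the work, then emit whatever has finished.
    void processElement(I element, Consumer<O> output)
            throws InterruptedException, ExecutionException {
        completion.submit(() -> work.apply(element));
        pending++;
        Future<O> done;
        while ((done = completion.poll()) != null) {   // poll() does not block
            output.accept(done.get());
            pending--;
        }
    }

    // Would correspond to @FinishBundle: block until every remaining task is emitted.
    void finishBundle(Consumer<O> output)
            throws InterruptedException, ExecutionException {
        while (pending > 0) {
            output.accept(completion.take().get());    // take() blocks for the next result
            pending--;
        }
    }

    // Would correspond to @Teardown: release the threads.
    void teardown() {
        executor.shutdown();
    }
}
```

Results come out in completion order, not input order. In an actual DoFn, output from @FinishBundle goes through FinishBundleContext, which requires an explicit timestamp and window for each element, so those would need to be captured alongside each task's result.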



Source: https://stackoverflow.com/questions/47023871/is-there-an-alternative-to-intrabundleparallelization-in-dataflow-2-1-0
