Is there an alternative to IntraBundleParallelization in Dataflow 2.1.0?

别那么骄傲 2021-01-25 21:19

According to the release notes for Dataflow 2.x, IntraBundleParallelization has been removed. Is there a way to control or increase the parallelism of DoFns on Dataflow 2.1.0?

I was get

1 answer
  • 2021-01-25 21:56

    It was removed because its implementation keeps a handle on the ProcessContext of a ProcessElement call after the call completes, and this is unsafe and not guaranteed to work.

    However, I agree that it was a useful abstraction, and it is unfortunate that we don't have a replacement yet.

    As a workaround, you can try the following:

    • In your DoFn's @Setup, create an Executor with the needed number of threads.
    • In your DoFn's @StartBundle, create an ExecutorCompletionService wrapping the executor.
    • In @ProcessElement, submit a task to it that processes the element; the Future it returns represents the result.
    • In @ProcessElement, also poll() the CompletionService for completed futures and output their results.
    • In @FinishBundle, wait for all remaining futures to complete and output their results. Shut down the executor itself in @Teardown, since the CompletionService has no shutdown method of its own.

    Remember not to use the ProcessContext inside your futures. A ProcessContext can only be used from the current thread and from within the current ProcessElement call.
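
The steps above can be sketched roughly as follows. This is a minimal plain-Java illustration, not Beam code: the class `ParallelBundleSketch`, its method names, and the `Consumer` used in place of the context's output call are all hypothetical stand-ins, chosen so the sketch runs with only `java.util.concurrent`. In a real DoFn the methods would carry the corresponding `@Setup`, `@StartBundle`, `@ProcessElement`, `@FinishBundle`, and `@Teardown` annotations.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical stand-in for a DoFn; method names mirror the Beam lifecycle. */
class ParallelBundleSketch<I, O> {
    private final int threads;
    private final Function<I, O> work; // per-element work; must not touch any ProcessContext
    private ExecutorService executor;
    private CompletionService<O> completion;
    private int pending; // tasks submitted in this bundle whose results are not yet emitted

    ParallelBundleSketch(int threads, Function<I, O> work) {
        this.threads = threads;
        this.work = work;
    }

    // Would be @Setup: one thread pool per DoFn instance.
    void setup() {
        executor = Executors.newFixedThreadPool(threads);
    }

    // Would be @StartBundle: a fresh completion service per bundle.
    void startBundle() {
        completion = new ExecutorCompletionService<>(executor);
        pending = 0;
    }

    // Would be @ProcessElement: `out` stands in for the context's output call
    // and is only ever invoked on the calling thread.
    void processElement(I element, Consumer<O> out)
            throws InterruptedException, ExecutionException {
        completion.submit(() -> work.apply(element));
        pending++;
        Future<O> done;
        // Drain whatever has already finished, without blocking.
        while ((done = completion.poll()) != null) {
            out.accept(done.get());
            pending--;
        }
    }

    // Would be @FinishBundle: block until every remaining task has been emitted.
    void finishBundle(Consumer<O> out) throws InterruptedException, ExecutionException {
        while (pending > 0) {
            out.accept(completion.take().get());
            pending--;
        }
    }

    // Would be @Teardown: release the threads.
    void teardown() {
        executor.shutdown();
    }

    public static void main(String[] args) throws Exception {
        ParallelBundleSketch<Integer, Integer> fn =
                new ParallelBundleSketch<Integer, Integer>(4, x -> x * x);
        List<Integer> results = new ArrayList<>();
        fn.setup();
        fn.startBundle();
        for (int i = 1; i <= 5; i++) {
            fn.processElement(i, results::add);
        }
        fn.finishBundle(results::add);
        fn.teardown();
        Collections.sort(results); // completion order is not deterministic
        System.out.println(results);
    }
}
```

Results arrive in completion order, not input order, so anything downstream that depends on ordering needs to handle that. One further caveat when porting this to an actual DoFn: as far as I know, output from @FinishBundle in Beam 2.x needs an explicit timestamp and window, so each task's result would probably have to carry those along with the value.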
