According to the release notes of Dataflow 2.x, IntraBundleParallelization has been removed. Is there a way to control or increase the parallelism of DoFns on Dataflow 2.1.0?
I was get
It was removed because its implementation keeps a handle on the ProcessContext of a ProcessElement call after the call completes, and this is unsafe and not guaranteed to work.
However, I agree that it was a useful abstraction, and it is unfortunate that we don't have a replacement yet.
As a workaround, you can try the following:
- In @Setup, create an Executor with the needed number of threads.
- In @StartBundle, create an ExecutorCompletionService wrapping the executor.
- In @ProcessElement, submit a task to it; the returned Future represents the result of processing the element.
- In @ProcessElement, also poll() the CompletionService for completed futures and output their results.
- In @FinishBundle, wait for all remaining futures to complete and output their results.
- In @Teardown, shut down the Executor (the CompletionService itself has no shutdown method).

Remember not to use the ProcessContext in your futures. ProcessContext can only be used from the current thread and from within the current ProcessElement call.
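
The workaround above can be sketched roughly as follows. This is a plain-Java stand-in, not a real DoFn: the class name ParallelFnSketch, the `work` function, and the `output` consumer are hypothetical, and the methods only mirror the lifecycle annotations named in their comments. The element is copied out before the task is submitted, so no task ever touches the ProcessContext, and results are emitted only from the calling thread. The sketch shuts down the executor in a teardown step, since ExecutorCompletionService has no shutdown method of its own.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical stand-in for a DoFn; method names mirror the Beam lifecycle annotations. */
class ParallelFnSketch<InT, OutT> {
    private final int numThreads;
    private final Function<InT, OutT> work; // hypothetical per-element processing
    private ExecutorService executor;
    private ExecutorCompletionService<OutT> completion;
    private int pending; // tasks submitted but not yet emitted

    ParallelFnSketch(int numThreads, Function<InT, OutT> work) {
        this.numThreads = numThreads;
        this.work = work;
    }

    // Would be annotated @Setup in a real DoFn.
    void setup() {
        executor = Executors.newFixedThreadPool(numThreads);
    }

    // Would be annotated @StartBundle.
    void startBundle() {
        completion = new ExecutorCompletionService<>(executor);
        pending = 0;
    }

    // Would be annotated @ProcessElement. `output` stands in for the context's
    // output call and is only ever invoked on the calling thread.
    void processElement(InT element, Consumer<OutT> output)
            throws InterruptedException, ExecutionException {
        // The task captures only the element value, never the context.
        completion.submit(() -> work.apply(element));
        pending++;
        // Drain whatever has already finished, without blocking.
        Future<OutT> done;
        while ((done = completion.poll()) != null) {
            pending--;
            output.accept(done.get());
        }
    }

    // Would be annotated @FinishBundle: block until every remaining task is done.
    void finishBundle(Consumer<OutT> output)
            throws InterruptedException, ExecutionException {
        while (pending > 0) {
            Future<OutT> done = completion.take();
            pending--;
            output.accept(done.get());
        }
    }

    // Would be annotated @Teardown.
    void teardown() {
        executor.shutdown();
    }
}
```

Results arrive in completion order rather than input order. In an actual Beam 2.x DoFn, output from @FinishBundle also needs an explicit timestamp and window, so each task's result would likely have to carry those along with the value.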