According to the Apache Beam 2.0.0 SDK documentation, GroupIntoBatches works only with KV collections. My dataset contains only values and does not need keys. Why does GroupIntoBatches require KV inputs, and do I have to add dummy keys to use it?
It is required to provide KV inputs to GroupIntoBatches because the transform is implemented using state and timers, which are scoped per key-and-window. For each key+window pair, state and timers necessarily execute serially (or observably so), so you have to express the available parallelism manually by providing keys (and windows, though no runner that I know of parallelizes over windows today). The two most common approaches are:

1. Use a natural key that your data already has, such as a user ID.
2. Choose a fixed number of shards and key each element randomly. This can be harder to tune: you need enough shards to get enough parallelism, but each shard needs to contain enough data that GroupIntoBatches is actually useful.

Adding one dummy key to all elements, as in your snippet, will cause the transform not to execute in parallel at all. This is similar to the discussion at "Stateful indexing causes ParDo to be run single-threaded on Dataflow Runner".
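To make the second approach concrete, here is a minimal plain-Python sketch of the idea (not Beam itself): each value gets a random shard key from a fixed pool, and batching then happens independently per key, which is what lets a runner process the shards in parallel. The shard count `NUM_SHARDS`, the helper names, and the batching stand-in are all illustrative assumptions, not Beam API.

```python
import random
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count; tune it to your runner's parallelism


def add_random_shard_key(value, num_shards=NUM_SHARDS):
    """Assign a random shard key so stateful batching can run in parallel per key."""
    return (random.randrange(num_shards), value)


def group_into_batches(keyed_values, batch_size):
    """Plain-Python stand-in for Beam's GroupIntoBatches: buffer values per key
    and emit (key, batch) pairs once a key's buffer reaches batch_size."""
    buffers = defaultdict(list)
    for key, value in keyed_values:
        buffers[key].append(value)
        if len(buffers[key]) == batch_size:
            yield key, buffers.pop(key)
    # Flush any partial batches left over at the end of the input.
    for key, leftover in buffers.items():
        yield key, leftover


keyed = [add_random_shard_key(v) for v in range(100)]
batches = list(group_into_batches(keyed, batch_size=10))
```

With `NUM_SHARDS = 1` (the dummy-key case from the question) every element lands in one buffer and the batching work is serialized; with several shards, each key's buffer can be processed independently.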