I'm trying to understand the Combine transform in an Apache Beam pipeline.
Consider the following example pipeline:
def test_combine(data):
loggin
It looks like this is happening because of the MapReduce-style structure. When you use combiners, the output of one combiner call can be used as an input to another.
As an example, imagine summing 3 numbers (1, 2, 3). The combiner MAY first sum 1 and 2 (giving 3) and then use that result as an input together with 3 (3 + 3 = 6). In your case, [1, 2, 3] seems to be used as an input in the next combiner.
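To make the idea concrete, here is a minimal pure-Python sketch of combining in stages. The helper `combine_in_stages` and its `buffer_size` parameter are hypothetical names made up for illustration; this is a simplified stand-in, not Beam's actual implementation.

```python
def combine_in_stages(fn, elements, buffer_size=1):
    """Apply fn in partial steps, feeding each partial result back in as an input.

    Simplified stand-in for a combiner (hypothetical helper, not a Beam API).
    """
    buffer = []
    for element in elements:
        buffer.append(element)
        if len(buffer) > buffer_size:
            # Collapse the buffer into one partial result and keep combining.
            buffer = [fn(buffer)]
    # Final call on whatever is left in the buffer.
    return fn(buffer)


# sum does not care whether an input is a raw element or a partial sum.
total = combine_in_stages(sum, [1, 2, 3])
print(total)  # 6, computed as (1 + 2) + 3

# list does care: each partial result is nested inside the next one.
nested = combine_in_stages(list, [1, 2, 3])
print(nested)  # [[[1, 2], 3]]
```

The sum comes out the same however the work is split, while a function that simply returns its input list ends up nesting its earlier outputs.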
An example that really helped me understand this:
import apache_beam as beam

p = beam.Pipeline()

def make_list(elements):
    # Print every input so you can see what each combiner call receives.
    print(elements)
    return elements

(p | beam.Create(range(30))
   | beam.CombineGlobally(make_list))
p.run()
Notice in the printed output that a list returned by one call (such as [1,..,10]) is used as an input element in the next combiner call.
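If the goal is a single flat list, one possible approach is to keep raw elements and partial results on separate code paths. The sketch below assumes the four-method shape of `beam.CombineFn` (`create_accumulator`, `add_input`, `merge_accumulators`, `extract_output`). The class name `ToListFn` is hypothetical, and the class is kept as plain Python and driven by hand so it runs without Beam installed.

```python
class ToListFn:
    """Collects elements into one flat list.

    Mirrors the four methods of beam.CombineFn. In a real pipeline this class
    would subclass beam.CombineFn and be passed to beam.CombineGlobally.
    """

    def create_accumulator(self):
        return []

    def add_input(self, accumulator, element):
        # Raw elements arrive here, one at a time.
        accumulator.append(element)
        return accumulator

    def merge_accumulators(self, accumulators):
        # Partial results arrive here, so they are never mistaken for elements.
        merged = []
        for accumulator in accumulators:
            merged.extend(accumulator)
        return merged

    def extract_output(self, accumulator):
        return accumulator


# Hand-driven simulation of two workers, each handling part of the data.
fn = ToListFn()
parts = []
for chunk in ([0, 1, 2], [3, 4]):
    acc = fn.create_accumulator()
    for element in chunk:
        acc = fn.add_input(acc, element)
    parts.append(acc)

flat = fn.extract_output(fn.merge_accumulators(parts))
print(flat)  # [0, 1, 2, 3, 4]
```

Beam also ships a ready-made `beam.combiners.ToList()` transform, which is probably the simpler choice if a list is all you need.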