Why is the combine function called three times?

無奈伤痛 2021-01-27 03:51

I'm trying to understand the Combine transform in an Apache Beam pipeline.

Considering the following example pipeline:

def test_combine(data):
    loggin         


        
1 answer
  • 2021-01-27 04:41

    This appears to happen because of the MapReduce-style structure. With combiners, the output of one combiner call can be used as an input to another.

    As an example, imagine summing three numbers (1, 2, 3). The combiner MAY first sum 1 and 2 (giving 3) and then use that result as input together with 3 (3 + 3 = 6). In your case, [1, 2, 3] seems to be used as an input to the next combiner call.
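    The staged summation described above can be sketched in plain Python. This is only a toy model, not Beam's implementation: the helper name combine_in_stages and the batch size of 2 are assumptions chosen for illustration, and a real runner decides for itself how to split the input.

```python
def combine_in_stages(fn, elements, batch_size=2):
    """Toy model of staged combining (hypothetical helper, not a Beam API).

    Applies fn to each batch of inputs, then applies fn once more
    to the list of partial results.
    """
    elements = list(elements)
    calls = 0
    partials = []
    # Stage 1: combine each batch independently.
    for start in range(0, len(elements), batch_size):
        partials.append(fn(elements[start:start + batch_size]))
        calls += 1
    # Stage 2: combine the partial results into the final value.
    result = fn(partials)
    calls += 1
    return result, calls


total, calls = combine_in_stages(sum, [1, 2, 3])
print(total, calls)
```

    Because sum is associative and commutative, the final value does not depend on how the input is split; only the number of calls changes.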

    An example that really helped me understand this:

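    import apache_beam as beam

    p = beam.Pipeline()

    def make_list(elements):
        # Print the input so that every call to the combine function is visible.
        print(elements)
        return elements

    (p | beam.Create(range(30))
       | beam.CombineGlobally(make_list))

    p.run()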
    

    In the printed output, notice that the list returned by one call (for example [1,..,10]) appears as an element of the input to the next combiner call.
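    The nesting seen in that output can be imitated without Beam. The sketch below is a simplified model under stated assumptions: buffered_combine is a hypothetical helper, and the buffer size of 10 is an assumption picked to resemble the output described above, not a documented Beam setting.

```python
def make_list(elements):
    # Returns its input as a list, so earlier results stay nested.
    return list(elements)


def buffered_combine(fn, elements, buffer_size=10):
    """Toy model (hypothetical helper): buffer inputs and, whenever the
    buffer grows past buffer_size, replace it with one combined value."""
    buffer = []
    seen_inputs = []  # what fn received on each call
    for element in elements:
        buffer.append(element)
        if len(buffer) > buffer_size:
            seen_inputs.append(list(buffer))
            buffer = [fn(buffer)]  # the result becomes a single input
    seen_inputs.append(list(buffer))
    return fn(buffer), seen_inputs


result, seen_inputs = buffered_combine(make_list, range(30))
for call_input in seen_inputs:
    print(call_input)
```

    Each printed line after the first starts with the list produced by the previous call. A function such as sum is unaffected by this, since combining partial sums gives the same total.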
