What is glom? How is it different from mapPartitions?

慢半拍i 2021-01-31 22:31

I've come across the glom() method on RDD. As per the documentation:

Return an RDD created by coalescing all elements within each partition into an array.

3 answers
  •  礼貌的吻别
    2021-01-31 22:39

    Does glom shuffle the data across partitions?

    No, it doesn't.

    If this is the second case, I believe that the same can be achieved using mapPartitions.

    It can:

    rdd.mapPartitions(iter => Iterator(iter.toArray))
    

    but the same applies to any non-shuffling transformation, such as map, flatMap or filter.

    If there are any use cases which benefit from glom?

    Any situation where you need to access partition data in a form that is traversable more than once.
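
To make the two points above concrete, here is a minimal sketch, assuming spark-core is on the classpath and a local master is acceptable. The object name `GlomSketch`, the helpers `glomViaMapPartitions` and `centerWithinPartitions`, and the sample data are made up for illustration and are not part of Spark. It compares glom() with the hand-written mapPartitions version, then uses glom() for a per-partition computation that has to read the data twice, which a plain Iterator cannot do without being materialized first.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object GlomSketch {
  // Hypothetical helper: a hand-written equivalent of glom().
  // Each partition's iterator is drained into one array; no shuffle is involved.
  def glomViaMapPartitions(rdd: RDD[Int]): RDD[Array[Int]] =
    rdd.mapPartitions(iter => Iterator(iter.toArray))

  // Hypothetical helper: subtract each partition's own mean from its elements.
  // This needs two traversals of the partition, so it works on the array
  // produced by glom() rather than on the single-pass iterator.
  def centerWithinPartitions(rdd: RDD[Int]): RDD[Double] =
    rdd.glom().flatMap { part =>
      val mean = if (part.isEmpty) 0.0 else part.sum.toDouble / part.length // first pass
      part.map(_ - mean).toSeq                                              // second pass
    }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with two threads, for demonstration only.
    val conf = new SparkConf().setAppName("glom-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(1 to 8, numSlices = 4)

      // Convert the arrays to lists so they compare by content, not by reference.
      val viaGlom = rdd.glom().collect().map(_.toList).toList
      val viaMapPartitions = glomViaMapPartitions(rdd).collect().map(_.toList).toList

      println(viaGlom)                     // one inner list per partition
      println(viaGlom == viaMapPartitions) // the two approaches agree
      println(centerWithinPartitions(rdd).collect().toList)
    } finally {
      sc.stop()
    }
  }
}
```

Keep in mind that both versions hold a whole partition in memory as an array, so this only suits partitions that fit comfortably in executor memory.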
