I\'ve come across the glom()
method on RDD. As per the documentation
Return an RDD created by coalescing all elements within each partition
Does
glom
shuffle the data across partitions
No, it doesn't
If this is the second case I believe that the same can be achieved using mapPartitions
It can:
rdd.mapPartitions(iter => Iterator(_.toArray))
but the same thing applies to any non shuffling transformation like map
, flatMap
or filter
.
if there are any use cases which benefit from glob.
Any situation where you need to access partition data in a form that is traversable more than once.