Spark: increase number of partitions without causing a shuffle?

夕颜 2021-02-07 03:22

When decreasing the number of partitions one can use coalesce, which is great because it doesn't cause a shuffle and seems to work instantly (doesn't require an a

3 answers

走了就别回头了 (original poster)

2021-02-07 03:36
I don't quite understand your point. Do you mean that you currently have 5 partitions, but after the next operation you want the data distributed across 10? Having 10 partitions while still using only 5 does not make much sense… The data has to be sent to the new partitions at some point.

With coalesce you can get rid of unused partitions. For example, if you initially had 100 partitions but were left with only 10 after reduceByKey (because there were only 10 keys), you can call coalesce.
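The reduceByKey-then-coalesce case could look roughly like the sketch below. It assumes a local Spark installation; the object name `CoalesceSketch`, the helper `partitionCounts`, the `local[2]` master and the sample data are all made up for illustration. The partition count of 100 is passed to reduceByKey explicitly so the result does not depend on the default parallelism setting.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch: shrink a mostly-empty RDD with coalesce (no shuffle).
object CoalesceSketch {
  def partitionCounts(): (Int, Int) = {
    val conf = new SparkConf().setAppName("coalesce-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // 1000 records in 100 partitions, but only 10 distinct keys.
      val pairs = sc.parallelize(1 to 1000, 100).map(i => (i % 10, 1))

      // Keep 100 output partitions; at most 10 of them can hold data.
      val reduced = pairs.reduceByKey(_ + _, 100)

      // Merge the 100 partitions into 10 without shuffling (shuffle defaults to false).
      val compacted = reduced.coalesce(10)

      (reduced.getNumPartitions, compacted.getNumPartitions)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(partitionCounts()) // partition counts before and after coalesce
}
```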

If you want the process to go the other way, you could just force some kind of partitioning:
```
[RDD].partitionBy(new HashPartitioner(100))
```
I'm not sure that's what you're looking for, but hope so.
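
For completeness, here is a minimal sketch of going from 5 to 10 partitions, again assuming a local Spark installation; `RepartitionSketch`, `partitionCounts` and the sample data are hypothetical names chosen for the example. As far as I know, coalesce without a shuffle will not go above the current partition count, so increasing it means either partitionBy on a key-value RDD or repartition (which is coalesce with shuffle = true), and both of those shuffle.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Hypothetical sketch: three ways of asking for 10 partitions when starting from 5.
object RepartitionSketch {
  def partitionCounts(): (Int, Int, Int) = {
    val conf = new SparkConf().setAppName("repartition-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val pairs = sc.parallelize(1 to 1000, 5).map(i => (i, i * 2))

      // No shuffle: asking for more partitions than exist leaves the count at 5.
      val unchanged = pairs.coalesce(10)

      // Shuffle by key hash; requires an RDD of key-value pairs.
      val hashed = pairs.partitionBy(new HashPartitioner(10))

      // Shuffle with even redistribution; same as coalesce(10, shuffle = true).
      val spread = pairs.repartition(10)

      (unchanged.getNumPartitions, hashed.getNumPartitions, spread.getNumPartitions)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(partitionCounts())
}
```

partitionBy is the better fit when later steps group or join on the same key, since the partitioner is remembered; repartition is enough when you only want more parallelism.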