When decreasing the number of partitions one can use coalesce
, which is great because it doesn\'t cause a shuffle and seems to work instantly (doesn\'t require an a
I do not exactly understand what your point is. Do you mean you have now 5 partitions, but after next operation you want data distributed to 10? Because having 10, but still using 5 does not make much sense… The process of sending data to new partitions has to happen sometime.
When doing coalesce
, you can get rid of unsued partitions, for example: if you had initially 100, but then after reduceByKey you got 10 (as there where only 10 keys), you can set coalesce
.
If you want the process to go the other way, you could just force some kind of partitioning:
[RDD].partitionBy(new HashPartitioner(100))
I'm not sure that's what you're looking for, but hope so.