I'm trying to understand how partitioning is done in Apache Spark. Can you guys help, please?
Here is the scenario:
By default a partition is created for each HDFS block, which is 64 MB by default (from the Spark Programming Guide).
It's possible to pass another parameter, `minPartitions`, which overrides the minimum number of partitions that Spark will create. If you don't override this value, Spark will create at least as many partitions as `spark.default.parallelism`.
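A minimal sketch of this (the HDFS path and the `local[3]` master are placeholders, not values from your setup) showing how the optional second argument to `sc.textFile` requests a lower bound on the partition count:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TextFilePartitions {
  def main(args: Array[String]): Unit = {
    // local[3] and the HDFS path below are placeholders for this sketch.
    val sc = new SparkContext(
      new SparkConf().setAppName("textfile-partitions").setMaster("local[3]"))

    // With no second argument, textFile falls back to sc.defaultMinPartitions.
    val rdd = sc.textFile("hdfs:///data/input.txt")
    println(s"default: ${rdd.partitions.length} partitions")

    // Passing minPartitions asks for at least that many input splits;
    // Spark may still create more, roughly one per HDFS block.
    val rdd10 = sc.textFile("hdfs:///data/input.txt", minPartitions = 10)
    println(s"minPartitions = 10: ${rdd10.partitions.length} partitions")

    sc.stop()
  }
}
```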
Since `spark.default.parallelism` is supposed to be the number of cores across all of the machines in your cluster, I believe there would be at least 3 partitions created in your case.
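For illustration, a small sketch that sets `spark.default.parallelism` explicitly (here to 3, matching a hypothetical 3-core setup) and shows it being picked up as the default partition count for `parallelize`:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DefaultParallelismDemo {
  def main(args: Array[String]): Unit = {
    // spark.default.parallelism is set explicitly here; on a real cluster it
    // normally defaults to the total number of cores across all executors.
    val conf = new SparkConf()
      .setAppName("default-parallelism")
      .setMaster("local[3]")                  // placeholder master
      .set("spark.default.parallelism", "3")
    val sc = new SparkContext(conf)

    // parallelize() without an explicit numSlices uses spark.default.parallelism,
    // so this RDD ends up with 3 partitions.
    val rdd = sc.parallelize(1 to 100)
    println(rdd.partitions.length)  // 3

    sc.stop()
  }
}
```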
You can also `repartition()` or `coalesce()` an RDD to change the number of partitions, which in turn influences the total amount of available parallelism.
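For example, a short sketch of both calls (assuming an existing `SparkContext` named `sc`):

```scala
// Assumes an existing SparkContext called sc.
val rdd = sc.parallelize(1 to 1000, 8)   // start with 8 partitions
println(rdd.partitions.length)           // 8

// repartition() can increase or decrease the count; it always performs a full shuffle.
val wider = rdd.repartition(16)
println(wider.partitions.length)         // 16

// coalesce() is intended for decreasing the count; by default it avoids a full
// shuffle by merging existing partitions, which makes it cheaper than repartition.
val narrower = rdd.coalesce(2)
println(narrower.partitions.length)      // 2
```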