Number of Partitions of Spark Dataframe

青春惊慌失措 · 2021-02-06 08:06

Can anyone explain how the number of partitions is determined when a Spark DataFrame is created?

I know that for an RDD we can specify the number of partitions while creating it.

1 Answer

  日久生厌 · 2021-02-06 08:22

    You cannot, or at least not in the general case, but it is not that different from an RDD. For example, the textFile example code you've provided sets only a lower bound on the number of partitions.
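    A minimal sketch of that behaviour (Scala; "data.txt" is a hypothetical path and sc an existing SparkContext):

        // minPartitions is a lower bound, not an exact count; the actual
        // number of partitions depends on the input splits.
        val rdd = sc.textFile("data.txt", minPartitions = 4)
        println(rdd.getNumPartitions) // >= 4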

    In general:

    • Datasets generated locally, using methods like range or toDF on a local collection, use spark.default.parallelism (see the sketch after this list).
    • Datasets created from an RDD inherit the number of partitions from the parent RDD.
    • Datasets created using the data source API:

      • In Spark 1.x it typically depends on the Hadoop configuration (min / max split size).
      • In Spark 2.x a Spark SQL specific configuration (e.g. spark.sql.files.maxPartitionBytes) is used instead.
    • Some data sources may provide additional options which give more control over partitioning. For example, the JDBC source allows you to set the partitioning column, the range of values, and the desired number of partitions.
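    To make the cases above concrete, here is a sketch for Spark 2.x (Scala; the JDBC URL, table name, and partition column are hypothetical placeholders):

        import org.apache.spark.sql.SparkSession

        val spark = SparkSession.builder()
          .appName("partition-count-demo")
          .master("local[4]") // spark.default.parallelism = 4 here
          .getOrCreate()
        import spark.implicits._

        // Locally generated Dataset: follows spark.default.parallelism.
        println(spark.range(0, 1000).rdd.getNumPartitions) // 4

        // Dataset created from an RDD: inherits the parent's partition count.
        val rdd = spark.sparkContext.parallelize(1 to 100, numSlices = 8)
        println(rdd.toDF("value").rdd.getNumPartitions) // 8

        // JDBC source: partition column, value range, and partition count
        // are set explicitly through reader options.
        val jdbcDF = spark.read
          .format("jdbc")
          .option("url", "jdbc:postgresql://host/db") // hypothetical URL
          .option("dbtable", "some_table")            // hypothetical table
          .option("partitionColumn", "id")
          .option("lowerBound", "0")
          .option("upperBound", "100000")
          .option("numPartitions", "10")
          .load()
        println(jdbcDF.rdd.getNumPartitions) // 10

    Note that the JDBC options partitionColumn, lowerBound, upperBound, and numPartitions must be used together: they tell Spark how to split the table scan into parallel range queries.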
