Can anyone explain how the number of partitions is determined when a Spark DataFrame is created?
I know that for an RDD we can specify the number of partitions while creating it.
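For example, something along these lines (the input path is just illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local[4]"))

// Explicit partition count when creating an RDD from a local collection
val nums = sc.parallelize(1 to 1000, 8)

// For textFile, the second argument also influences the partition count
val lines = sc.textFile("/path/to/input.txt", 8)
```

How does this work for a DataFrame?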
You cannot, or at least not in the general case, but it is not that different from an RDD. For example, the `textFile` code you've provided sets only a lower bound on the number of partitions.
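You can observe this directly; a minimal sketch (in `spark-shell`, where `sc` is predefined; the path is hypothetical and the actual count depends on how the input splits):

```scala
// The second argument (2) is only a lower bound on the partition count
val lines = sc.textFile("/path/to/large-file.txt", 2)

// For a large, splittable file this typically prints a number >= 2,
// driven by the underlying Hadoop input splits rather than the argument
println(lines.getNumPartitions)
```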
In general:
- Datasets generated locally, using methods like `range` or `toDF` on a local collection, will use `spark.default.parallelism` (see the sketch after this list).
- Datasets created from an RDD inherit the number of partitions from their parent.
- Datasets created using the data source API have a partition count that depends on the specific source (for file-based sources, on the underlying input splits).
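Here is a minimal sketch of the first two rules. The names and the `local[4]` master are just for illustration; the exact counts depend on `spark.default.parallelism` in your environment:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[4]")   // defaultParallelism is 4 in this setup
  .appName("partition-counts")
  .getOrCreate()
import spark.implicits._

// Locally generated Dataset: partition count follows spark.default.parallelism
println(spark.range(0, 1000).rdd.getNumPartitions)    // 4

// toDF on a local collection behaves the same way
println((1 to 1000).toDF("n").rdd.getNumPartitions)   // 4

// A DataFrame built from an RDD inherits the parent's partition count
val sourceRdd = spark.sparkContext.parallelize(1 to 100, 6)
println(sourceRdd.toDF("n").rdd.getNumPartitions)     // 6
```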