I'm using PySpark and I have a large data source that I want to repartition, specifying the file size per partition explicitly.
I know about using repartition(500), which sets the number of partitions.
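To make the goal concrete, here is a minimal sketch of the kind of calculation I have in mind: derive the partition count from a target file size rather than hard-coding it. It assumes the total data size is known up front. The helper name `partitions_for_target_size` and the sizes used are hypothetical, and the commented-out write at the end assumes a DataFrame `df` and an output path that are not shown here.

```python
def partitions_for_target_size(total_bytes: int, target_file_bytes: int) -> int:
    """Return a partition count that yields roughly target_file_bytes per partition.

    Hypothetical helper for illustration; not part of the PySpark API.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if target_file_bytes <= 0:
        raise ValueError("target_file_bytes must be positive")
    # Integer ceiling division, so the last partial chunk gets its own partition.
    count = -(-total_bytes // target_file_bytes)
    # Spark needs at least one partition, even for empty input.
    return max(1, count)


# Assumed numbers: 64 GiB of data, aiming for about 128 MiB per file.
num_partitions = partitions_for_target_size(64 * 1024**3, 128 * 1024**2)
print(num_partitions)  # 512

# With an existing DataFrame `df` (assumed), the count would be used like this:
# df.repartition(num_partitions).write.parquet("output_path")
```

This only approximates the result: the size of the written files depends on compression and encoding, so the on-disk size per file can differ noticeably from the in-memory estimate.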