PySpark: number of actual partitions in groupBy vs. shuffle partitions for a DataFrame

前端未结


I have a MovieLens CSV dataset file with the columns 'movieID', 'UserID', 'Rating', and 'Timestamp'. I aggregated the ratings for each movie by count and average. Below is my code.