I have a question about PySpark.
After aggregating, I have really skewed data (some partitions are massive).
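For context, here is a minimal sketch of how the skew could be measured, assuming the aggregated result is a DataFrame (called `df` here, a hypothetical name). The helper names `partition_row_counts` and `skew_ratio` are made up for illustration; only `spark_partition_id` comes from PySpark itself.

```python
from statistics import median


def partition_row_counts(df):
    """Return one row count per non-empty Spark partition of df."""
    # Imported lazily so the pure-Python helper below runs without Spark installed.
    from pyspark.sql.functions import spark_partition_id

    rows = (
        df.groupBy(spark_partition_id().alias("pid"))  # tag each row with its partition
        .count()
        .collect()
    )
    return [row["count"] for row in rows]


def skew_ratio(counts):
    """Largest partition size divided by the median size (1.0 means balanced)."""
    if not counts:
        return 0.0
    mid = median(counts)
    return max(counts) / mid if mid else float("inf")
```

Calling `skew_ratio(partition_row_counts(df))` gives a single number to compare before and after any change; a value far above 1 means a few partitions hold most of the rows.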
If I repartition, it takes ages, as the data