I have a PySpark DataFrame. I get it by reading in a partition of around 1.6 million rows, though I often read in multiple partitions at once.
For each week of data, the