If we have a Pandas data frame consisting of a column of categories and a column of values, we can remove the mean in each category by doing the following:
df["
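You can do this in PySpark with a Window partitioned by the category column, taking the mean over that window and subtracting it from each value, i.e.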
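import pyspark.sql.functions as F
from pyspark.sql.window import Window

# Partition rows by category so the mean is computed per category
window_var = Window.partitionBy('Category')

# Subtract each row's category mean from its value
df = df.withColumn('DemeanedValues', F.col('Values') - F.mean('Values').over(window_var))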
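To make the arithmetic concrete, here is a rough plain-Python sketch of what the partitioned mean computes for each row. It does not use Spark at all. The helper name demean_by_category and the sample rows are made up for illustration, and it assumes each row is a (category, value) pair:

```python
from collections import defaultdict

def demean_by_category(rows):
    """Return (category, value - mean of that category) for each input row."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    # First pass: accumulate the sum and count for each category
    for category, value in rows:
        totals[category] += value
        counts[category] += 1
    means = {c: totals[c] / counts[c] for c in totals}
    # Second pass: subtract the category mean, keeping the original row order
    return [(category, value - means[category]) for category, value in rows]

# Hypothetical sample data: category "a" has mean 2.0, category "b" has mean 10.0
rows = [("a", 1.0), ("a", 3.0), ("b", 10.0)]
demeaned = demean_by_category(rows)
```

Like the window version, this keeps one output row per input row rather than collapsing each category to a single aggregate, which is the difference between a window and a plain group-by aggregation.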