I have a PySpark DataFrame in which one column contains one-hot encoded vectors. I want to aggregate these vectors by vector addition after a groupBy.
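To make the goal concrete, here is a plain-Python sketch of what "vector addition after groupby" means, with no Spark involved. The data and the helper name `sum_vectors_by_key` are hypothetical, and vectors are plain lists rather than Spark ML vectors:

```python
from typing import Dict, List, Sequence, Tuple


def sum_vectors_by_key(rows: Sequence[Tuple[str, List[float]]]) -> Dict[str, List[float]]:
    """Group (key, vector) pairs by key and add the vectors element-wise."""
    totals: Dict[str, List[float]] = {}
    for key, vec in rows:
        if key not in totals:
            # First vector for this key: copy it so the input is not mutated.
            totals[key] = list(vec)
        else:
            acc = totals[key]
            if len(acc) != len(vec):
                raise ValueError("all vectors must have the same length")
            # Element-wise addition into the running total for this key.
            for i, x in enumerate(vec):
                acc[i] += x
    return totals


# Hypothetical one-hot vectors of length 3, grouped by a string key.
rows = [
    ("a", [1.0, 0.0, 0.0]),
    ("a", [0.0, 1.0, 0.0]),
    ("b", [0.0, 0.0, 1.0]),
    ("a", [1.0, 0.0, 0.0]),
]
print(sum_vectors_by_key(rows))
# {'a': [2.0, 1.0, 0.0], 'b': [0.0, 0.0, 1.0]}
```

Each summed vector counts how often each category occurred within its group.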
For example:
You have several options:
Options 2 and 3 would both be relatively inefficient in terms of CPU and memory.