I am trying to implement the following logic from Python pandas code in PySpark, and I am looking for ideas on how it can be implemented efficiently in Spark:
I have a