I have a PySpark DataFrame in which each user has a certain status at a given point in time, as in the dummy data below:
--------------------------
|user_id| s