duplicating records between date gaps within a selected time interval in a PySpark dataframe
Question: I have a PySpark dataframe that tracks changes in a product's price and status from month to month. A new row is created only when the status or the price has changed compared to the previous month, as in the dummy data below:

------------------------------------------
|product_id| status    | price | month   |
------------------------------------------
|1         | available | 5     | 2019-10 |
------------------------------------------
|1         | available | 8     | 2020-08 |
------------------------------------------
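The title asks for one row per month, with each change carried forward until the next change. That expansion logic can be sketched in plain Python. This is not PySpark code and not the asker's method. It only illustrates the idea on the dummy rows, and it assumes that months are `YYYY-MM` strings and that the selected interval is given by two hypothetical parameters, `start` and `end`, because the question's actual interval is not shown. `fill_monthly`, `month_index` and `index_to_month` are hypothetical helper names.

```python
from itertools import groupby
from operator import itemgetter


def month_index(ym: str) -> int:
    """Convert 'YYYY-MM' to a running month count so months can be stepped by 1."""
    year, month = map(int, ym.split("-"))
    return year * 12 + (month - 1)


def index_to_month(idx: int) -> str:
    """Inverse of month_index: running month count back to 'YYYY-MM'."""
    return f"{idx // 12:04d}-{idx % 12 + 1:02d}"


def fill_monthly(rows, start, end):
    """Hypothetical helper: expand a change log into one row per product per month.

    rows: iterable of (product_id, status, price, month) tuples, where a row
    exists only for months in which status or price changed.
    start, end: assumed bounds ('YYYY-MM') of the selected interval, inclusive.
    Each change is carried forward until the month before the next change,
    or until `end` for a product's last change.
    """
    lo, hi = month_index(start), month_index(end)
    out = []
    # Sort by product, then chronologically, so groupby sees each product's changes in order.
    ordered = sorted(rows, key=lambda r: (r[0], month_index(r[3])))
    for _, group in groupby(ordered, key=itemgetter(0)):
        changes = list(group)
        for i, (pid, status, price, month) in enumerate(changes):
            first = month_index(month)
            # Last month this state is valid: the month before the next change, or `end`.
            if i + 1 < len(changes):
                last = month_index(changes[i + 1][3]) - 1
            else:
                last = hi
            # Emit only the months that fall inside the selected interval.
            for m in range(max(first, lo), min(last, hi) + 1):
                out.append((pid, status, price, index_to_month(m)))
    return out


rows = [
    (1, "available", 5, "2019-10"),
    (1, "available", 8, "2020-08"),
]
filled = fill_monthly(rows, "2019-10", "2020-09")
```

With these inputs, price 5 is repeated for every month from 2019-10 through 2020-07, and price 8 covers 2020-08 and 2020-09. In PySpark, the same idea is commonly expressed in three steps: compute the next change month with `pyspark.sql.functions.lead` over a window partitioned by `product_id` and ordered by `month`, generate the in-between months with `sequence`, and flatten them with `explode`. Treat that as an outline rather than tested Spark code.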