Pyspark window function with condition

广开言路 2021-02-08 23:13

Suppose I have a DataFrame of events with the time difference between each row. The main rule is that one visit is counted only if the event has been within 5 minutes of the previous…

3 Answers
  •  失恋的感觉
    2021-02-08 23:45

    One approach is to group the DataFrame based on your timeline criteria.

    First, create a DataFrame of the rows that break the 5-minute timeline. Those rows serve as the grouping criteria, because they mark the start time and end time of each group.

    Then find the count and the max timestamp (end time) for each group.
