Spark - Window with recursion? - Conditionally propagating values across rows

Backend · Unresolved · 1 answer · 1360 views
星月不相逢 2021-01-16 06:14

I have the following dataframe showing the revenue of purchases.

+-------+--------+-------+
|user_id|visit_id|revenue|
+-------+--------+-------+
|      1|       1|      0|
|      1|       2|      0|
|      1|       3|      0|
|      1|       4|    100|
|      1|       5|      0|
|      1|       6|      0|
|      1|       7|    200|
|      1|       8|      0|
|      1|       9|     10|
+-------+--------+-------+

I want to assign an id that groups each purchase with the visits that led up to it. How can I do this with window functions, given that they don't seem to support recursion?
1 Answer
  • 2021-01-16 07:14

    Window functions don't support recursion, but it isn't required here. This type of sessionization can be handled easily with a cumulative sum:

    from pyspark.sql.functions import col, sum, when, lag
    from pyspark.sql.window import Window
    
    w = Window.partitionBy("user_id").orderBy("visit_id")
    
    # Flag rows that *follow* a purchase: lag the (revenue > 0) indicator by
    # one row, defaulting to 0 for the first row of each user. The cumulative
    # sum of that lagged flag then increments right after every purchase,
    # and + 1 makes the group ids start at 1.
    purch_id = sum(lag(when(
        col("revenue") > 0, 1).otherwise(0), 
        1, 0
    ).over(w)).over(w) + 1
    
    df.withColumn("purch_id", purch_id).show()
    
    +-------+--------+-------+--------+
    |user_id|visit_id|revenue|purch_id|
    +-------+--------+-------+--------+
    |      1|       1|      0|       1|
    |      1|       2|      0|       1|
    |      1|       3|      0|       1|
    |      1|       4|    100|       1|
    |      1|       5|      0|       2|
    |      1|       6|      0|       2|
    |      1|       7|    200|       2|
    |      1|       8|      0|       3|
    |      1|       9|     10|       3|
    +-------+--------+-------+--------+
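    The lag-then-cumulative-sum idea is easy to check without Spark. A minimal plain-Python sketch of the same logic (assuming one user's rows, already sorted by visit_id; the function name is illustrative):

    ```python
    def assign_purchase_ids(revenues):
        """Assign a purchase-group id to each row.

        Mirrors the Spark expression: a running sum of the lagged
        (revenue > 0) flag, plus 1, so a new group starts on the row
        *after* each purchase.
        """
        purch_ids = []
        group = 1            # the "+ 1" offset: ids start at 1
        prev_purchased = 0   # lag(..., 1, 0): default 0 for the first row
        for rev in revenues:
            group += prev_purchased      # cumulative sum of the lagged flag
            purch_ids.append(group)
            prev_purchased = 1 if rev > 0 else 0
        return purch_ids

    revenues = [0, 0, 0, 100, 0, 0, 200, 0, 10]
    print(assign_purchase_ids(revenues))
    # [1, 1, 1, 1, 2, 2, 2, 3, 3] -- matches the purch_id column above
    ```

    Lagging the flag (rather than summing the flag directly) is what keeps the purchase row itself in the group it closes.
    
    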
    