How to calculate date difference in pyspark?

忘了有多久 2020-12-10 01:14

I have data like this:

df = sqlContext.createDataFrame([
    ('1986/10/15', 'z', 'null'),
    ('1986/10/15', 'z', 'null'),
    ('1986/10/15', 'c', 'null'),
    ('1986/10/15', 'null', 'null'),
    ('1986/10/16', 'null', '4.0')],
    ('low', 'high', 'normal'))
2 Answers
  • 2020-12-10 01:46

    You need to convert the low column to a date and then use datediff() in combination with lit(). With Spark 2.2 and later:

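    from pyspark.sql.functions import datediff, to_date, lit
    
    # datediff(end, start) returns the number of days from start to end;
    # to_date("low", "yyyy/MM/dd") parses the slash-separated date strings
    df.withColumn("test", 
                  datediff(to_date(lit("2017-05-02")),
                           to_date("low","yyyy/MM/dd"))).show()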
    +----------+----+------+-----+
    |       low|high|normal| test|
    +----------+----+------+-----+
    |1986/10/15|   z|  null|11157|
    |1986/10/15|   z|  null|11157|
    |1986/10/15|   c|  null|11157|
    |1986/10/15|null|  null|11157|
    |1986/10/16|null|   4.0|11156|
    +----------+----+------+-----+
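As a plain-Python cross-check (this is not Spark itself), the sketch below reproduces what datediff(end, start) computes in the example above: the number of whole days from start to end, positive when end is later. The helper names days_between and parse_low are hypothetical; "%Y/%m/%d" is the strptime equivalent of the Spark pattern yyyy/MM/dd.

```python
from datetime import date, datetime


def days_between(end: date, start: date) -> int:
    """Mimic datediff(end, start): whole days from start to end."""
    return (end - start).days


def parse_low(value: str) -> date:
    """Parse a 'yyyy/MM/dd' string, like to_date(col, "yyyy/MM/dd")."""
    return datetime.strptime(value, "%Y/%m/%d").date()


reference = date(2017, 5, 2)  # the literal passed through lit("2017-05-02")
for low in ["1986/10/15", "1986/10/16"]:
    print(low, days_between(reference, parse_low(low)))
# prints:
# 1986/10/15 11157
# 1986/10/16 11156
```

The values match the test column in the Spark output, and swapping the arguments flips the sign.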
    

    Before Spark 2.2, to_date() does not accept a format argument, so the low column first has to be converted to a timestamp:

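    from pyspark.sql.functions import datediff, to_date, lit, unix_timestamp
    
    # unix_timestamp() parses "low" into seconds since the epoch; casting that
    # to a timestamp and applying to_date() yields a proper date column
    df.withColumn("test", 
                  datediff(to_date(lit("2017-05-02")),
                           to_date(unix_timestamp('low', "yyyy/MM/dd").cast("timestamp")))).show()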
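The pre-2.2 chain does three things: parse the string into seconds since the Unix epoch (unix_timestamp), turn those seconds into a timestamp (cast("timestamp")), and drop the time of day (to_date). Here is a stdlib sketch of the same steps, assuming UTC; Spark interprets the string in the session time zone, so the intermediate seconds can differ, but parsing and truncating in the same zone gives back the same calendar date. The function names are illustrative only.

```python
import calendar
import time
from datetime import datetime, timezone


def unix_timestamp_utc(value: str, fmt: str = "%Y/%m/%d") -> int:
    """Step 1: string -> seconds since the epoch (assumes UTC)."""
    return calendar.timegm(time.strptime(value, fmt))


def to_date_from_seconds(seconds: int):
    """Steps 2 and 3: seconds -> timestamp -> date (time of day dropped)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc).date()


secs = unix_timestamp_utc("1986/10/15")
print(secs)                        # 529718400
print(to_date_from_seconds(secs))  # 1986-10-15
```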
    
  • 2020-12-10 01:51

    Alternatively, here is how to find the number of days between a user's consecutive actions with PySpark:

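    import pyspark.sql.functions as funcs
    from pyspark.sql.window import Window
    
    # One window per user, with that user's rows ordered by date
    window = Window.partitionBy('user_id').orderBy('action_date')
    
    # lag(action_date, 1) is the previous action_date in the window (null for
    # the first row), so datediff gives the days since the previous action
    df = df.withColumn("days_passed", funcs.datediff(df.action_date, 
                                      funcs.lag(df.action_date, 1).over(window)))
    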
    +-------+-----------+-----------+
    |user_id|action_date|days_passed|
    +-------+-----------+-----------+
    |    623| 2015-10-21|       null|
    |    623| 2015-11-19|         29|
    |    623| 2016-01-13|         55|
    |    623| 2016-01-21|          8|
    |    623| 2016-03-24|         63|
    +-------+-----------+-----------+
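To make the window semantics concrete, here is a plain-Python sketch (not Spark) of what partitionBy('user_id').orderBy('action_date') combined with lag(action_date, 1) does. Within each user, dates are sorted and each one is compared with the previous date. The first row of a user has no predecessor, so it gets None, which plays the role of Spark's null. The dates are taken from the output above and deliberately shuffled. The function name days_since_previous is hypothetical.

```python
from collections import defaultdict
from datetime import date


def days_since_previous(rows):
    """rows: iterable of (user_id, action_date) pairs.

    Returns (user_id, action_date, days_passed) tuples, mimicking
    datediff(action_date, lag(action_date, 1)) over a window
    partitioned by user_id and ordered by action_date.
    """
    by_user = defaultdict(list)
    for user_id, action_date in rows:
        by_user[user_id].append(action_date)

    result = []
    for user_id, dates in by_user.items():
        previous = None  # lag() yields null for the first row of a partition
        for current in sorted(dates):
            days = None if previous is None else (current - previous).days
            result.append((user_id, current, days))
            previous = current
    return result


actions = [
    (623, date(2015, 10, 21)),
    (623, date(2016, 1, 13)),
    (623, date(2015, 11, 19)),
    (623, date(2016, 3, 24)),
    (623, date(2016, 1, 21)),
]
for row in days_since_previous(actions):
    print(row)
```

The days_passed values come out as None, 29, 55, 8 and 63, in date order.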
    