Can unix_timestamp() return unix time in milliseconds in Apache Spark?

情深已故 2021-02-12 20:17

I'm trying to get the Unix time in milliseconds (13 digits) from a timestamp field, but currently it is returned in seconds (10 digits).

scala> var df = Seq("20


        
4 Answers
  •  离开以前
    2021-02-12 21:00

    Implementing the approach suggested in Dao Thi's answer:

    import pyspark.sql.functions as F
    df = spark.createDataFrame([('22-Jul-2018 04:21:18.792 UTC', ),('23-Jul-2018 04:21:25.888 UTC',)], ['TIME'])
    df.show(2,False)
    df.printSchema()
    

    Output:

    +----------------------------+
    |TIME                        |
    +----------------------------+
    |22-Jul-2018 04:21:18.792 UTC|
    |23-Jul-2018 04:21:25.888 UTC|
    +----------------------------+
    root
    |-- TIME: string (nullable = true)
    

    Convert the time string (including milliseconds) to a Unix timestamp held as a double. unix_timestamp() returns whole seconds only, so extract the milliseconds from the string with substring (start position = -7, length = 3), cast that substring to float, divide by 1000, and add the result to the unix_timestamp value.

    df1 = df.withColumn("unix_timestamp",F.unix_timestamp(df.TIME,'dd-MMM-yyyy HH:mm:ss.SSS z') + F.substring(df.TIME,-7,3).cast('float')/1000)
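
The arithmetic in that expression can be checked outside Spark. The following is a minimal plain-Python sketch, not part of the original answer: the helper name `split_seconds_and_millis` is hypothetical, and it assumes the string always ends in a three-digit millisecond field followed by " UTC", with English month abbreviations.

```python
from datetime import datetime, timezone

def split_seconds_and_millis(time_str):
    """Mirror the Spark expression: whole epoch seconds plus the millisecond substring."""
    # Python slice equivalent of substring(TIME, -7, 3):
    # start 7 characters from the end and take 3 characters.
    millis_text = time_str[-7:-4]
    # Parse only the whole-second part by dropping the trailing ".SSS UTC" (8 characters).
    whole = datetime.strptime(time_str[:-8], "%d-%b-%Y %H:%M:%S")
    # Assumption: the zone is always UTC, so attach it explicitly.
    seconds = int(whole.replace(tzinfo=timezone.utc).timestamp())
    return seconds, millis_text

seconds, millis_text = split_seconds_and_millis("22-Jul-2018 04:21:18.792 UTC")
# Same combination as in the Spark column: seconds + milliseconds / 1000.
unix_ts = seconds + float(millis_text) / 1000
print(seconds, millis_text)  # prints: 1532233278 792
```

The negative start position counts from the end of the string, which is why "792 UTC" (7 characters) yields "792" once the length is limited to 3.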
    

    Convert the unix_timestamp column (double) to Spark's timestamp data type:

    df2 = df1.withColumn("TimestampType",F.to_timestamp(df1["unix_timestamp"]))
    df2.show(n=2,truncate=False)
    

    This gives the following output:

    +----------------------------+----------------+-----------------------+
    |TIME                        |unix_timestamp  |TimestampType          |
    +----------------------------+----------------+-----------------------+
    |22-Jul-2018 04:21:18.792 UTC|1.532233278792E9|2018-07-22 04:21:18.792|
    |23-Jul-2018 04:21:25.888 UTC|1.532319685888E9|2018-07-23 04:21:25.888|
    +----------------------------+----------------+-----------------------+
    

    Checking the Schema:

    df2.printSchema()
    
    
    root
     |-- TIME: string (nullable = true)
     |-- unix_timestamp: double (nullable = true)
     |-- TimestampType: timestamp (nullable = true)
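
The question asks for a 13-digit millisecond value, while the unix_timestamp column above holds fractional seconds. A minimal plain-Python sketch of the remaining step follows; the function name `seconds_to_millis` is hypothetical, and the inputs are the two doubles from the output table. In Spark the same idea would presumably be `F.round(F.col("unix_timestamp") * 1000).cast("long")`, which is an untested assumption here rather than something the answer shows.

```python
def seconds_to_millis(unix_seconds):
    """Turn fractional epoch seconds (a float) into integer epoch milliseconds."""
    # round() rather than int(): the floating-point product can land slightly
    # below the intended integer, and truncation would then lose one millisecond.
    return round(unix_seconds * 1000)

for value in (1.532233278792E9, 1.532319685888E9):
    millis = seconds_to_millis(value)
    print(millis, len(str(millis)))
# prints:
# 1532233278792 13
# 1532319685888 13
```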
    
