Convert pyspark string to date format

礼貌的吻别 2020-11-22 05:21

I have a PySpark DataFrame with a string column in MM-dd-yyyy format, and I am trying to convert it into a date column.

I tried:

6 Answers
  •  慢半拍i
    2020-11-22 05:58

    Update (1/10/2018):

    For Spark 2.2+, the best way to do this is probably to use the to_date or to_timestamp functions, both of which accept a format argument. From the docs:

    >>> from pyspark.sql.functions import to_timestamp
    >>> df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
    >>> df.select(to_timestamp(df.t, 'yyyy-MM-dd HH:mm:ss').alias('dt')).collect()
    [Row(dt=datetime.datetime(1997, 2, 28, 10, 30))]
    

    Original Answer (for Spark < 2.2)

    It is possible (and arguably preferable) to do this without a UDF:

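    from pyspark.sql import SparkSession
    from pyspark.sql.functions import unix_timestamp, from_unixtime
    
    spark = SparkSession.builder.getOrCreate()  # already defined in the pyspark shell
    
    df = spark.createDataFrame(
        [("11/25/1991",), ("11/24/1991",), ("11/30/1991",)],
        ['date_str']
    )
    
    # unix_timestamp parses the string into epoch seconds; from_unixtime formats
    # that back as a 'yyyy-MM-dd HH:mm:ss' string, so cast it to get a real timestamp.
    df2 = df.select(
        'date_str',
        from_unixtime(unix_timestamp('date_str', 'MM/dd/yyyy')).cast('timestamp').alias('date')
    )
    
    print(df2)
    #DataFrame[date_str: string, date: timestamp]
    
    df2.show(truncate=False)
    #+----------+-------------------+
    #|date_str  |date               |
    #+----------+-------------------+
    #|11/25/1991|1991-11-25 00:00:00|
    #|11/24/1991|1991-11-24 00:00:00|
    #|11/30/1991|1991-11-30 00:00:00|
    #+----------+-------------------+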
    
