Convert pyspark string to date format

礼貌的吻别 2020-11-22 05:21

I have a PySpark DataFrame with a string column holding dates in the format MM-dd-yyyy, and I am trying to convert it into a date column.
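For reference, PySpark ships a built-in `pyspark.sql.functions.to_date(col, format)` (the `format` argument exists since Spark 2.2) that accepts the Spark pattern `MM-dd-yyyy` directly and avoids a Python UDF. If a UDF is used instead, the per-value parse it wraps can be sketched with the standard library alone. Spark's `MM-dd-yyyy` corresponds to `%m-%d-%Y` in Python's `strptime`; the function name `parse_mm_dd_yyyy` is hypothetical, and passing `None` through is an assumption about how nulls should be handled:

```python
from datetime import date, datetime
from typing import Optional


def parse_mm_dd_yyyy(value: Optional[str]) -> Optional[date]:
    """Parse an 'MM-dd-yyyy' string such as '11-25-1991' into a datetime.date.

    Spark's pattern MM-dd-yyyy maps to '%m-%d-%Y' in Python's strptime.
    None (a SQL null) is returned unchanged so a UDF wrapping this
    function does not raise on missing values.
    """
    if value is None:
        return None
    # .date() drops the (all-zero) time part, matching Spark's DateType.
    return datetime.strptime(value, "%m-%d-%Y").date()


print(parse_mm_dd_yyyy("11-25-1991"))  # 1991-11-25
print(parse_mm_dd_yyyy(None))          # None
```

Wrapped as `udf(parse_mm_dd_yyyy, DateType())`, it can be applied with `withColumn` the same way as the lambda in the answer below.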

I tried:

6 Answers
  •  鱼传尺愫
    2020-11-22 06:07

    from datetime import datetime
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import DateType

    spark = SparkSession.builder.getOrCreate()

    # Create a dummy DataFrame with three string columns of MM/dd/yyyy dates:
    df1 = spark.createDataFrame([("11/25/1991", "11/24/1991", "11/30/1991"),
                                 ("11/25/1391", "11/24/1992", "11/30/1992")],
                                schema=['first', 'second', 'third'])

    # Define a user-defined function (UDF) that converts each string cell into a date.
    # '%m' is the month directive (not '%M', which is minutes); nulls stay null.
    func = udf(lambda x: datetime.strptime(x, '%m/%d/%Y').date() if x is not None else None,
               DateType())

    df = df1.withColumn('test', func(col('first')))

    df.show()

    df.printSchema()
    

    Here is the output:

    +----------+----------+----------+----------+
    |     first|    second|     third|      test|
    +----------+----------+----------+----------+
    |11/25/1991|11/24/1991|11/30/1991|1991-01-25|
    |11/25/1391|11/24/1992|11/30/1992|1391-01-17|
    +----------+----------+----------+----------+
    
    root
     |-- first: string (nullable = true)
     |-- second: string (nullable = true)
     |-- third: string (nullable = true)
     |-- test: date (nullable = true)
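The `test` column above shows month 01 for the input `11/25/1991`. With the `'%m/%d/%Y'` pattern used in the UDF, `strptime` reads `11` as November; a month of 01 is what the easily confused `%M` (minutes) directive produces, because `strptime` fills a month that no directive supplies with January. A stdlib-only check of both directives:

```python
from datetime import datetime

raw = "11/25/1991"

# '%m' is the month directive: 11 is read as November.
as_month = datetime.strptime(raw, "%m/%d/%Y")

# '%M' is the minute directive: 11 is read as minutes, and the month,
# which no directive supplies, defaults to January.
as_minute = datetime.strptime(raw, "%M/%d/%Y")

print(as_month.date())   # 1991-11-25
print(as_minute.date())  # 1991-01-25
print(as_minute.minute)  # 11
```

So if a converted column shows every date in January, checking the case of the month directive in the format string is a good first step.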
    
