PySpark: changing type of column from date to string


I have the following dataframe:

corr_temp_df
[('vacationdate', 'date'),
 ('valueE', 'string'),
 ('valueD', 'string'),
 ('valueC', 'string'),
 ...]

How can I change the type of the vacationdate column from date to string?

1 Answer

    Let's create some dummy data:

    import datetime
    from pyspark.sql import Row
    from pyspark.sql.functions import col
    
    row = Row("vacationdate")
    
    # Two sample rows with a single date column (note: literals like 07
    # with a leading zero are a SyntaxError in Python 3)
    df = sc.parallelize([
        row(datetime.date(2015, 10, 7)),
        row(datetime.date(1971, 1, 1))
    ]).toDF()
    

    If you use Spark >= 1.5.0 you can use the date_format function:

    from pyspark.sql.functions import date_format
    
    # Use lowercase yyyy: uppercase YYYY is the week-based year and can give
    # surprising results around year boundaries
    (df
       .select(date_format(col("vacationdate"), "dd-MM-yyyy").alias("date_string"))
       .show())
    
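    With the dummy data above, this should print something like:

    +-----------+
    |date_string|
    +-----------+
    | 07-10-2015|
    | 01-01-1971|
    +-----------+
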

    In Spark < 1.5.0 it can be done using a Hive UDF:

    df.registerTempTable("df")
    sqlContext.sql(
        "SELECT date_format(vacationdate, 'dd-MM-yyyy') AS date_string FROM df")
    

    It is of course still available in Spark >= 1.5.0.
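
    If you prefer to stay in the DataFrame API instead of registering a
    temporary table, selectExpr accepts the same SQL expression (a minimal
    sketch using the df from above):

    # selectExpr parses SQL expressions against the DataFrame's columns
    df.selectExpr(
        "date_format(vacationdate, 'dd-MM-yyyy') AS date_string").show()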

    If you don't use HiveContext you can mimic date_format using a UDF:

    from pyspark.sql.functions import udf, lit
    
    # strftime-based formatter; udf defaults to StringType as the return type
    my_date_format = udf(lambda d, fmt: d.strftime(fmt))
    
    df.select(
        my_date_format(col("vacationdate"), lit("%d-%m-%Y")).alias("date_string")
    ).show()
    
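    As a side note, if you only need the default ISO form (yyyy-MM-dd) rather
    than a custom pattern, a plain cast to string is enough; a minimal sketch
    with the same df:

    # Casting a date column to string yields the yyyy-MM-dd representation
    df.select(col("vacationdate").cast("string").alias("date_string")).show()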

    Note that the UDF uses C standard strftime format codes (for example
    %d-%m-%Y), not Java SimpleDateFormat patterns (like dd-MM-yyyy).
