Converting yyyymmdd to MM-dd-yyyy format in pyspark

情深已故 2021-01-15 04:49

I have a large DataFrame df with a date column in yyyymmdd format. How can I convert it to MM-dd-yyyy in PySpark?

2 Answers
  • 2021-01-15 05:04
    from datetime import datetime
    from pyspark.sql.functions import col, udf, date_format
    from pyspark.sql.types import DateType
    
    
    # Each row must be a tuple so that createDataFrame can map it to the column list
    rdd = sc.parallelize([('20161231',), ('20140102',), ('20151201',), ('20161124',)])
    df1 = sqlContext.createDataFrame(rdd, ['old_col'])
    
    # UDF to convert a yyyymmdd string to a date
    func = udf(lambda x: datetime.strptime(x, '%Y%m%d').date(), DateType())
    
    # Format the parsed date as an MM-dd-yyyy string
    df = df1.withColumn('new_col', date_format(func(col('old_col')), 'MM-dd-yyyy'))
    
    df.show()
    
  • 2021-01-15 05:07

    This also works:

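    from datetime import datetime
    from pyspark.sql.functions import col, udf, date_format
    from pyspark.sql.types import DateType
    
    
    # Parse the yyyymmdd value; str() guards against the column being stored as an integer
    func = udf(lambda x: datetime.strptime(str(x), '%Y%m%d').date(), DateType())
    
    # Format the parsed date as an MM-dd-yyyy string
    df2 = df.withColumn('date', date_format(func(col('InvcDate')), 'MM-dd-yyyy'))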
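
Both answers parse each value with `datetime.strptime` inside a UDF, which raises an error as soon as a row holds a null or a malformed string. The sketch below shows the same per-value conversion in plain Python with those cases handled. The function name `yyyymmdd_to_mmddyyyy` and the choice to return `None` for bad input are assumptions made for illustration, not part of either answer. Spark's built-in `to_date` and `date_format` functions may also cover this conversion without a Python UDF, depending on the Spark version in use.

```python
from datetime import datetime
from typing import Optional


def yyyymmdd_to_mmddyyyy(value) -> Optional[str]:
    """Convert a yyyymmdd value to an MM-dd-yyyy string.

    Returns None for null or malformed input (an assumed policy;
    raising an error may suit some pipelines better).
    """
    if value is None:
        return None
    text = str(value).strip()
    # Require exactly eight digits, because strptime alone would also
    # accept shorter inputs such as '2016123'.
    if len(text) != 8 or not text.isdigit():
        return None
    try:
        return datetime.strptime(text, '%Y%m%d').strftime('%m-%d-%Y')
    except ValueError:
        # Eight digits that do not form a real date, e.g. month 13
        return None


print(yyyymmdd_to_mmddyyyy('20161231'))
print(yyyymmdd_to_mmddyyyy(None))
print(yyyymmdd_to_mmddyyyy('2016-12-31'))

# Hypothetical wiring into Spark (not executed here), reusing the
# column names from the first answer:
# from pyspark.sql.functions import col, udf
# from pyspark.sql.types import StringType
# to_us_date = udf(yyyymmdd_to_mmddyyyy, StringType())
# df = df1.withColumn('new_col', to_us_date(col('old_col')))
```

Returning a string directly from the function means the UDF can be declared with `StringType()` and no separate `date_format` call is needed, at the cost of the column no longer being a true date type.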
    