Converting yyyymmdd to MM-dd-yyyy format in pyspark

情深已故 2021-01-15 04:49

I have a large DataFrame df containing a date column in the format yyyymmdd. How can I convert it to MM-dd-yyyy in PySpark?

2 answers
  •  一向 (original poster)
    2021-01-15 05:04

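    from datetime import datetime
    from pyspark.sql.functions import col, date_format, udf
    from pyspark.sql.types import DateType

    # sc and sqlContext are assumed to exist already, as in the pyspark shell.
    # Each row must be a tuple; a bare string cannot be mapped to a named column.
    rdd = sc.parallelize([('20161231',), ('20140102',), ('20151201',), ('20161124',)])
    df1 = sqlContext.createDataFrame(rdd, ['old_col'])

    # UDF to convert a yyyyMMdd string to a date (nulls pass through)
    func = udf(lambda x: datetime.strptime(x, '%Y%m%d').date() if x else None, DateType())

    # date_format renders the date as a string such as 12-31-2016
    df = df1.withColumn('new_col', date_format(func(col('old_col')), 'MM-dd-yyyy'))

    df.show()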
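
A Python UDF is not strictly required for this conversion. As a minimal sketch, assuming Spark 2.2 or later (where `to_date` accepts a format argument), the built-in `to_date` and `date_format` functions can do the same work without sending each row through Python. The names `with_mm_dd_yyyy` and `expected_mm_dd_yyyy` are hypothetical, chosen only for illustration; the column names follow the answer above.

```python
from datetime import datetime


def with_mm_dd_yyyy(df, src="old_col", dst="new_col"):
    """Add column `dst` holding `src` (a yyyyMMdd string) rendered as MM-dd-yyyy."""
    # Imported here so the plain-Python helper below can be used without Spark.
    from pyspark.sql import functions as F

    # Parse the string into a date, then format that date back into a string.
    parsed = F.to_date(F.col(src), "yyyyMMdd")
    return df.withColumn(dst, F.date_format(parsed, "MM-dd-yyyy"))


def expected_mm_dd_yyyy(value):
    """Plain-Python reference for spot-checking single values, e.g. in unit tests."""
    return datetime.strptime(value, "%Y%m%d").strftime("%m-%d-%Y")
```

With the DataFrame from the answer, `with_mm_dd_yyyy(df1).show()` should display the reformatted column. How unparseable strings are handled (null versus an error) may depend on the Spark version and its ANSI settings, so it is worth checking on a small sample first.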
    
