I have a large DataFrame df with a date column in the format yyyymmdd. How can I convert it to MM-dd-yyyy in PySpark?
This also works:
    from datetime import datetime
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import DateType

    # Parse yyyymmdd values (e.g. 20170131) into a DateType column.
    # The strptime pattern for yyyymmdd is '%Y%m%d'.
    func = udf(lambda x: datetime.strptime(str(x), '%Y%m%d'), DateType())
    df2 = df.withColumn('date', func(col('InvcDate')))
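Note that the UDF above produces a date column, not an MM-dd-yyyy string. If the string is what you need, the per-value conversion can be sketched in plain Python as below. The helper name to_mm_dd_yyyy is made up for illustration, and it assumes every non-null value is a valid yyyymmdd integer or string.

```python
from datetime import datetime
from typing import Optional


def to_mm_dd_yyyy(value) -> Optional[str]:
    """Convert a yyyymmdd value (int or str) to an MM-dd-yyyy string."""
    if value is None:
        return None  # keep nulls as nulls
    # Parse with the yyyymmdd pattern, then re-format as MM-dd-yyyy.
    parsed = datetime.strptime(str(value), "%Y%m%d")
    return parsed.strftime("%m-%d-%Y")


print(to_mm_dd_yyyy(20170131))  # 01-31-2017
```

Wrapped as udf(to_mm_dd_yyyy, StringType()), this could be applied with withColumn just like func above. On a large DataFrame it is probably faster to skip the Python UDF and use the built-ins to_date (with the pattern 'yyyyMMdd') and date_format (with 'MM-dd-yyyy') from pyspark.sql.functions, casting the column to string first if it is stored as an integer.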