I have a PySpark DataFrame with a string column in the format MM-dd-yyyy,
and I am trying to convert it into a date column.
I tried:
There may not be many answers on this, so I'm sharing my code in case it helps someone.
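from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date

spark = SparkSession.builder.appName("Python Spark SQL basic example") \
    .config("spark.some.config.option", "some-value").getOrCreate()

# One-row DataFrame with the date held as a string in column 't'
df = spark.createDataFrame([('2019-06-22',)], ['t'])
# The pattern must match the string's layout; for MM-dd-yyyy input, pass 'MM-dd-yyyy'
df1 = df.select(to_date(df.t, 'yyyy-MM-dd').alias('dt'))
print(df1)  # prints the schema summary
df1.show()  # show() prints the table itself and returns None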
Output:
DataFrame[dt: date]
+----------+
|        dt|
+----------+
|2019-06-22|
+----------+
The code above converts a string column to a date. If you need a datetime instead, use to_timestamp. Let me know if you have any questions.