My question is how to split a column into multiple columns.
I don't know why df.toPandas()
does not work.
For example, I would like to change 'df_test'
The solution here is to use the pyspark.sql.functions.split() function.
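import pyspark.sql.functions

# Assumes an existing sqlContext (a SparkSession's createDataFrame also works)
df = sqlContext.createDataFrame([
(1, '14-Jul-15'),
(2, '14-Jun-15'),
(3, '11-Oct-15'),
], ('id', 'date'))
# Split the date string on '-' into an array column
split_col = pyspark.sql.functions.split(df['date'], '-')
# Pull each array element out into its own column
df = df.withColumn('day', split_col.getItem(0))
df = df.withColumn('month', split_col.getItem(1))
df = df.withColumn('year', split_col.getItem(2))
# Drop the original column once it has been split
df = df.drop("date")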
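
To check what each row should look like afterwards, here is a rough plain-Python sketch of the same per-row logic. It is not Spark code, and split_date_row is just a name made up for illustration. The point it shows is that day, month and year all come out as strings (split() produces an array of strings), so you may want to cast them if you need numbers.

```python
# Plain-Python mirror of the Spark steps above (illustration only, no Spark needed).
rows = [
    (1, '14-Jul-15'),
    (2, '14-Jun-15'),
    (3, '11-Oct-15'),
]

def split_date_row(row):
    """Hypothetical helper: split a 'dd-Mon-yy' string into day, month, year."""
    row_id, date = row
    # Same idea as split(df['date'], '-') followed by getItem(0), getItem(1), getItem(2)
    day, month, year = date.split('-')
    # The original date value is left out, like df.drop("date")
    return (row_id, day, month, year)

result = [split_date_row(r) for r in rows]
for r in result:
    print(r)
```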