PySpark: split a column into multiple columns without pandas

时光取名叫无心 2021-01-02 22:40

My question is how to split a column into multiple columns. I don't know why df.toPandas() does not work.

For example, I would like to change 'df_test'

2 Answers
  •  借酒劲吻你
    2021-01-02 23:00

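    The solution is to use the pyspark.sql.functions.split() function, which turns a string column into an array column; getItem() then pulls each element out into its own column.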

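    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    df = spark.createDataFrame([
        (1, '14-Jul-15'),
        (2, '14-Jun-15'),
        (3, '11-Oct-15'),
    ], ('id', 'date'))

    # split() turns the string into an array column, using '-' as a regex separator
    split_col = F.split(df['date'], '-')
    # getItem(i) extracts the i-th array element into its own column
    df = df.withColumn('day', split_col.getItem(0))
    df = df.withColumn('month', split_col.getItem(1))
    df = df.withColumn('year', split_col.getItem(2))
    df = df.drop('date')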
    
