Transforming a column and updating the DataFrame

执念已碎 2021-01-25 10:58

So, what I'm doing below is I drop a column A from a DataFrame because I want to apply a transformation (here I just json.loads a JSON st

2 answers
  •  陌清茗 (original poster)
    2021-01-25 11:36

    I do not think you need to drop the column and do the join. The following code should* be equivalent to what you posted:

    import json

    cols = df_data.columns
    # Rebuild each row as a tuple, parsing column 'A' from JSON and passing None through.
    df = df_data.rdd\
        .map(
            lambda row: tuple(
                [row[c] if c != 'A' else (json.loads(row[c]) if row[c] is not None else None)
                 for c in cols]
            )
        )\
        .toDF(cols)
    

    *I haven't actually tested this code, but I think this should work.
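
    The per-row logic in the lambda can be checked without Spark by running it over plain dicts. A minimal sketch, where transform_row is a hypothetical helper name and the dict rows stand in for Spark Row objects (both support lookup by column name):

```python
import json


def transform_row(row, cols, json_col="A"):
    """Return the row as a tuple, with json_col parsed from a JSON string.

    None values in json_col pass through unchanged, as in the lambda above.
    """
    return tuple(
        (json.loads(row[c]) if row[c] is not None else None) if c == json_col else row[c]
        for c in cols
    )


# Hypothetical sample data: one row with a JSON string, one with a missing value.
cols = ["id", "A"]
rows = [
    {"id": 1, "A": '{"x": 1}'},
    {"id": 2, "A": None},
]

result = [transform_row(r, cols) for r in rows]
print(result)  # [(1, {'x': 1}), (2, None)]
```

    The same function could then be passed to .map() in place of the lambda, for example .map(lambda row: transform_row(row, cols)).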

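    But to answer your general question, you can replace a column with a transformed version using withColumn(). DataFrames are immutable, so this returns a new DataFrame; passing the name of an existing column overwrites that column in the result.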

    df = df_data.withColumn("A", my_transformation_function("A"))
    

    Here my_transformation_function() can be a UDF or a built-in function from pyspark.sql.functions.
