How to overwrite entire existing column in Spark dataframe with new column?

抹茶落季 2021-02-20 04:19

I want to overwrite a Spark column with a new column that is a binary flag.

I tried directly overwriting the column id2, but why doesn't it work like an in-place operation?

3 Answers
  •  北海茫月
    2021-02-20 05:22

    If you're working with multiple columns of the same name from different joined tables, you can use the table alias in the colName argument of withColumn.

    E.g. df1.join(df2, df1.id == df2.other_id).withColumn('df1.my_col', F.greatest(df1.my_col, df2.my_col))

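    If you only want to keep the columns from df1, you can also call .select('df1.*'). This assumes df1 has been aliased, e.g. df1 = df1.alias('df1'), because Spark resolves 'df1' as a DataFrame alias, not as a Python variable name.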

    If you instead do df1.join(df2, df1.id == df2.other_id).withColumn('my_col', F.greatest(df1.my_col, df2.my_col)),

    I think it overwrites the last column named my_col, so the output is: id, my_col (original df1.my_col value), id, other_id, my_col (newly computed value).
