How to overwrite entire existing column in Spark dataframe with new column?

抹茶落季 2021-02-20 04:19

I want to overwrite a spark column with a new column which is a binary flag.

I tried directly overwriting the column id2 but why is it not working like a inplace operati

3 answers

北海茫月 (original poster)

2021-02-20 05:22

If you're working with multiple columns of the same name in different joined tables you can use the table alias in the colName in withColumn.

E.g. df1.join(df2, df1.id == df2.other_id).withColumn('df1.my_col', F.greatest(df1.my_col, df2.my_col))

And if you only want to keep the columns from df1, you can also call .select('df1.*') (this assumes df1 was given that alias, e.g. with df1.alias('df1')).

If you instead do df1.join(df2, df1.id == df2.other_id).withColumn('my_col', F.greatest(df1.my_col, df2.my_col))

I think it overwrites the last column called my_col, so the output is: id, my_col (original df1.my_col value), id, other_id, my_col (newly computed my_col)
