Writing custom condition inside .withColumn in Pyspark

Posted by 巧了我就是萌 on 2020-12-15 03:39:51

Question


I have to add a customized condition involving many columns inside .withColumn. My scenario is this: for each row, I have to check many columns for null values and collect the names of the null columns into a new column. My code looks like this:

df = df.withColumn("MissingColumns",
    array(
        when(col("firstName").isNull(), lit("firstName")),
        when(col("salary").isNull(), lit("salary"))))

The problem is that I have many columns to include in the condition, so I tried to build it in a loop with f-strings and pass the result:

df = df.withColumn("MissingColumns",condition)

But this condition does not work, presumably because what I built is a plain Python string rather than a Column expression. Is there an efficient way to do this?


Answer 1:


You need to unpack your list inside the array as follows:

from pyspark.sql.functions import array, col, lit, when

columns = ["firstName", "salary"]
condition = array(*[when(col(c).isNull(), lit(c)) for c in columns])
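The essential move here is Python's `*` unpacking: the list comprehension builds one `when(...)` expression per column, and `*` spreads them as separate positional arguments to `array`, exactly as if each had been written out by hand. The pattern can be illustrated without a running Spark session using a toy stand-in for `array` (plain Python, for illustration only; the stand-in function is hypothetical, not part of PySpark):

```python
def array(*cols):
    # Toy stand-in for pyspark.sql.functions.array: like the real one,
    # it takes a variable number of column arguments, which is why the
    # list comprehension must be unpacked with *.
    return list(cols)

columns = ["firstName", "salary", "department"]

# Without the *, array() would receive ONE argument (the whole list);
# with it, each element becomes a separate positional argument.
exprs = array(*[f"when(col('{c}').isNull(), lit('{c}'))" for c in columns])
print(len(exprs))  # one expression per column
```

In real PySpark code, the resulting column is an array whose entries are the names of the columns that are null in that row, with null entries for the columns that are present; on Spark 3.4+ you could additionally wrap the expression in `array_compact` to drop those nulls.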


Source: https://stackoverflow.com/questions/64970694/writing-custom-condition-inside-withcolumn-in-pyspark
