Question
I need to add a customized condition involving many columns inside .withColumn. My scenario is roughly this: I have to check many columns row-wise for null values and collect the names of the null columns into a new column. My code looks somewhat like this:
df= df.withColumn("MissingColumns",\
array(\
when(col("firstName").isNull(),lit("firstName")),\
when(col("salary").isNull(),lit("salary"))))
The problem is that I have many columns to include in the condition, so I tried to generate it with a loop and f-strings and then use the result:

df = df.withColumn("MissingColumns", condition)

But this condition does not work, probably because what I built is of data type String. Is there an efficient way to do this?
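(For context, the string-built attempt likely looked something like the sketch below; the exact loop is my assumption, not the original code. The point is that an f-string produces a plain Python str, while withColumn requires a Column expression.)

from pyspark.sql.functions import array, when, col, lit

columns = ["firstName", "salary"]
# This builds text such as "array(when(col('firstName').isNull(), ...), ...)" -- a str, not a Column
condition = "array(" + ",".join(f"when(col('{c}').isNull(), lit('{c}'))" for c in columns) + ")"
df = df.withColumn("MissingColumns", condition)  # fails: a str is not a Column expression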
Answer 1:
You need to unpack your list inside the array as follows:
columns = ["firstName","salary"]
condition = array(*[when(col(c).isNull(),lit(c)) for c in columns])
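To cover every column without listing them by hand, you can build the list from df.columns and, on Spark 3.1+, drop the nulls that unmatched when() expressions leave behind, so each row's array holds only the names of its missing columns. A minimal sketch under those assumptions (the sample DataFrame and column names are illustrative):

from pyspark.sql import SparkSession
from pyspark.sql.functions import array, when, col, lit, filter as array_filter

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("Alice", None), (None, 3000)],
    ["firstName", "salary"],
)

# Check every column of the DataFrame, not just a hand-written list
columns = df.columns
condition = array(*[when(col(c).isNull(), lit(c)) for c in columns])

# Keep only the non-null entries (requires Spark 3.1+ for functions.filter)
df = df.withColumn("MissingColumns", array_filter(condition, lambda x: x.isNotNull()))
df.show(truncate=False)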
Source: https://stackoverflow.com/questions/64970694/writing-custom-condition-inside-withcolumn-in-pyspark