Question
I am trying to use a "chained when" function. In other words, I'd like to get more than two outputs.
I tried using the same logic of the concatenate IF function in Excel:
df.withColumn("device_id", when(col("device")=="desktop",1)).otherwise(when(col("device")=="mobile",2)).otherwise(null))
But that doesn't work since I can't put a tuple into the "otherwise" function.
Answer 1:
Have you tried:
from pyspark.sql import functions as F
df.withColumn('device_id', F.when(F.col('device')=='desktop', 1).when(F.col('device')=='mobile', 2).otherwise(None))
Note that when chaining `when` calls you do not need to wrap the successive calls in an `otherwise`.
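To make the semantics concrete without a Spark session: chained `when` calls are checked in order, the first matching condition wins, and `otherwise` supplies the fallback. A plain-Python analogue of the expression above (the `device_to_id` helper is hypothetical, for illustration only):

```python
# Plain-Python analogue of when(...).when(...).otherwise(...):
# conditions are evaluated in order, the first match wins,
# and `otherwise` provides the fallback (None here, as in the answer).
def device_to_id(device):
    if device == "desktop":   # first when(...)
        return 1
    if device == "mobile":    # second chained when(...)
        return 2
    return None               # otherwise(...)

print([device_to_id(d) for d in ["desktop", "mobile", "tablet"]])
# [1, 2, None]
```

Spark applies the same first-match logic row by row, so the order of chained `when` calls matters when conditions overlap.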
Source: https://stackoverflow.com/questions/42537051/pyspark-when-function-with-multiple-outputs