问题
If I have a DataFrame called df that looks like:
+----+----+
| a1+ a2|
+----+----+
| foo| bar|
| N/A| baz|
|null| etc|
+----+----+
I can selectively replace values like so:
val df2 = df.withColumn("a1", when($"a1" === "N/A", $"a2"))
so that df2 looks like:
+----+----+
| a1+ a2|
+----+----+
| foo| bar|
| baz| baz|
|null| etc|
+----+----+
but why can't I check if it's null, like:
val df3 = df2.withColumn("a1", when($"a1" === null, $"a2"))
so that I get:
+----+----+
| a1+ a2|
+----+----+
| foo| bar|
| baz| baz|
| etc| etc|
+----+----+
Edit: $"a1".isNull doesn't seem to work. Could it be because of how I'm constructing the dataframe I'm using to test, which is as follows?
val schema = StructType(
StructField("a1", StringType, false) ::
StructField("a2", StringType, false) :: Nil
)
val data = sc.parallelize(Array(
Row("foo","bar"),
Row("N/A","baz"),
Row(null,"etc"))
)
val df = sqlContext.createDataFrame(data, schema)
I also can't use coalesce, as far as I know, because sometimes I need to use a static value instead of another column's value.
Edit again: Setting my test columns to nullable = false doesn't help.
回答1:
Because null means no value and shouldn't be checked like this.
Use isNull
function:
val df3 = df2.withColumn("a1", when($"a1".isNull, $"a2"))
or coalesce
, which returns first non-null value:
val df3 = df2.withColumn("a1", coalesce($"a1", $"a2"))
来源:https://stackoverflow.com/questions/43853949/in-dataframe-withcolumn-how-can-i-check-if-the-columns-value-is-null-as-a-cond