I'm new to SparkSQL/Scala and I'm struggling with a couple of seemingly simple tasks.
I'm trying to build some dynamic SQL from a Scala String Array.
You can just use variadic arguments:
val df = Seq(("a", "1", "c"), ("foo", "bar", "baz")).toDF("a", "b", "c")
val typedCols = Array("a", "cast(b as int) b", "c")
df.selectExpr(typedCols: _*).show
+---+----+---+
| a| b| c|
+---+----+---+
| a| 1| c|
|foo|null|baz|
+---+----+---+
but personally I prefer Column objects (the $ interpolator requires import spark.implicits._):
val typedCols = Array($"a", $"b" cast "int", $"c")
df.select(typedCols: _*).show
How would I get a DataFrame result with all the good records that passed the typing and then throw all the bad records in some kind of error bucket?
Data that failed to cast is NULL. To find the good records, use na.drop:
val result = df.selectExpr(typedCols: _*)
val good = result.na.drop()
To find the bad records, check whether any column is NULL:
import org.apache.spark.sql.functions.col
val bad = result.where(result.columns.map(col(_).isNull).reduce(_ || _))
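The good/bad split above can be sketched without Spark, using Option to stand in for a cast that yields NULL on failure (toIntOption, available since Scala 2.13, plays the role of cast(b as int) here — this is an analogue of the pattern, not the Spark API itself):

```scala
// Plain-Scala analogue of the cast-then-split pattern.
// A failed cast yields None, just as a failed SQL cast yields NULL.
val rows = Seq(("a", "1", "c"), ("foo", "bar", "baz"))

// Attempt the cast: Some(int) on success, None on failure.
val casted = rows.map { case (a, b, c) => (a, b.toIntOption, c) }

// "na.drop": keep only rows where every casted field is defined.
val good = casted.collect { case (a, Some(b), c) => (a, b, c) }

// "isNull reduced with ||": rows where at least one cast failed.
val bad = casted.filter { case (_, b, _) => b.isEmpty }
```

With the sample data, good keeps the ("a", 1, "c") row and bad catches ("foo", bar, "baz"), mirroring what na.drop and the isNull filter do on the DataFrame.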
To get unmatched data:
If typedCols is a Seq[Column], you can use:
df.where(typedCols.map(_.isNull).reduce(_ || _))
If typedCols is a Seq[String], you can use:
import org.apache.spark.sql.functions.expr
df.where(typedCols.map(expr(_).isNull).reduce(_ || _))
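The key idea in both variants is that reduce(_ || _) folds a list of per-column null checks into a single OR-ed predicate. A minimal Spark-free sketch of that composition, with rows modeled as Maps (a missing key standing in for SQL NULL — an assumption for illustration only):

```scala
// Each predicate mimics col(name).isNull on a row represented as a Map;
// an absent key plays the role of SQL NULL.
type Row = Map[String, String]
val columns = Seq("a", "b", "c")
val preds: Seq[Row => Boolean] = columns.map(c => (r: Row) => r.get(c).isEmpty)

// reduce(_ || _) on Columns corresponds to OR-ing the predicates:
// a row is "unmatched" if ANY single check fires.
val anyNull: Row => Boolean = r => preds.map(p => p(r)).reduce(_ || _)

val rows: Seq[Row] = Seq(
  Map("a" -> "a", "b" -> "1", "c" -> "c"),
  Map("a" -> "foo", "c" -> "baz") // "b" missing, i.e. NULL
)
val unmatched = rows.filter(anyNull)
```

The same fold is what df.where(... .reduce(_ || _)) builds: one Column expression equivalent to c1.isNull || c2.isNull || c3.isNull.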