I have below dataframe and i need to convert empty arrays to null.
+----+---------+-----------+
| id|count(AS)|count(asdr)|
+----+---------+-----------+
|11
I don't think thats possible with na.fill
, but this should work for you. The code converts all empty ArrayType-columns to null and keeps the other columns as they are:
import spark.implicits._
import org.apache.spark.sql.types.ArrayType
import org.apache.spark.sql.functions._
val df = Seq(
(110, Seq.empty[Int]),
(111, Seq(1,2,3))
).toDF("id","arr")
// get names of array-type columns
val arrColsNames = df.schema.fields.filter(f => f.dataType.isInstanceOf[ArrayType]).map(_.name)
// map all empty arrays to nulls
val emptyArraysAsNulls = arrColsNames.map(n => when(size(col(n))>0,col(n)).as(n))
// non-array-type columns, keep them as they are
val keepCols = df.columns.filterNot(arrColsNames.contains).map(col)
df
.select((keepCols ++ emptyArraysAsNulls):_*)
.show()
+---+---------+
| id| arr|
+---+---------+
|110| null|
|111|[1, 2, 3]|
+---+---------+