How to convert empty arrays to nulls?

Happy的楠姐 2021-01-13 18:28

I have the DataFrame below and need to convert its empty arrays to null.

+----+---------+-----------+
|  id|count(AS)|count(asdr)|
+----+---------+-----------+
|11         


        
7 answers
  •  抹茶落季
    2021-01-13 18:59

    I don't think that's possible with na.fill, but this should work for you. The code replaces every empty array in the ArrayType columns with null and leaves the other columns as they are:

    // assumes a SparkSession named `spark` is in scope (as in spark-shell)
    import spark.implicits._
    import org.apache.spark.sql.types.ArrayType
    import org.apache.spark.sql.functions._
    
    val df = Seq(
      (110, Seq.empty[Int]),
      (111, Seq(1,2,3))
    ).toDF("id","arr")
    
    // get names of array-type columns
    val arrColsNames = df.schema.fields.filter(f => f.dataType.isInstanceOf[ArrayType]).map(_.name)
    
    // map all empty arrays to nulls: `when` without `otherwise` yields null
    // whenever the condition is not met
    val emptyArraysAsNulls = arrColsNames.map(n => when(size(col(n)) > 0, col(n)).as(n))
    
    // non-array-type columns, keep them as they are
    val keepCols = df.columns.filterNot(arrColsNames.contains).map(col)
    
    df
      .select((keepCols ++ emptyArraysAsNulls):_*)
      .show()
    
    +---+---------+
    | id|      arr|
    +---+---------+
    |110|     null|
    |111|[1, 2, 3]|
    +---+---------+
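
The select in the answer above places the non-array columns first and the array columns last. If the original column order matters, a variation along the following lines might help. It is only a minimal sketch: the object name `EmptyArraysToNulls`, the helper `emptyArraysToNulls`, and the local SparkSession setup are hypothetical names introduced for illustration, and it assumes column names contain no dots or other characters that `col` would interpret specially.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, size, when}
import org.apache.spark.sql.types.ArrayType

object EmptyArraysToNulls {

  // Hypothetical helper (not part of Spark): replaces empty arrays with null
  // in every ArrayType column and keeps the columns in their original order.
  def emptyArraysToNulls(df: DataFrame): DataFrame = {
    val cols: Array[Column] = df.schema.fields.map { f =>
      f.dataType match {
        // `when` without `otherwise` yields null when the condition is not met
        case _: ArrayType => when(size(col(f.name)) > 0, col(f.name)).as(f.name)
        // every other column is passed through unchanged
        case _ => col(f.name)
      }
    }
    df.select(cols: _*)
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session for the demo; in spark-shell `spark` already exists
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("empty-arrays-to-nulls")
      .getOrCreate()
    import spark.implicits._

    // The array column comes first here to show that the order is preserved
    val df = Seq(
      (Seq.empty[Int], 110),
      (Seq(1, 2, 3), 111)
    ).toDF("arr", "id")

    emptyArraysToNulls(df).show()
    spark.stop()
  }
}
```

Mapping over `df.schema.fields` once, rather than building two separate column lists, is what keeps each column in its original position.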
    
