Create new Dataframe with empty/null field values

忘了有多久 2020-11-29 01:55

I am creating a new DataFrame from an existing DataFrame, but need to add a new column ("field1" in the code below) to this new DataFrame. How do I do so? Working sample code example w

2 Answers
  • 2020-11-29 02:24

    It is possible to use lit(null):

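    import org.apache.spark.sql.functions.{lit, udf}
    // Outside spark-shell, toDF also needs: import spark.implicits._
    
    case class Record(foo: Int, bar: String)
    val df = Seq(Record(1, "foo"), Record(2, "bar")).toDF
    
    // The String ascription does not change the resulting column type (see the schema below)
    val dfWithFoobar = df.withColumn("foobar", lit(null: String))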
    

    One problem here is that the column type is null:

    scala> dfWithFoobar.printSchema
    root
     |-- foo: integer (nullable = false)
     |-- bar: string (nullable = true)
     |-- foobar: null (nullable = true)
    

    and it is not retained by the CSV writer. If that is a hard requirement, you can cast the column to a specific type (let's say String), with either a DataType:

    import org.apache.spark.sql.types.StringType
    
    df.withColumn("foobar", lit(null).cast(StringType))
    

    or a string description:

    df.withColumn("foobar", lit(null).cast("string"))
    

    or use a UDF like this:

    val getNull = udf(() => None: Option[String]) // Or some other type
    
    df.withColumn("foobar", getNull()).printSchema
    root
     |-- foo: integer (nullable = false)
     |-- bar: string (nullable = true)
     |-- foobar: string (nullable = true)
    

    A Python equivalent can be found here: Add an empty column to spark DataFrame
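
For use outside the shell, the cast approach from this answer could be wrapped up roughly as follows. This is only a sketch: the object name `NullColumnSketch`, the helper `withNullString`, and the local `SparkSession` settings are assumptions made for illustration and do not come from the answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit
import org.apache.spark.sql.types.StringType

object NullColumnSketch {
  // Hypothetical helper: appends an all-null column typed as string,
  // using the lit(null).cast(...) form shown in the answer.
  def withNullString(df: DataFrame, name: String): DataFrame =
    df.withColumn(name, lit(null).cast(StringType))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are arbitrary.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("null-column-sketch")
      .getOrCreate()
    import spark.implicits._ // provides toDF on local collections

    val df = Seq((1, "foo"), (2, "bar")).toDF("foo", "bar")

    // Without the cast, the new column would have NullType instead of string.
    withNullString(df, "foobar").printSchema()

    spark.stop()
  }
}
```

Passing the column name as a parameter keeps the helper reusable; a different target type would only need a different argument to `cast`.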

  • 2020-11-29 02:46

    Just to extend the perfect answer provided by @zero323, here's a solution which can be used starting from Spark 2.2.0.

    import org.apache.spark.sql.functions.typedLit
    
    df.withColumn("foobar", typedLit[Option[String]](None)).printSchema
    root
     |-- foo: integer (nullable = false)
     |-- bar: string (nullable = true)
     |-- foobar: string (nullable = true)
    
    

    It's similar to the third solution in the answer above, but without using a UDF.
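
The same `typedLit` pattern should extend to other element types by changing the type parameter. The sketch below assumes Spark 2.2.0 or later, as stated in this answer; the object name `TypedNullColumns`, the helper `addNullColumns`, and the column names `note`, `count`, and `score` are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.typedLit

object TypedNullColumns {
  // Hypothetical helper: appends three all-null columns whose types
  // are derived from the Option type parameter given to typedLit.
  def addNullColumns(df: DataFrame): DataFrame =
    df.withColumn("note", typedLit[Option[String]](None))   // string column
      .withColumn("count", typedLit[Option[Int]](None))     // integer column
      .withColumn("score", typedLit[Option[Double]](None))  // double column

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and app name are arbitrary.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("typed-null-columns")
      .getOrCreate()
    import spark.implicits._ // provides toDF on local collections

    val df = Seq((1, "foo"), (2, "bar")).toDF("foo", "bar")
    addNullColumns(df).printSchema()

    spark.stop()
  }
}
```

Compared with casting `lit(null)`, this keeps the target type in the Scala type parameter rather than in a separate `cast` call.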
