How to add a column to a Dataset and access it, without converting from a DataFrame?

野性不改 2021-02-14 06:56

I am aware of the method for adding a new column to a Spark Dataset using .withColumn() and a UDF, which returns a DataFrame. I am also aware that we can conv

2 answers
  •  野性不改
    2021-02-14 07:04

    In the type-safe world of Datasets, you map one structure into another.

    That is, each transformation needs a schema representation of its data (as is also the case for RDDs). To access 'c' above, we need to create a new schema that provides access to it.

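    // Brings the case-class encoders into scope; assumes a SparkSession named `spark`, as in spark-shell
    import spark.implicits._

    case class A(a: String)
    case class BC(b: String, c: String)
    // Transforms an A into a BC, filling the new field c with a constant
    val f: A => BC = a => BC(a.a, "c")

    val data = (1 to 10).map(i => A(i.toString))
    val dsa = spark.createDataset(data)
    // dsa: org.apache.spark.sql.Dataset[A] = [a: string]

    // map keeps the result typed: the new column is the field c of BC
    val dsb = dsa.map(f)
    // dsb: org.apache.spark.sql.Dataset[BC] = [b: string, c: string]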
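
The same idea can be sketched as a small standalone program in which the new column is derived from existing fields and then read back as an ordinary field. The names `Person`, `PersonWithFlag`, `AddColumnSketch` and `addFlag` are hypothetical and chosen only for illustration; the sketch assumes Spark SQL is on the classpath and runs in local mode.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical case classes for illustration; they are not from the question.
// They are declared at the top level so Spark can derive encoders for them.
case class Person(name: String, age: Int)
case class PersonWithFlag(name: String, age: Int, isAdult: Boolean)

object AddColumnSketch {
  // Typed route: map every row into a wider case class.
  // The result stays a Dataset, so no DataFrame round trip is needed.
  def addFlag(people: Dataset[Person]): Dataset[PersonWithFlag] = {
    import people.sparkSession.implicits._ // encoders for the case classes
    people.map(p => PersonWithFlag(p.name, p.age, p.age >= 18))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local mode, for demonstration only
      .appName("add-column-sketch")
      .getOrCreate()
    import spark.implicits._

    val people = Seq(Person("Ann", 34), Person("Bob", 12)).toDS()
    val flagged = addFlag(people)

    // The new column is a plain, compile-time-checked field.
    val adultNames = flagged.filter(_.isAdult).map(_.name).collect()
    adultNames.foreach(println)

    // For comparison: the untyped route goes through a DataFrame and is
    // converted back with as[...], which is checked only at analysis time.
    val viaDataFrame = people.withColumn("isAdult", $"age" >= 18).as[PersonWithFlag]
    viaDataFrame.show()

    spark.stop()
  }
}
```

The typed `map` needs one case class per intermediate shape, but field access such as `_.isAdult` is checked by the compiler. The `withColumn` plus `as[...]` route is shorter when many columns are carried over unchanged.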
    
