I am aware of the method for adding a new column to a Spark Dataset using .withColumn() and a UDF, which returns a DataFrame. I am also aware that we can convert the resulting DataFrame back to a Dataset.
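
For reference, that DataFrame route could look roughly like this (a sketch assuming a spark-shell session where `spark` is in scope; `makeC` is a hypothetical UDF, while `udf`, `withColumn`, and `as` are standard Spark SQL API):

import org.apache.spark.sql.functions.udf
import spark.implicits._

case class A(a: String)
case class BC(b: String, c: String)

// Hypothetical UDF that computes the new column value
val makeC = udf((a: String) => "c")

val dsa = spark.createDataset((1 to 10).map(i => A(i.toString)))
val df = dsa.withColumnRenamed("a", "b")  // align the column name with BC.b
            .withColumn("c", makeC($"b")) // withColumn returns an untyped DataFrame
val dsb = df.as[BC]                       // convert back to a typed Dataset[BC]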
In the type-safe world of Datasets you'd map one structure into another. That is, for each transformation we need a schema representation of the data (just as we do for RDDs). To access 'c' above, we need a new case class that provides access to it.
import spark.implicits._ // encoders for case classes

case class A(a: String)
case class BC(b: String, c: String)

// Transforms an A into a BC, computing the new field 'c'
val f: A => BC = a => BC(a.a, "c")

val data = (1 to 10).map(i => A(i.toString))
val dsa = spark.createDataset(data)
// dsa: org.apache.spark.sql.Dataset[A] = [a: string]

val dsb = dsa.map(f)
// dsb: org.apache.spark.sql.Dataset[BC] = [b: string, c: string]
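
The payoff is that the new field is checked at compile time, whereas a withColumn column name is only checked at runtime. For example, with the dsb defined above:

dsb.map(bc => bc.c.toUpperCase) // compiles: 'c' is a known field of BC
// dsb.map(bc => bc.d)          // would not compile: BC has no field 'd'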