I have a complex DataFrame structure and would like to null a column easily. I have created implicit classes that wire up functionality and make it easy to address the 2D DataFrame structure.
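For illustration, a minimal sketch of what such an implicit enrichment might look like for a top-level column (the RichDataFrame and nullify names are hypothetical); nulling a field nested inside a struct or array is the harder case the answers below address:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

object DataFrameOps {
  implicit class RichDataFrame(df: DataFrame) {
    // Replace a top-level column with typed nulls, keeping its position in the schema
    def nullify(name: String): DataFrame =
      df.withColumn(name, lit(null).cast(df.schema(name).dataType))
  }
}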
I ran into the same issue and, assuming you don't need the result to have any new fields or fields with different types, here is a solution that can do this without having to redefine the whole struct: Change value of nested column in DataFrame.
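If you are on Spark 3.1 or later, a sketch along the same lines can null a field nested in an array of structs without redefining the struct, using functions.transform together with Column.withField (the data and value names below are assumed from the schema shown further down):

import org.apache.spark.sql.functions.{col, lit, transform}

// Rebuild each element of the `data` array, overwriting its `value` field
// with a typed null; all other fields are left untouched.
val nulled = df.withColumn(
  "data",
  transform(col("data"), elem =>
    elem.withField("value", lit(null).cast("map<string,string>")))
)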
Since Spark 1.6, you can use case classes to map your DataFrames (typed as Datasets). You can then map your data and transform it to the new schema you want. For example:
// Requires the SparkSession's implicits for .as[...] and .toDF()
import spark.implicits._

case class Root(name: String, data: Seq[Data])
case class Data(name: String, values: Map[String, String])

case class NullableRoot(name: String, data: Seq[NullableData])
case class NullableData(name: String, value: Map[String, String], values: Map[String, String])

// Map each row into the nullable shape, setting `value` to null
val nullableDF = df.as[Root].map { root =>
  val nullableData = root.data.map(data => NullableData(data.name, null, data.values))
  NullableRoot(root.name, nullableData)
}.toDF()
The resulting schema of nullableDF will be:
root
|-- name: string (nullable = true)
|-- data: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- name: string (nullable = true)
| | |-- value: map (nullable = true)
| | | |-- key: string
| | | |-- value: string (valueContainsNull = true)
| | |-- values: map (nullable = true)
| | | |-- key: string
| | | |-- value: string (valueContainsNull = true)
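A minimal end-to-end sketch to verify this, assuming a local SparkSession and hypothetical sample data:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("null-nested-field")
  .getOrCreate()
import spark.implicits._

// Hypothetical input rows matching the case classes above
val df = Seq(Root("r1", Seq(Data("d1", Map("k" -> "v"))))).toDF()

val nullableDF = df.as[Root].map { root =>
  NullableRoot(root.name, root.data.map(d => NullableData(d.name, null, d.values)))
}.toDF()

nullableDF.printSchema() // prints the schema shown above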