How to modify a Spark Dataframe with a complex nested structure?

余生分开走 2020-12-30 09:58

I have a DataFrame with a complex nested structure and would like an easy way to null out a column. I've created implicit classes that wire in this functionality and easily address a 2D (flat) DataFrame structure.

2 Answers
  • 2020-12-30 10:20

    I ran into the same issue. Assuming you don't need the result to have any new fields or fields with different types, here is a solution that does this without redefining the whole struct: Change value of nested column in DataFrame
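
    As a rough illustration of nulling a nested field without redefining the struct, here is a minimal sketch. It is not the linked solution; it assumes Spark 3.1 or later, where `Column.withField` and the `transform` function in `org.apache.spark.sql.functions` are available. The `Root`/`Data` layout, the object name `NullNestedField`, and the sample row are hypothetical and only serve to make the example self-contained.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, lit, transform}

    // Hypothetical input layout (an assumption for this sketch):
    // a top-level name plus an array of structs.
    case class Data(name: String, values: Map[String, String])
    case class Root(name: String, data: Seq[Data])

    object NullNestedField {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[*]")
          .appName("null-nested-field")
          .getOrCreate()
        import spark.implicits._

        // One sample row, made up for illustration.
        val df = Seq(Root("r1", Seq(Data("d1", Map("k" -> "v"))))).toDF()

        // Rewrite every struct in the `data` array, replacing its `values`
        // field with a null cast to the field's map<string,string> type.
        // The other struct fields are carried over untouched.
        val nulled = df.withColumn(
          "data",
          transform(
            col("data"),
            d => d.withField("values", lit(null).cast("map<string,string>"))
          )
        )

        nulled.printSchema()
        nulled.show(truncate = false)
        spark.stop()
      }
    }
    ```

    Casting the null literal keeps the field's type fixed, which matches the constraint above that the result needs no new fields and no changed types. On older Spark versions these functions are not available, so a different approach would be needed.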

  • 2020-12-30 10:24

    Since Spark 1.6, you can use case classes to map a DataFrame to a typed Dataset. You can then map over the data and transform it into the new schema you want. For example:

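    // Needed for .as[Root] and the typed map; assumes a SparkSession named `spark`.
    import spark.implicits._

    // Existing schema of df.
    case class Root(name: String, data: Seq[Data])
    case class Data(name: String, values: Map[String, String])
    // Target schema: NullableData adds a new `value` field.
    case class NullableRoot(name: String, data: Seq[NullableData])
    case class NullableData(name: String, value: Map[String, String], values: Map[String, String])

    val nullableDF = df.as[Root].map { root =>
      // Copy each element, setting the new `value` field to null.
      val nullableData = root.data.map(data => NullableData(data.name, null, data.values))
      NullableRoot(root.name, nullableData)
    }.toDF()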
    

    The resulting schema of nullableDF will be:

    root
     |-- name: string (nullable = true)
     |-- data: array (nullable = true)
     |    |-- element: struct (containsNull = true)
     |    |    |-- name: string (nullable = true)
     |    |    |-- value: map (nullable = true)
     |    |    |    |-- key: string
     |    |    |    |-- value: string (valueContainsNull = true)
     |    |    |-- values: map (nullable = true)
     |    |    |    |-- key: string
     |    |    |    |-- value: string (valueContainsNull = true)
    