Renaming nested elements in Scala Spark Dataframe

后悔当初 2021-01-17 02:54

I have a Spark Scala dataframe with a nested structure:

 |-- _History: struct (nullable = true)
 |    |-- Article:          


        
1 Answer
  •  栀梦 (original poster)
    2021-01-17 03:20

    The simplest approach is to cast the column to a type whose fields carry the names you want, given either as a schema string or as an equivalent StructType definition:

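    // Requires `import spark.implicits._` for the $"..." column syntax.
    // The element type of Article was lost from this post: replace the "..."
    // with the type reported by df.printSchema before running.
    // The Channel fields are inferred from the ChannelRecord case class below.
    val schema = """struct<
      Article: array<...>,
      Channel: struct<Cultura: bigint, Deportes: array<bigint>>>"""
    df.withColumn("_History", $"_History".cast(schema))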
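
If writing the full type out by hand is impractical, the cast target can probably be derived from the DataFrame's current schema instead. The following is a minimal sketch and not part of the original answer: renameFields and renameHistory are hypothetical helper names, and the rename map is whatever old-to-new mapping applies to your data. It only walks structs and arrays (map types are left untouched), and it renames a matching field name at every nesting level, so it assumes the old names are unambiguous.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{ArrayType, DataType, StructField, StructType}

// Hypothetical helper: recursively rename struct fields,
// leaving data types, nullability and metadata untouched.
def renameFields(dt: DataType, renames: Map[String, String]): DataType = dt match {
  case s: StructType =>
    StructType(s.fields.map { f =>
      f.copy(
        name = renames.getOrElse(f.name, f.name),
        dataType = renameFields(f.dataType, renames))
    })
  case a: ArrayType =>
    a.copy(elementType = renameFields(a.elementType, renames))
  case other => other
}

// Hypothetical helper: cast _History to the same type with renamed fields.
def renameHistory(df: DataFrame, renames: Map[String, String]): DataFrame = {
  val target = renameFields(df.schema("_History").dataType, renames)
  df.withColumn("_History", col("_History").cast(target))
}
```

A call would look like renameHistory(df, Map("OldName" -> "Cultura")), where OldName is a placeholder for the existing field name. Because only names change, the cast does not convert any values.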
    

    You could also model this with case classes:

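    import org.apache.spark.sql.Row
    import org.apache.spark.sql.functions.{struct, udf}
    
    // The case class field names become the field names of the new struct.
    case class ChannelRecord(Cultura: Option[Long], Deportes: Option[Seq[Long]])
    
    // getLong cannot represent a null value, so check isNullAt before reading it.
    val rename = udf((row: Row) =>
      ChannelRecord(
        if (row.isNullAt(0)) None else Some(row.getLong(0)),
        Option(row.getSeq[Long](1))))
    
    df.withColumn("_History",
      struct($"_History.Article", rename($"_History.Channel").alias("Channel")))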
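
The same rebuild can likely be done without a UDF by reassembling the struct from its fields with built-in functions. This is a sketch under stated assumptions rather than the answer's method: rebuildHistory is a hypothetical helper, and OldFirst and OldSecond are placeholders for the existing Channel field names, which are not shown in the post.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, struct}

// Hypothetical helper. "OldFirst" and "OldSecond" are placeholder names
// for the current fields of _History.Channel; substitute the real ones.
def rebuildHistory(df: DataFrame): DataFrame =
  df.withColumn("_History", struct(
    col("_History.Article").alias("Article"),
    struct(
      col("_History.Channel.OldFirst").alias("Cultura"),
      col("_History.Channel.OldSecond").alias("Deportes")
    ).alias("Channel")
  ))
```

One difference from the cast: struct always produces a non-null struct, so a row whose _History or Channel was null ends up with a struct of null fields instead.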
    
