Error while exploding a struct column in Spark

Backend · open · 3 answers · 697 views
孤城傲影 2021-01-17 16:30

I have a dataframe whose schema looks like this:

event: struct (nullable = true)
|    | event_category: string (nullable = true)
|    | event_name: string (nullable = true)
3 Answers
  • 2021-01-17 16:40

    As the error message says, you can only explode array or map type columns, not struct type columns.

    You can just do

    df_json.withColumn("event_properties", $"event.properties")
    

    This generates a new column event_properties, which is also of struct type.

    If you want to turn every field of the struct into its own column, you cannot use withColumn; instead, do a select with a wildcard *:

    df_json.select($"event.properties.*")
    
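To see the difference between the two approaches above, here is a minimal spark-shell sketch. It builds a hypothetical `event` struct from this question's two visible fields (`event_category`, `event_name`); the `df_json` construction and values are illustrative assumptions, not taken from the original question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct

val spark = SparkSession.builder().master("local[*]").appName("flatten-demo").getOrCreate()
import spark.implicits._

// Hypothetical data shaped like the question's schema
val df_json = Seq(("click", "signup"))
  .toDF("event_category", "event_name")
  .select(struct($"event_category", $"event_name").as("event"))

// withColumn keeps the nested value as a single struct-typed column
df_json.withColumn("event_copy", $"event").printSchema()

// select with a wildcard promotes every struct field to a top-level column
df_json.select($"event.*").printSchema()
```

Running both `printSchema()` calls side by side makes it clear why only the wildcard select flattens the struct.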
  • You may use the following to flatten the struct. As the error message states, explode does not work on a struct.

    // Note: Dataset.explode is deprecated since Spark 2.0; prefer the explode() function.
    import org.apache.spark.sql.Row

    val explodeDF = parquetDF.explode($"event") {
      case Row(properties: Seq[Row]) => properties.map { property =>
        val errorCode = property(0).asInstanceOf[String]
        val errorDescription = property(1).asInstanceOf[String]
        // email, salary, and the Event case class come from the enclosing scope
        Event(errorCode, errorDescription, email, salary)
      }
    }.cache()
    display(explodeDF)
    
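Since `Dataset.explode` is deprecated in Spark 2.0+, the same flattening is usually written with the `explode()` function instead. This is a sketch under the assumption that `event.properties` is an `array<struct<...>>`; `parquetDF` and the field names are carried over from the answer above, not verified against the question.

```scala
import org.apache.spark.sql.functions.{col, explode}

// Assumes event.properties is an array of structs; one output row per array element.
val flattened = parquetDF
  .select(explode(col("event.properties")).as("property"))
  .select(col("property.*"))  // promote each struct field to its own column
flattened.show(false)
```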
  • You can use explode only on array or map columns, so you need to convert the properties struct to an array and then apply the explode function, as below:

    import org.apache.spark.sql.functions._
    df_json.withColumn("event_properties", explode(array($"event.properties.*"))).show(false)
    

    That should give you the desired result.

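A caveat worth noting about the `array($"event.properties.*")` approach above: `array()` collects the struct's field *values* into one array, so explode yields one row per field rather than one row per nested record, and all fields must share a compatible type. A hedged sketch, assuming a hypothetical two-string-field `properties` struct:

```scala
import org.apache.spark.sql.functions.{array, col, explode}

// Assumes event.properties is a struct whose fields are all strings;
// each field value becomes its own row in event_properties.
df_json
  .withColumn("event_properties", explode(array(col("event.properties.*"))))
  .show(false)
```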