I have a dataframe whose schema looks like this:
    event: struct (nullable = true)
     |-- event_category: string (nullable = true)
     |-- event_name: string (nullable = true)
     |-- properties: struct (nullable = true)

When I try to explode the event.properties struct column, I get an error saying that explode only works on array or map types.
As the error message says, you can only explode array or map type columns, not struct type columns.
You can just do

    df_json.withColumn("event_properties", $"event.properties")

This will generate a new column event_properties, which is also of struct type. If you want to turn every field of the struct into its own column, you cannot use withColumn; you need to do a select with a wildcard *:

    df_json.select($"event.properties.*")
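Here is a minimal end-to-end sketch of that flattening. The sample data and the page/referrer field names are made up for illustration; only the select with the wildcard is the actual technique.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // Hypothetical sample row with a nested properties struct.
    val df_json = spark.sql(
      """SELECT named_struct(
        |  'event_category', 'clicks',
        |  'event_name', 'checkout',
        |  'properties', named_struct('page', '/cart', 'referrer', '/home')
        |) AS event""".stripMargin)

    // Every field of the struct becomes a top-level column.
    df_json.select($"event.properties.*").show()
    // +-----+--------+
    // | page|referrer|
    // +-----+--------+
    // |/cart|   /home|
    // +-----+--------+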
You may use the following to flatten the struct; as the error message states, explode itself does not work on a struct. Note that this uses the older DataFrame.explode method (deprecated since Spark 2.0) and assumes event.properties is an array of structs; the Property case class and its field names here are illustrative:

    import org.apache.spark.sql.Row

    // Target type for the generated rows (illustrative field names).
    case class Property(errorCode: String, errorDescription: String)

    // DataFrame.explode hands each row's input columns to the function,
    // which returns zero or more typed output rows.
    val explodeDF = parquetDF.explode($"event.properties") {
      case Row(properties: Seq[Row] @unchecked) =>
        properties.map { property =>
          val errorCode = property(0).asInstanceOf[String]
          val errorDescription = property(1).asInstanceOf[String]
          Property(errorCode, errorDescription)
        }
    }.cache()

    display(explodeDF) // display is a Databricks notebook helper
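Since DataFrame.explode is deprecated, here is a sketch of the equivalent with the built-in explode function, under the same array-of-structs assumption:

    import org.apache.spark.sql.functions.explode

    // explode emits one row per array element; the wildcard select
    // then flattens each element struct into top-level columns.
    val flatDF = parquetDF
      .select(explode($"event.properties").as("property"))
      .select($"property.*")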
You can use explode only on array or map columns, so you need to convert the properties struct to an array and then apply the explode function, as below:

    import org.apache.spark.sql.functions._

    // array($"event.properties.*") packs the struct's field values into
    // an array, which requires the fields to share a compatible type.
    df_json.withColumn("event_properties", explode(array($"event.properties.*"))).show(false)
This should give you the desired result.
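Run against the hypothetical df_json from the earlier sketch (where both fields are strings, satisfying array()'s common-type requirement), this yields one row per struct field value:

    df_json
      .withColumn("event_properties", explode(array($"event.properties.*")))
      .select($"event_properties")
      .show(false)
    // +----------------+
    // |event_properties|
    // +----------------+
    // |/cart           |
    // |/home           |
    // +----------------+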