Spark Exception Complex types not supported while loading parquet

南笙 · 2021-01-24 02:19

I am trying to load a Parquet file in Spark as a DataFrame:

val df = spark.read.parquet(path)

I am getting:

org.apache.spark.SparkException: Complex types not supported …
1 Answer
  • 2021-01-24 02:53

    Take 1
    SPARK-12854 Vectorize Parquet reader indicates that "ColumnarBatch supports structs and arrays" (cf. GitHub pull request 10820), starting with Spark 2.0.0.

    And SPARK-13518 Enable vectorized parquet reader by default, also starting with Spark 2.0.0, deals with the property spark.sql.parquet.enableVectorizedReader (cf. GitHub commit e809074).

    My 2 cents: disable that "VectorizedReader" optimization and see what happens.
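    The "Take 1" experiment could be sketched as follows; this assumes an existing SparkSession named `spark` and the same `path` variable as in the question, and is a diagnostic step rather than a definitive fix:

    ```scala
    // Sketch, under the assumptions above: turn off the vectorized
    // Parquet reader, then retry the read that previously failed.
    spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
    val df = spark.read.parquet(path)
    df.printSchema()
    ```

    If the read succeeds with the vectorized reader disabled, that points at the columnar (vectorized) code path as the component rejecting the complex types.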

    Take 2
    Since the problem has been narrowed down to some empty files that do not display the same schema as "real" files, my 3 cents: experiment with spark.sql.parquet.mergeSchema to see if the schema from real files takes precedence after merging.
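    A minimal sketch of the mergeSchema experiment, again assuming the same `spark` session and `path` from the question:

    ```scala
    // Sketch: ask Spark to merge schemas across all Parquet part files,
    // either per read via an option, or session-wide via the SQL conf.
    val merged = spark.read
      .option("mergeSchema", "true")
      .parquet(path)

    // Equivalent session-wide setting:
    // spark.conf.set("spark.sql.parquet.mergeSchema", "true")
    ```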

    Other than that, you might try to eliminate the empty files at write time, with some kind of re-partitioning, e.g. coalesce(1) (OK, 1 is a bit extreme, but you see the point).
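    The write-time suggestion could look like the sketch below; `df` and `outputPath` are hypothetical placeholders, and the partition count of 4 is an arbitrary example:

    ```scala
    // Sketch: shrink the number of output partitions before writing,
    // so empty part files are less likely to be produced.
    // `df` and `outputPath` are assumed placeholders, 4 is arbitrary.
    df.coalesce(4)
      .write
      .mode("overwrite")
      .parquet(outputPath)
    ```

    coalesce avoids a full shuffle when reducing the partition count, which is why it is the usual choice here over repartition.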
