Disable parquet metadata summary in Spark

既然无缘 2021-01-02 10:30

I have a Spark job (Spark 1.4.1) that receives a stream of Kafka events. I would like to save them continuously as Parquet on Tachyon.

val lines = KafkaUtils.creat         


        
2 answers
  •  野趣味 (original poster)
    2021-01-02 11:27

    Setting "parquet.enable.summary-metadata" to the string "false" (not the Boolean false) seems to work for us.

    By the way, Spark does use the _common_metadata file (we copy it over manually for repeated jobs).
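
    A minimal sketch of how the setting might be applied, assuming it is placed on the SparkContext's Hadoop configuration before any Parquet write. The app name is a placeholder and not from the original job; the master is assumed to be supplied by spark-submit.

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}

    // Placeholder app name (an assumption for illustration).
    val conf = new SparkConf().setAppName("parquet-no-summary")
    val sc = new SparkContext(conf)

    // Hadoop configuration values are strings, so the flag is passed as "false".
    sc.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")
    ```

    With the flag set this way, Parquet writes issued through this context should skip the summary files; whether that holds for a particular Spark and Parquet version is worth checking against the output directory.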
