Does using Parquet on S3 with EMR/Spark save bandwidth when reading only a subset of columns?

耶瑟儿~ 2020-12-21 21:37

I have an EMR cluster running Spark. In the first step, the CSV files are transformed into parquet.snappy format, partitioned by a date column, so I am left with
