I get conflicting information when searching for and reading answers on this subject online. Can anyone share their experience? I know for a fact that gzipped CSV is not, but ma
Parquet files with GZIP compression are splittable. In fact, Parquet files are always splittable, regardless of the compression algorithm used, because of their internal layout: compression is applied to pieces of data inside the file, not to the file as a whole.
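This is mainly due to the design of Parquet files, which are divided into the following parts:
1. Each Parquet file consists of one or more row groups.
2. Each row group contains one column chunk per column.
3. Each column chunk is divided into pages, and compression is applied to those pages rather than to the file as a whole.
The file footer records where each row group and column chunk is located, so a reader can process row groups independently of one another.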
You can find a more detailed explanation here: https://github.com/apache/parquet-format#file-format
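
To make the idea concrete, here is a minimal sketch in plain Python. It is a toy model, not the real Parquet format: the helper names `write_chunked` and `read_block` are made up for illustration, and the `index` list only plays the role of Parquet's footer metadata. It shows that when each block is gzip-compressed on its own and its offset is recorded, any block can be read in isolation, whereas a single gzip stream over the whole file (like a gzipped CSV) cannot be picked up from the middle.

```python
import gzip
import zlib


def write_chunked(records, chunk_size):
    """Compress records in independent gzip blocks; return (data, index)."""
    body = bytearray()
    index = []  # (offset, length) per block, standing in for footer metadata
    for start in range(0, len(records), chunk_size):
        raw = "\n".join(records[start:start + chunk_size]).encode("utf-8")
        block = gzip.compress(raw)
        index.append((len(body), len(block)))
        body.extend(block)
    return bytes(body), index


def read_block(data, index, i):
    """Decompress only block i, without touching any other block."""
    offset, length = index[i]
    return gzip.decompress(data[offset:offset + length]).decode("utf-8").split("\n")


records = [f"row-{n}" for n in range(10)]
data, index = write_chunked(records, chunk_size=4)

# A worker assigned the last block can read it on its own.
last = read_block(data, index, 2)  # ['row-8', 'row-9']

# Contrast: one gzip stream over the whole file cannot be read from the middle.
whole = gzip.compress("\n".join(records).encode("utf-8"))
try:
    gzip.decompress(whole[len(whole) // 2:])
    mid_stream_ok = True
except (OSError, EOFError, zlib.error):
    mid_stream_ok = False
```

In this sketch each block corresponds loosely to a unit that a separate task could process, which is the property that makes a file splittable for engines such as Spark or MapReduce.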