Question
I have searched through all the documentation I could find and still cannot work out why there is a prefix, and what c000 means, in the file naming convention below:
file:/Users/stephen/p/spark/f1/part-00000-445036f9-7a40-4333-8405-8451faa44319-c000.snappy.parquet
Answer 1:
Follow the "Talk is cheap, show me the code" approach. Not everything is documented, so sometimes the only way to find out is to read the code.
Consider part-1-2_3-4.parquet, where the placeholders 1 to 4 stand for, in order:
- Split/partition number.
- Random UUID, to prevent collisions between different (appending) write jobs.
- Unique job/task ID (sometimes it is not included).
- File counter. The "c" stands for count: it is the number of files that have already been written for this specific partition. It is used to limit the maximum number of records written to a single file, and its value starts from 0.
I found it based on this code and this code.
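As a rough illustration of the breakdown above, here is a minimal Python sketch that splits such a file name into its parts. The regular expression is an assumption inferred from the example file name in the question, not taken from Spark's source, and `parse_part_file` and the group names are hypothetical.

```python
import re

# Assumed layout, inferred from the example in the question:
#   part-<split>-<uuid>[_<task id>]-c<counter><extensions>
# This is not Spark's own definition; other Spark versions may differ.
PART_FILE = re.compile(
    r"^part-(?P<split>\d+)"
    r"-(?P<uuid>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r"(?:_(?P<task_id>\d+))?"      # optional, "sometimes it is not included"
    r"-c(?P<counter>\d+)"          # file counter, starts from 0
    r"(?P<ext>(?:\.\w+)+)$"        # e.g. ".snappy.parquet"
)


def parse_part_file(name):
    """Return the parts of a part-file name as a dict, or None if it does not match."""
    match = PART_FILE.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    parts["split"] = int(parts["split"])
    parts["counter"] = int(parts["counter"])
    return parts


if __name__ == "__main__":
    example = "part-00000-445036f9-7a40-4333-8405-8451faa44319-c000.snappy.parquet"
    print(parse_part_file(example))
```

For the file name in the question, the optional job/task ID is absent, so `task_id` comes back as `None`, and the counter is 0 because it is the first file written for that partition.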
Source: https://stackoverflow.com/questions/49165696/could-anyone-please-explain-what-is-c000-means-in-c000-snappy-parquet-or-c000-sn