Could anyone please explain what c000 means in c000.snappy.parquet or c000.snappy.orc?

Submitted by 倾然丶 夕夏残阳落幕 on 2021-02-07 20:30:05

Question


I have searched through all the documentation and still can't find why there is a prefix, or what c000 means, in the file naming convention below:

file:/Users/stephen/p/spark/f1/part-00000-445036f9-7a40-4333-8405-8451faa44319-c000.snappy.parquet


Answer 1:


Use the "Talk is cheap, show me the code" approach. Not everything is documented, so sometimes the only way to find out is to read the code.

Consider part-1-2_3-4.parquet :

  1. Split/Partition number.

  2. Random UUID to prevent collision between different (appending) write jobs.

  3. Unique Job/Task ID (not always included).

  4. The "c" stands for count. This is the file counter: the number of files already written for this specific partition. It exists because the number of records written to a single file can be capped; when the cap is reached, a new file is started with the next counter value. The value starts from 0.
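
To make the breakdown above concrete, here is a minimal Python sketch that splits such a file name into the four parts. The regular expression, the `PartFile` type and the `parse_part_file` helper are hypothetical names written for illustration from the description above; they are not part of Spark, and the real naming code may allow variations this pattern does not cover.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, taken from the list above (not from Spark's source):
#   part-<split>-<uuid>[_<job/task id>][-c<file counter>]<extensions>
_PART_FILE = re.compile(
    r"^part-(?P<split>\d+)"
    r"-(?P<uuid>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
    r"(?:_(?P<task_id>[^-.]+))?"      # optional job/task ID after an underscore
    r"(?:-c(?P<counter>\d+))?"        # optional file counter, e.g. c000
    r"(?P<ext>(?:\.[^.]+)*)$"         # e.g. .snappy.parquet
)


class PartFile(NamedTuple):
    """Hypothetical container for the parsed pieces of a part file name."""
    split: int
    uuid: str
    task_id: Optional[str]
    file_counter: Optional[int]
    extension: str


def parse_part_file(path: str) -> PartFile:
    """Parse the last path component of `path` as a part file name."""
    name = path.rsplit("/", 1)[-1]
    match = _PART_FILE.match(name)
    if match is None:
        raise ValueError(f"not a recognised part file name: {name!r}")
    counter = match.group("counter")
    return PartFile(
        split=int(match.group("split")),
        uuid=match.group("uuid"),
        task_id=match.group("task_id"),
        file_counter=int(counter) if counter is not None else None,
        extension=match.group("ext"),
    )


if __name__ == "__main__":
    print(parse_part_file(
        "file:/Users/stephen/p/spark/f1/"
        "part-00000-445036f9-7a40-4333-8405-8451faa44319-c000.snappy.parquet"
    ))
```

For the file in the question, this reports split 0, the UUID 445036f9-7a40-4333-8405-8451faa44319, no job/task ID, file counter 0 and the extension .snappy.parquet.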

I found it based on this code and this code.



Source: https://stackoverflow.com/questions/49165696/could-anyone-please-explain-what-is-c000-means-in-c000-snappy-parquet-or-c000-sn
