Hadoop ORC file - How it works - How to fetch metadata

暖寄归人 2021-02-10 03:08

I am new to ORC files. I have gone through many blogs but still don't have a clear understanding. Please help clarify the questions below.

  1. Can I fetch the schema from an ORC file?

2 answers
  •  隐瞒了意图╮
    2021-02-10 03:43

    Hey, I can't help you with all of your questions, but I'll give it a try.

    1. You can use the file dump utility (hive --orcfiledump <path>) to read out the metadata of an ORC file, including its schema; see here.

    2. I am quite unsure about schema evolution, but as far as I know ORC does not support it.

    3. The ORC index stores sum, min, and max, so if your data is totally unstructured (for example, not sorted on the column you filter by) you would probably still have to read a lot of data. Since the latest release of ORC, however, you can enable an additional Bloom filter, which is more accurate for row-group elimination. The orc-user mailing list may be helpful here too.

    4. ORC provides an index for every column, but it is only a lightweight index. It stores min/max values (and the sum for numeric columns) in the file footer, in the stripe footer, and by default for every 10,000 rows, so it does not take up much space.

    5. If you store your table in the ORC file format, Hive uses a specific ORC record reader to extract the rows from the columns. The advantage of columnar storage is that you do not have to read the whole row; only the columns a query needs are read.
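
To make points 3 and 4 more concrete, here is a minimal pure-Python sketch of how a min/max/sum index kept for every 10,000 rows can be used to skip row groups. It does not read real ORC files and is not the ORC library's API; the names RowGroupStats, build_index and groups_to_read are hypothetical and only illustrate the idea. It also shows why unsorted data defeats the index: on shuffled values almost every row group's min/max range covers the search value.

```python
import random
from dataclasses import dataclass

ROW_INDEX_STRIDE = 10_000  # the "every 10,000 rows" default mentioned in the answer


@dataclass
class RowGroupStats:
    """Hypothetical stand-in for one row-index entry of a numeric column."""
    start: int    # position of the first row in this row group
    count: int    # number of rows in this row group
    minimum: int
    maximum: int
    total: int    # sum of the values, kept for numeric columns


def build_index(values, stride=ROW_INDEX_STRIDE):
    """Compute min/max/sum once per `stride` rows, like a lightweight index."""
    index = []
    for start in range(0, len(values), stride):
        chunk = values[start:start + stride]
        index.append(
            RowGroupStats(start, len(chunk), min(chunk), max(chunk), sum(chunk))
        )
    return index


def groups_to_read(index, target):
    """Keep only row groups whose [min, max] range could contain `target`."""
    return [g for g in index if g.minimum <= target <= g.maximum]


# Same 100,000 values, once sorted and once shuffled (fixed seed for repeatability).
sorted_values = list(range(100_000))
shuffled_values = sorted_values[:]
random.Random(0).shuffle(shuffled_values)

sorted_hits = groups_to_read(build_index(sorted_values), 42_123)
shuffled_hits = groups_to_read(build_index(shuffled_values), 42_123)

print(len(sorted_hits), "row group(s) must be read when the data is sorted")
print(len(shuffled_hits), "row group(s) must be read when the data is shuffled")
```

With sorted data only the single row group covering 40,000 to 49,999 has to be read. With shuffled data the min/max ranges of the groups overlap heavily, which is the situation where an additional Bloom filter becomes useful.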
