I am new to ORC file. I went through many blogs, but didn\'t get clear understanding. Please help and clarify below questions.
Can I fetch schema from ORC file?
1. and 2. Use Hive and/or HCatalog to create, read, update ORC table structure in the Hive metastore (HCatalog is just a side door than enables Pig/Sqoop/Spark/whatever to access the metastore directly)
2. ALTER TABLE
command allows to add/drop columns whatever the storage type, ORC included. But beware of a nasty bug that may crash vectorized reads after that (at least in V0.13 and V0.14)
3. and 4. The term "index" is rather inappropriate. Basically it's just min/max information persisted in the stripe footer at write time, then used at read time for skipping all stripes that are clearly not meeting the WHERE
requirements, drastically reducing I/O in some cases (a trick that has become popular in columns stores e.g. InfoBright on MySQL, but also in Oracle Exadata appliances [dubbed "smart scan" by Oracle marketing])
5. Hive works with "row store" formats (Text, SequenceFile, AVRO) and "column store" formats (ORC, Parquet) alike. The optimizer just uses specific strategies and shortcuts on the initial Map phase -- e.g. stripe elimination, vectorized operators -- and of course the serialization/deserialization phases are a bit more elaborate with column stores.