load text to Orc file

江枫思渺然 submitted on 2020-07-10 09:00:05

Question


How do I load a text file into a Hive ORC external table?

create table MyDB.TEST (
 Col1 String,
 Col2 String,
 Col3 String,
 Col4 String)
 STORED AS INPUTFORMAT
   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
 OUTPUTFORMAT
    'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

I have already created the table above as ORC, but fetching data from it fails with the following exception:

java.io.IOException:org.apache.orc.FileFormatException: Malformed ORC file hdfs://localhost:9000/Ext/sqooporc/part-m-00000. Invalid postscript.


Answer 1:


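The "Invalid postscript" error most likely means that the files under the table's location are plain text rather than ORC; declaring a table as ORC does not convert data that is already there. Loading the text takes several steps, detailed below.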

  1. Create a Hive table that can read the plain text file. Assuming the file is comma-delimited and stored on HDFS at /user/data/file1.txt, the syntax is as follows. Note that LOCATION must name the directory that holds the file, not the file itself, and Hive reads every file in that directory.

    create table MyDB.TEST (
      Col1 String,
      Col2 String,
      Col3 String,
      Col4 String
    )
    row format delimited
    fields terminated by ','
    stored as textfile
    location '/user/data/';
    

Now you have a schema that matches the format of the data you possess.
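As a quick sanity check, a preview query along these lines should show the comma-separated fields split across the four columns. This is a minimal sketch that assumes the text table was created successfully; the `LIMIT` value is arbitrary.

```sql
-- Preview a few rows from the text-backed table.
-- If the whole line lands in Col1 and the other columns are NULL,
-- the field delimiter probably does not match the file.
SELECT Col1, Col2, Col3, Col4
FROM MyDB.TEST
LIMIT 5;
```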

  2. Create another table stored as ORC.

Next, create the ORC table, as you did earlier. Here is a simpler syntax for it.

    create table MyDB.TEST_ORC (
      Col1 String,
      Col2 String,
      Col3 String,
      Col4 String)
    STORED AS ORC;

  3. TEST_ORC is empty at this point. Populate it with the data from TEST using the following command.

    INSERT OVERWRITE TABLE MyDB.TEST_ORC SELECT * FROM MyDB.TEST;
    

This statement selects every record from TEST and writes it to TEST_ORC. Because TEST_ORC is stored as ORC, the data is converted to ORC format as it is written.
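One way to confirm that the copy is complete is to compare row counts. This is only a sketch and assumes both tables live in the `MyDB` database, as in the statements above.

```sql
-- Row counts of the text source and the ORC copy; the two numbers should match.
SELECT COUNT(*) FROM MyDB.TEST;
SELECT COUNT(*) FROM MyDB.TEST_ORC;
```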

You can also inspect the storage location of the TEST_ORC table to see the ORC files themselves.
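A possible way to find that location is sketched below. `DESCRIBE FORMATTED` prints a `Location:` row for the table; the path in the `dfs -ls` line is only an assumption based on a default warehouse layout and should be replaced with whatever `Location:` actually reports.

```sql
-- Show table metadata, including the Location and the input/output formats.
DESCRIBE FORMATTED MyDB.TEST_ORC;

-- List the files from the Hive CLI or Beeline.
-- Assumed path: substitute the Location value printed above.
dfs -ls /user/hive/warehouse/mydb.db/test_orc;
```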

Now your data is in ORC format and your table TEST_ORC has the required schema to parse it. You may drop your TEST table now, if not needed.
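If you do drop it, keep in mind that TEST was created without the EXTERNAL keyword, so Hive would normally treat it as a managed table and delete the files under its location along with the metadata. A cautious sketch, assuming you want to keep the source text file on HDFS:

```sql
-- Assumption: TEST is a managed table, so a plain DROP would also remove its data files.
-- Marking it external first means the drop only removes the table definition.
ALTER TABLE MyDB.TEST SET TBLPROPERTIES ('EXTERNAL' = 'TRUE');
DROP TABLE IF EXISTS MyDB.TEST;
```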

Hope that helps!



Source: https://stackoverflow.com/questions/48928397/load-text-to-orc-file
