Question
How to load a text file into a Hive ORC external table?
create table MyDB.TEST (
Col1 String,
Col2 String,
Col3 String,
Col4 String)
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
I have already created the above table as ORC, but while fetching data from the table it shows the error below:
Failed with exception java.io.IOException: org.apache.orc.FileFormatException: Malformed ORC file hdfs://localhost:9000/Ext/sqooporc/part-m-00000. Invalid postscript.
Answer 1:
There are multiple steps involved; the details follow. The root cause of the error is that part-m-00000 is a plain text file (Sqoop's default output format), while the table's ORC input format expects ORC binary data, which is why the reader complains about an invalid postscript.
- Create a Hive table which is able to read from the plain text file
Assuming that your file is comma-delimited and stored in an HDFS directory such as /user/data (Hive's LOCATION clause expects a directory, so point it at the directory containing file1.txt rather than at the file itself), the syntax will be as follows. If MyDB.TEST already exists with the ORC definition shown above, drop it first.
create table MyDB.TEST (
Col1 String,
Col2 String,
Col3 String,
Col4 String)
row format delimited fields terminated by ','
location '/user/data';
Now you have a schema that is in sync with the format of the data you possess.
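To confirm that the staging table parses the file correctly, a quick sanity check (a minimal example; the LIMIT value is arbitrary):
SELECT * FROM MyDB.TEST LIMIT 5;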
- Create another table with an ORC schema
Now create the ORC table as you attempted earlier. STORED AS ORC is a simpler, equivalent syntax for the INPUTFORMAT/OUTPUTFORMAT pair in your original statement.
create table MyDB.TEST_ORC (
Col1 String,
Col2 String,
Col3 String,
Col4 String)
STORED AS ORC;
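To verify that the new table really uses the ORC SerDe and input/output formats, you can inspect its metadata (the exact output layout varies by Hive version):
DESCRIBE FORMATTED MyDB.TEST_ORC;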
Your TEST_ORC table is empty at this point. You can populate it with the data from the TEST table using the following command.
INSERT OVERWRITE TABLE MyDB.TEST_ORC SELECT * FROM MyDB.TEST;
The statement above selects all the records from the TEST table and writes them into the TEST_ORC table. Since TEST_ORC is an ORC table, the data is converted to ORC format on the fly as it is written.
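To check that the conversion copied everything, you can compare row counts between the two tables (a simple sanity check):
SELECT COUNT(*) FROM MyDB.TEST;
SELECT COUNT(*) FROM MyDB.TEST_ORC;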
You can even check the storage location of the TEST_ORC table for ORC files.
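To find that location without leaving Hive, SHOW CREATE TABLE prints the table's full DDL, including its LOCATION:
SHOW CREATE TABLE MyDB.TEST_ORC;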
Now your data is in ORC format and your TEST_ORC table has the required schema to read it. You may now drop the TEST table if it is no longer needed.
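If you do drop it, keep in mind that TEST was created as a managed (non-EXTERNAL) table, so dropping it also deletes the underlying data directory; do this only after TEST_ORC has been verified:
DROP TABLE MyDB.TEST;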
Hope that helps!
Source: https://stackoverflow.com/questions/48928397/load-text-to-orc-file