Issue when executing a show create table
and then executing the resulting create table
statement if the table is ORC.
Using show create table
, you get this:
STORED AS INPUTFORMAT
‘org.apache.hadoop.hive.ql.io.orc.OrcInputFormat’
OUTPUTFORMAT
‘org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat’
But if you create the table with those clauses, you will then get the casting error when selecting. Error likes:
Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.OrcStruct cannot be cast to org.apache.hadoop.io.BinaryComparable
To fix this, just change create table
statement to STORED AS ORC
But, as the answer said in the similar question:
What is the difference between 'InputFormat, OutputFormat' & 'Stored as' in Hive? .
I can't figure out the reason.
STORED AS
implies 3 things:
- SERDE
- INPUTFORMAT
- OUTPUTFORMAT
You have defined only the last 2, leaving the SERDE to be defined by hive.default.serde
hive.default.serde
Default Value: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Added in: Hive 0.14 with HIVE-5976
The default SerDe Hive will use for storage formats that do not specify a SerDe.
Storage formats that currently do not specify a SerDe include 'TextFile, RcFile'.
Demo
hive.default.serde
set hive.default.serde;
hive.default.serde=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
STORED AS ORC
create table mytable (i int)
stored as orc;
show create table mytable;
Note that the SERDE is 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
CREATE TABLE `mytable`(
`i` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'file:/home/cloudera/local_db/mytable'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
'numFiles'='0',
'numRows'='0',
'rawDataSize'='0',
'totalSize'='0',
'transient_lastDdlTime'='1496982059')
STORED AS INPUTFORMAT ... OUTPUTFORMAT ...
create table mytable2 (i int)
STORED AS
INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
;
show create table mytable2
;
Note that the SERDE is 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
CREATE TABLE `mytable2`(
`i` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'file:/home/cloudera/local_db/mytable2'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',
'numFiles'='0',
'numRows'='0',
'rawDataSize'='0',
'totalSize'='0',
'transient_lastDdlTime'='1496982426')
You сan specify INPUTFORMAT
, OUTPUTFORMAT
, SERDE
in STORED AS
when creating table. Hive allows you to separate your record format from your file format. You can provide custom classes for INPUTFORMAT
, OUTPUTFORMAT
, SERDE
. See details: http://www.dummies.com/programming/big-data/hadoop/defining-table-record-formats-in-hive/
Alternatively you can write simply STORED AS ORC
or STORED AS TEXTFILE
for example.
STORED AS ORC statement already takes care about INPUTFORMAT
, OUTPUTFORMAT
and SERDE
. This allows you not to write those long fully qualified Java class names for INPUTFORMAT
, OUTPUTFORMAT
, SERDE
. Just STORED AS ORC
instead.
来源:https://stackoverflow.com/questions/44443697/difference-between-stored-as-inputformat-outputformat-and-stored-as-in-hive