Difference between 'Stored as InputFormat, OutputFormat' and 'Stored as' in Hive

情书的邮戳 2021-02-20 10:31

There is an issue when executing SHOW CREATE TABLE and then running the resulting CREATE TABLE statement if the table is ORC.

Using sho

2 Answers
  • 2021-02-20 10:58

    You can specify INPUTFORMAT, OUTPUTFORMAT, and SERDE explicitly when creating a table: INPUTFORMAT and OUTPUTFORMAT go in the STORED AS clause, and the SERDE goes in ROW FORMAT SERDE. Hive lets you separate the record format from the file format, and you can supply custom classes for each of the three. See details: http://www.dummies.com/programming/big-data/hadoop/defining-table-record-formats-in-hive/

    Alternatively, you can simply write STORED AS ORC or STORED AS TEXTFILE, for example. STORED AS ORC already takes care of INPUTFORMAT, OUTPUTFORMAT, and SERDE, so you do not have to write out the long fully qualified Java class names for each of them.
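
    As a rough sketch of what the shorthand stands for, the statement below spells out all three classes for ORC by hand. The class names are the standard ORC classes that ship with Hive; the table name orc_explicit is made up for illustration.

    ```sql
    -- Long form, intended to match: CREATE TABLE orc_explicit (i INT) STORED AS ORC;
    -- ROW FORMAT SERDE names the record (de)serializer,
    -- INPUTFORMAT / OUTPUTFORMAT name the file reader and writer.
    CREATE TABLE orc_explicit (i INT)
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
    ```

    Leaving out the ROW FORMAT SERDE part is what makes the long form differ from the shorthand.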

  • 2021-02-20 11:03

    STORED AS implies 3 things:

    1. SERDE
    2. INPUTFORMAT
    3. OUTPUTFORMAT

    You have defined only the last two, so the SERDE falls back to the value of hive.default.serde, which is documented as follows:

    hive.default.serde
    Default Value: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
    Added in: Hive 0.14 with HIVE-5976
    The default SerDe Hive will use for storage formats that do not specify a SerDe.
    Storage formats that currently do not specify a SerDe include 'TextFile, RcFile'.

    Demo

    hive.default.serde

    set hive.default.serde;
    

    hive.default.serde=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
    

    STORED AS ORC

    create table mytable (i int) 
    stored as orc;
    
    show create table mytable;
    

    Note that the SERDE is 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'

    CREATE TABLE `mytable`(
      `i` int)
    ROW FORMAT SERDE 
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
    STORED AS INPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
    OUTPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'file:/home/cloudera/local_db/mytable'
    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}', 
      'numFiles'='0', 
      'numRows'='0', 
      'rawDataSize'='0', 
      'totalSize'='0', 
      'transient_lastDdlTime'='1496982059')
    

    STORED AS INPUTFORMAT ... OUTPUTFORMAT ...

    create table mytable2 (i int) 
    STORED AS 
    INPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
    OUTPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    ;
    
    show create table mytable2
    ;
    

    Note that the SERDE is 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

    CREATE TABLE `mytable2`(
      `i` int)
    ROW FORMAT SERDE 
      'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
    STORED AS INPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
    OUTPUTFORMAT 
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'file:/home/cloudera/local_db/mytable2'
    TBLPROPERTIES (
      'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}', 
      'numFiles'='0', 
      'numRows'='0', 
      'rawDataSize'='0', 
      'totalSize'='0', 
      'transient_lastDdlTime'='1496982426')
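
    If a table like mytable2 has already been created this way, one possible repair is to set the SerDe afterwards instead of recreating the table. This is only a sketch: ALTER TABLE ... SET SERDE is standard Hive DDL, but whether it is sufficient for a table that already holds data or partitions is an assumption to check in your own environment.

    ```sql
    -- Point the existing table at the ORC SerDe instead of the default LazySimpleSerDe.
    ALTER TABLE mytable2
      SET SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde';

    -- Re-check the definition; ROW FORMAT SERDE should now name OrcSerde.
    SHOW CREATE TABLE mytable2;
    ```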
    