Hive: writing column headers to local file?

前端 未结 7 1395
别跟我提以往
别跟我提以往 2020-12-08 01:17

Hive documentation lacking again:

I\'d like to write the results of a query to a local file as well as the names of the columns.

Does Hive support this?

相关标签:
7条回答
  • 2020-12-08 01:22

    Here's my take on it. Note, i'm not very well versed in bash, so improvements suggestions welcome :)

    #!/usr/bin/env bash
    
    # works like this:
    # ./get_data.sh database.table > data.csv
    
    INPUT=$1
    TABLE=${INPUT##*.}
    DB=${INPUT%.*}
    
    HEADER=`hive -e "
      set hive.cli.print.header=true;
      use $DB;
      INSERT OVERWRITE LOCAL DIRECTORY '$TABLE'
      row format delimited
      fields terminated  by ','
      SELECT * FROM $TABLE;"`
    
    HEADER_WITHOUT_TABLE_NAME=${HEADER//$TABLE./}
    echo ${HEADER_WITHOUT_TABLE_NAME//[[:space:]]/,}
    cat $TABLE/*
    
    0 讨论(0)
  • 2020-12-08 01:33

    Not a great solution, but here is what I do:

    create table test_dat
    ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" STORED AS 
    INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat" 
    OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat" 
    LOCATION '/tmp/test_dat' as select * from YOUR_TABLE;
    
    hive -e 'set hive.cli.print.header=true;select * from YOUR_TABLE limit 0' > /tmp/test_dat/header.txt
    
    cat header.txt 000* > all.dat
    
    0 讨论(0)
  • 2020-12-08 01:40

    I ran into this problem today and was able to get what I needed by doing a UNION ALL between the original query and a new dummy query that creates the header row. I added a sort column on each section and set the header to 0 and the data to a 1 so I could sort by that field and ensure the header row came out on top.

    create table new_table as
    select 
      field1,
      field2,
      field3
    from
    (
      select
        0 as sort_col,  --header row gets lowest number
        'field1_name' as field1,
        'field2_name' as field2,
        'field3_name' as field3
      from
        some_small_table  --table needs at least 1 row
      limit 1  --only need 1 header row
      union all
      select
        1 as sort_col,  --original query goes here
        field1,
        field2,
        field3
      from
        main_table
    ) a
    order by 
      sort_col  --make sure header row is first
    

    It's a little bulky, but at least you can get what you need with a single query.

    Hope this helps!

    0 讨论(0)
  • 2020-12-08 01:42

    Try

    set hive.cli.print.header=true;
    
    0 讨论(0)
  • 2020-12-08 01:43

    Indeed, @nija's answer is correct - at least as far as I know. There isn't any way to write the column names when doing an insert overwrite into [local] directory ... (whether you use local or not).

    With regards to the crashes described by @user1735861, there is a known bug in hive 0.7.1 (fixed in 0.8.0) that, after doing set hive.cli.print.header=true;, causes a NullPointerException for any HQL command/query that produces no output. For example:

    $ hive -S
    hive> use default; 
    hive> set hive.cli.print.header=true;
    hive> use default;
    Exception in thread "main" java.lang.NullPointerException
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:222)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:287)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:517)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
    

    Whereas this is fine:

    $ hive -S
    hive> set hive.cli.print.header=true;
    hive> select * from dual;
    c
    c
    hive> 
    

    Non-HQL commands are fine though (set,dfs !, etc...)

    More info here: https://issues.apache.org/jira/browse/HIVE-2334

    0 讨论(0)
  • 2020-12-08 01:44

    Yes you can. Put the set hive.cli.print.header=true; in a .hiverc file in your main directory or any of the other hive user properties files.

    Vague Warning: be careful, since this has crashed queries of mine in the past (but I can't remember the reason).

    0 讨论(0)
提交回复
热议问题