Hive documentation lacking again:
I\'d like to write the results of a query to a local file as well as the names of the columns.
Does Hive support this?
Here's my take on it. Note, i'm not very well versed in bash, so improvements suggestions welcome :)
#!/usr/bin/env bash
# works like this:
# ./get_data.sh database.table > data.csv
INPUT=$1
TABLE=${INPUT##*.}
DB=${INPUT%.*}
HEADER=`hive -e "
set hive.cli.print.header=true;
use $DB;
INSERT OVERWRITE LOCAL DIRECTORY '$TABLE'
row format delimited
fields terminated by ','
SELECT * FROM $TABLE;"`
HEADER_WITHOUT_TABLE_NAME=${HEADER//$TABLE./}
echo ${HEADER_WITHOUT_TABLE_NAME//[[:space:]]/,}
cat $TABLE/*
Not a great solution, but here is what I do:
create table test_dat
ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t" STORED AS
INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION '/tmp/test_dat' as select * from YOUR_TABLE;
hive -e 'set hive.cli.print.header=true;select * from YOUR_TABLE limit 0' > /tmp/test_dat/header.txt
cat header.txt 000* > all.dat
I ran into this problem today and was able to get what I needed by doing a UNION ALL between the original query and a new dummy query that creates the header row. I added a sort column on each section and set the header to 0 and the data to a 1 so I could sort by that field and ensure the header row came out on top.
create table new_table as
select
field1,
field2,
field3
from
(
select
0 as sort_col, --header row gets lowest number
'field1_name' as field1,
'field2_name' as field2,
'field3_name' as field3
from
some_small_table --table needs at least 1 row
limit 1 --only need 1 header row
union all
select
1 as sort_col, --original query goes here
field1,
field2,
field3
from
main_table
) a
order by
sort_col --make sure header row is first
It's a little bulky, but at least you can get what you need with a single query.
Hope this helps!
Try
set hive.cli.print.header=true;
Indeed, @nija's answer is correct - at least as far as I know. There isn't any way to write the column names when doing an insert overwrite into [local] directory ...
(whether you use local or not).
With regards to the crashes described by @user1735861, there is a known bug in hive 0.7.1
(fixed in 0.8.0
) that, after doing set hive.cli.print.header=true;
, causes a NullPointerException
for any HQL command/query that produces no output. For example:
$ hive -S hive> use default; hive> set hive.cli.print.header=true; hive> use default; Exception in thread "main" java.lang.NullPointerException at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:222) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:287) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:517) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:616) at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
Whereas this is fine:
$ hive -S hive> set hive.cli.print.header=true; hive> select * from dual; c c hive>
Non-HQL commands are fine though (set
,dfs
!
, etc...)
More info here: https://issues.apache.org/jira/browse/HIVE-2334
Yes you can. Put the set hive.cli.print.header=true;
in a .hiverc
file in your main directory or any of the other hive user properties files.
Vague Warning: be careful, since this has crashed queries of mine in the past (but I can't remember the reason).