We would like to put the results of a Hive query into a CSV file. I thought the command should look like this:
insert overwrite directory '/home/output.csv'
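For reference, a complete form of that statement also needs a SELECT and an explicit delimiter; the path and table below are placeholders, and note that the non-LOCAL form writes a directory of files into HDFS rather than a single .csv file (ROW FORMAT DELIMITED on a directory insert may also depend on your Hive version):
INSERT OVERWRITE DIRECTORY '/user/hive/output'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
SELECT books FROM table;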
This shell command prints the output in CSV format to output.txt, without the column headers.
$ hive --outputformat=csv2 -f 'hivedatascript.hql' --hiveconf hive.cli.print.header=false > output.txt
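If you do want the column headers in the file, the same command with the property flipped to true should include them (assuming your Hive version honors hive.cli.print.header):
$ hive --outputformat=csv2 -f 'hivedatascript.hql' --hiveconf hive.cli.print.header=true > output.txt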
Just to cover the steps that follow in more detail after kicking off the query:
INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
select books from table;
In my case, the generated data under the temp folder is in deflate format, and it looks like this:
$ ls
000000_0.deflate
000001_0.deflate
000002_0.deflate
000003_0.deflate
000004_0.deflate
000005_0.deflate
000006_0.deflate
000007_0.deflate
Here's the command to decompress the deflate files and put everything into one CSV file:
hadoop fs -text "file:///home/lvermeer/temp/*" > /home/lvermeer/result.csv
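If you would rather avoid the deflate files altogether, one option (a sketch, assuming you are allowed to change session settings and that the compression comes from hive.exec.compress.output) is to turn off output compression before the INSERT, so plain text files land in the temp folder:
SET hive.exec.compress.output=false;
INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
select books from table;
After that, a plain cat /home/lvermeer/temp/* > /home/lvermeer/result.csv is enough to combine the pieces into one file.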
hive --outputformat=csv2 -e "select * from yourtable" > my_file.csv
or
hive --outputformat=csv2 -e "select * from yourtable" > [your_path]/file_name.csv
For TSV output, just change csv2 to tsv2 in the commands above and run your queries.
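For example, the one-off query above would become something like this (the output file name is just a placeholder):
hive --outputformat=tsv2 -e "select * from yourtable" > my_file.tsv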
I tried various options, but this would be one of the simplest solutions for Python Pandas:
hive -e 'select books from table' | grep "|" > temp.csv

import pandas as pd
df = pd.read_csv("temp.csv", sep='|')
You can also use tr "|" "," to convert "|" to ",".
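A sketch of that variant, still assuming the Hive output is pipe-delimited as above (note that tr also rewrites any pipes that appear inside the data itself):
hive -e 'select books from table' | grep "|" | tr "|" "," > temp.csv
The resulting file can then be read with the default comma separator, i.e. pd.read_csv("temp.csv") without the sep argument.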
If you want a CSV file, you can modify Lukas' solutions as follows (assuming you are on a Linux box):
hive -e 'select books from table' | sed 's/[[:space:]]\+/,/g' > /home/lvermeer/temp.csv
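Since the default hive -e output is tab-separated, a slightly safer variant (a sketch that assumes GNU sed, as on most Linux boxes) replaces only the tabs, so values that contain spaces are left intact:
hive -e 'select books from table' | sed 's/\t/,/g' > /home/lvermeer/temp.csv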
If you are using HUE, this is fairly simple as well. Simply go to the Hive editor in HUE, execute your Hive query, then save the result file locally as XLS or CSV, or you can save the result file to HDFS.