Export as CSV in Beeline/Hive


If your Hive version is at least 0.11.0, you can execute:

INSERT OVERWRITE LOCAL DIRECTORY '/tmp/directoryWhereToStoreData' 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ','  
LINES TERMINATED BY '\n'
SELECT * FROM yourTable;

from Hive/Beeline to store the table in a directory on the local filesystem.


Alternatively, with beeline, save your SELECT query in yourSQLFile.sql and run:

beeline -u 'jdbc:hive2://[databaseaddress]' --outputformat=csv2 -f yourSQLFile.sql > theFileWhereToStoreTheData.csv

This also stores the result in a file on the local file system.


From hive, to store the data somewhere into HDFS:

CREATE EXTERNAL TABLE output 
LIKE yourTable 
ROW FORMAT DELIMITED 
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
LOCATION 'hdfs://WhereDoYou/Like';

INSERT OVERWRITE TABLE output SELECT * from yourTable;

Then you can collect the data into a local file using:

hdfs dfs -getmerge /WhereDoYou/Like output.csv

This is another option to get the data using beeline only:

env HADOOP_CLIENT_OPTS="-Ddisable.quoting.for.sv=false" beeline -u "jdbc:hive2://your.hive.server.address:10000/" --incremental=true --outputformat=csv2 -e "select * from youdatabase.yourtable" 

Verified as working on:

Connected to: Apache Hive (version 1.1.0-cdh5.10.1)
Driver: Hive JDBC (version 1.1.0-cdh5.10.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.1.0-cdh5.10.1 by Apache Hive

You have different options.

1) You can control, up to a point, how the beeline output is formatted, and then just save it to a file with Linux redirection. For example:

beeline --outputformat=csv2 xxx > output.csv (see the relevant parameters in the beeline help)

2) For more control and better performance, I wrote a little Java tool once. It's really only a couple of lines of JDBC code; see the sketch after this list.

3) And finally, as Ana wrote, you can just write a table into an external table in HDFS and specify the output format you want.

Like this:

CREATE EXTERNAL TABLE test ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LOCATION '/tmp/myfolder' AS SELECT * FROM mytable;

You can then get that output in the local file system with:

hadoop fs -getmerge /tmp/myfolder myoutput.csv
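
For reference, here is a minimal sketch of what such a JDBC tool can look like. It assumes the Hive JDBC driver (org.apache.hive.jdbc.HiveDriver) is on the classpath; the connection URL, query, and output file name are placeholders, and the CSV quoting is deliberately simple:

import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.Statement;

public class HiveToCsv {
    public static void main(String[] args) throws Exception {
        // Older drivers may need explicit registration; newer ones self-register.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        String url = "jdbc:hive2://your.hive.server.address:10000/default"; // placeholder
        String query = "SELECT * FROM yourTable";                           // placeholder

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(query);
             PrintWriter out = new PrintWriter("output.csv", "UTF-8")) {

            ResultSetMetaData meta = rs.getMetaData();
            int cols = meta.getColumnCount();

            // Header row taken from the result-set metadata.
            StringBuilder header = new StringBuilder();
            for (int i = 1; i <= cols; i++) {
                if (i > 1) header.append(',');
                header.append(meta.getColumnLabel(i));
            }
            out.println(header);

            // Data rows; quote any field containing a comma, quote, or newline.
            while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int i = 1; i <= cols; i++) {
                    if (i > 1) row.append(',');
                    String v = rs.getString(i);
                    if (v == null) v = "";
                    if (v.contains(",") || v.contains("\"") || v.contains("\n")) {
                        v = "\"" + v.replace("\"", "\"\"") + "\"";
                    }
                    row.append(v);
                }
                out.println(row);
            }
        }
    }
}

Run it with the Hive JDBC jar and its dependencies on the classpath (the standalone hive-jdbc jar shipped with Hive is the easiest way to get those).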

You can use this command to save output in CSV format from beeline:

beeline -u 'jdbc:hive2://bigdataplatform-dev.nam.nsroot.net:10000/;principal=hive/bigdataplatform-dev.net@NAMUXDEV.NET;ssl=true' --outputformat=csv2 --verbose=false  --fastConnect=true   --silent=true -f $query_file>out.csv

Save your SQL query in the file named by $query_file.

The result will be in out.csv.

I have a complete example here: hivehoney

The following worked for me:

hive --silent=true --verbose=false --outputformat=csv2 -e "use <db_name>; select * from <table_name>" > table_name.csv


One advantage over using beeline is that you don't have to provide a hostname or user/password if you are running on the Hive node.

When some of the columns contain string values with commas, tsv (tab-separated) works better:

hive --silent=true --verbose=false --outputformat=tsv -e "use <db_name>; select * from <table_name>" > table_name.tsv