How do I output the results of a HiveQL query to CSV?

独厮守ぢ 2020-11-27 10:11

We would like to put the results of a Hive query into a CSV file. I thought the command should look like this:

insert overwrite directory '/home/output.csv' select books from table;

18 Answers
  • 2020-11-27 10:44

    I was looking for a similar solution, but the ones mentioned here would not work. My data had all variations of whitespace (space, newline, tab) chars and commas.

    To make the column data TSV-safe, I replaced all \t characters in the column data with a space, and ran Python code on the command line to generate a CSV file, as shown below:

    hive -e 'tab_replaced_hql_query' | python -c 'import sys, csv; w = csv.writer(sys.stdout, dialect=csv.excel); [w.writerow(r) for r in csv.reader(sys.stdin, dialect=csv.excel_tab)]'
    

    This created a perfectly valid csv. Hope this helps those who come looking for this solution.
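
    The answer doesn't show the tab-replacing query itself; a minimal sketch of what tab_replaced_hql_query might look like, using Hive's regexp_replace (table and column names are made up):

    -- strip tabs from each string column before piping into the CSV converter
    -- ('\\t' in a Hive string literal becomes \t, which the regex engine reads as a tab)
    SELECT regexp_replace(col1, '\\t', ' '),
           regexp_replace(col2, '\\t', ' ')
    FROM my_table;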

  • 2020-11-27 10:48

    The default field separator is "^A" (Ctrl-A). In Python, it is "\x01".

    When I want to change the delimiter, I use SQL like:

    SELECT col1, delimiter, col2, delimiter, col3, ... FROM table
    

    Then treat delimiter + "^A" as the new delimiter when parsing the output.
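
    For example, a minimal sketch of this idea (table and column names are made up); every real column boundary then shows up in the output as ^A|^A:

    hive -e "SELECT col1, '|', col2, '|', col3 FROM my_table" > /tmp/out.txt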

  • 2020-11-27 10:48

    I may be late to this one, but this might help:

    echo "COL_NAME1|COL_NAME2|COL_NAME3|COL_NAME4" > SAMPLE_Data.csv hive -e ' select distinct concat(COL_1, "|", COL_2, "|", COL_3, "|", COL_4) from table_Name where clause if required;' >> SAMPLE_Data.csv

  • 2020-11-27 10:49

    You can use the Hive string function CONCAT_WS(string delimiter, string str1, string str2, ...)

    For example:

    hive -e "select CONCAT_WS(',', cola, colb, colc, ..., coln) from Mytable" > /home/user/Mycsv.csv
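
    Note that CONCAT_WS only accepts string (or array<string>) arguments, so non-string columns need an explicit cast first. A minimal sketch (the column names here are made up):

    hive -e "select CONCAT_WS(',', name, CAST(amount AS STRING)) from Mytable" > /home/user/Mycsv.csv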
    
  • 2020-11-27 10:49

    I had a similar issue and this is how I was able to address it.

    Step 1 - Loaded the data from the Hive table into another table as follows:

    DROP TABLE IF EXISTS TestHiveTableCSV;
    CREATE TABLE TestHiveTableCSV 
    ROW FORMAT DELIMITED 
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n' AS
    SELECT <column list> FROM TestHiveTable;
    

    Step 2 - Copied the blob from Hive warehouse to the new location with appropriate extension

    Start-AzureStorageBlobCopy `
        -DestContext $destContext `
        -SrcContainer "Source Container" `
        -SrcBlob "hive/warehouse/TestHiveTableCSV/000000_0" `
        -DestContainer "Destination Container" `
        -DestBlob "CSV/TestHiveTable.csv"
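
    If your cluster uses plain HDFS rather than Azure blob storage, the equivalent of Step 2 is simply copying the result file out of the warehouse directory. A minimal sketch, assuming the default warehouse location (the exact path varies by configuration):

    hdfs dfs -get /user/hive/warehouse/testhivetablecsv/000000_0 /tmp/TestHiveTable.csv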
    
  • 2020-11-27 10:50

    Although it is possible to use INSERT OVERWRITE to get data out of Hive, it might not be the best method for your particular case. First let me explain what INSERT OVERWRITE does, then I'll describe the method I use to get tsv files from Hive tables.

    According to the manual, your query will store the data in a directory in HDFS. The format will not be csv.

    Data written to the filesystem is serialized as text with columns separated by ^A and rows separated by newlines. If any of the columns are not of primitive type, then those columns are serialized to JSON format.

    A slight modification (adding the LOCAL keyword) will store the data in a local directory.

    INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' select books from table;
    

    When I run a similar query, here's what the output looks like (the ^A field separators are unprintable control characters; they're shown explicitly below).

    [lvermeer@hadoop temp]$ ll
    total 4
    -rwxr-xr-x 1 lvermeer users 811 Aug  9 09:21 000000_0
    [lvermeer@hadoop temp]$ head 000000_0 
    "row1""col1"1234"col3"1234FALSE
    "row2""col1"5678"col3"5678TRUE
    

    Personally, I usually run my query directly through Hive on the command line for this kind of thing, and pipe it into the local file like so:

    hive -e 'select books from table' > /home/lvermeer/temp.tsv
    

    That gives me a tab-separated file that I can use. Hope that is useful for you as well.
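
    If you need commas rather than tabs, one option is to post-process that output. A minimal sketch assuming GNU sed; note that this naive substitution breaks if a field itself contains tabs or commas, in which case the Python csv approach in another answer here is safer:

    hive -e 'select books from table' | sed 's/\t/,/g' > /home/lvermeer/temp.csv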

    Based on this patch (HIVE-3682), I suspect a better solution is available when using Hive 0.11, but I have been unable to test it myself. The new syntax should allow the following.

    INSERT OVERWRITE LOCAL DIRECTORY '/home/lvermeer/temp' 
    ROW FORMAT DELIMITED 
    FIELDS TERMINATED BY ',' 
    select books from table;
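
    Note that even with this syntax the target is a directory, so the result still lands in one or more files named like 000000_0 rather than in a single CSV. A hedged sketch for gathering them into one file (the output name is illustrative):

    cat /home/lvermeer/temp/* > /home/lvermeer/books.csv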
    

    Hope that helps.
