How do I output the results of a HiveQL query to CSV?

Asked by 独厮守ぢ on 2020-11-27 10:11

We would like to put the results of a Hive query into a CSV file. I thought the command should look like this:

insert overwrite directory '/home/output.csv'


        
18 Answers
  • 2020-11-27 11:02

    You should use the CREATE TABLE AS SELECT (CTAS) statement to create a directory in HDFS containing the files with the results of the query. After that you will have to export those files from HDFS to your regular disk and merge them into a single file.

    You might also have to do some trickery to convert the files from '\001'-delimited to CSV. You could use a custom CSV SerDe or post-process the extracted file.
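    The export-and-convert steps can be sketched in shell. The table name and paths below are hypothetical, and the `hdfs` command is shown as a comment since it only runs where an HDFS client is configured:

```shell
# On the cluster, merge the CTAS output files into one local file
# (hypothetical warehouse path):
#   hdfs dfs -getmerge /user/hive/warehouse/my_ctas_table merged.txt

# Stand-in for the merged '\001'-delimited output:
printf 'Alice\00130\001NY\nBob\00125\001CA\n' > merged.txt

# Convert the default '\001' delimiter to commas:
tr '\001' ',' < merged.txt > results.csv
```

    The `tr` conversion is only safe when no field value itself contains a comma; otherwise a proper CSV SerDe is the better route.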

  • 2020-11-27 11:02

    Use the command:

    hive -e "use [database_name]; select * from [table_name] LIMIT 10;" > /path/to/file/my_file_name.csv

    I had a huge dataset whose details I was trying to organize, to determine the types of attacks and the count of each type. An example from my own practice that worked (with a bit more detail) goes something like this:

    hive -e "use DataAnalysis;
    select attack_cat, 
    case when attack_cat == 'Backdoor' then 'Backdoors' 
    when length(attack_cat) == 0 then 'Normal' 
    when attack_cat == 'Backdoors' then 'Backdoors' 
    when attack_cat == 'Fuzzers' then 'Fuzzers' 
    when attack_cat == 'Generic' then 'Generic' 
    when attack_cat == 'Reconnaissance' then 'Reconnaissance' 
    when attack_cat == 'Shellcode' then 'Shellcode' 
    when attack_cat == 'Worms' then 'Worms' 
    when attack_cat == 'Analysis' then 'Analysis' 
    when attack_cat == 'DoS' then 'DoS' 
    when attack_cat == 'Exploits' then 'Exploits' 
    when trim(attack_cat) == 'Fuzzers' then 'Fuzzers' 
    when trim(attack_cat) == 'Shellcode' then 'Shellcode' 
    when trim(attack_cat) == 'Reconnaissance' then 'Reconnaissance' end,
    count(*) from actualattacks group by attack_cat;">/root/data/output/results2.csv
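    A note on the format: `hive -e` writes tab-separated columns to stdout, so the redirected file above is tab-delimited rather than true CSV. A minimal shell post-processing sketch (the `printf` line is a stand-in for real query output):

```shell
# Real usage would pipe the hive output, e.g.:
#   hive -e "use DataAnalysis; select ..." | tr '\t' ',' > results2.csv

# Demonstrated on a stand-in line of tab-separated query output:
printf 'Backdoors\t1795\n' | tr '\t' ',' > results2.csv
```

    This naive conversion breaks if field values themselves contain tabs or commas; for such data, the csv2 output format mentioned in another answer is safer.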
    
  • 2020-11-27 11:03

    In case you are doing it from Windows, you can use the Python script hivehoney to extract table data to a local CSV file.

    It will:

    1. Login to bastion host.
    2. pbrun.
    3. kinit.
    4. beeline (with your query).
    5. Save the beeline output to a file on Windows.

    Execute it like this:

    set PROXY_HOST=your_bastion_host
    
    set SERVICE_USER=you_func_user
    
    set LINUX_USER=your_SOID
    
    set LINUX_PWD=your_pwd
    
    python hh.py --query_file=query.sql
    
  • 2020-11-27 11:05

    This is the most CSV-friendly way I have found to output the results of HiveQL.
    You don't need any grep or sed commands to format the data; Hive supports it directly, you just need to add the extra outputformat flag:

    hive --outputformat=csv2 -e 'select * from <table_name> limit 20' > /path/toStore/data/results.csv
    
  • 2020-11-27 11:06

    Similar to Ray's answer above, Hive View 2.0 in Hortonworks Data Platform also allows you to run a Hive query and then save the output as CSV.

  • 2020-11-27 11:08

    You can use INSERT ... DIRECTORY, as in this example:

    INSERT OVERWRITE LOCAL DIRECTORY '/tmp/ca_employees'
    SELECT name, salary, address
    FROM employees se
    WHERE se.state = 'CA';
    

    OVERWRITE and LOCAL have the same interpretations as before, and paths are interpreted following the usual rules. One or more files will be written to /tmp/ca_employees, depending on the number of reducers invoked.
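    By default those exported files are '\001'-delimited. If you want them comma-delimited instead, Hive 0.11 and later accept a ROW FORMAT clause on the export itself; a sketch reusing the query above:

```sql
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/ca_employees'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
SELECT name, salary, address
FROM employees se
WHERE se.state = 'CA';
```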
