How do I output the results of a HiveQL query to CSV using a shell script?

Asked by 佛祖请我去吃肉 on 2021-01-15 17:26

I would like to run multiple Hive queries, preferably in parallel rather than sequentially, and store the output of each query in a CSV file. For example, query1

2 Answers
  •  一向
     2021-01-15 17:35

    You can run and monitor parallel jobs in a shell script:

    #!/bin/bash
    
    #Run parallel processes and wait for their completion
    
    #Add loop here or add more calls
    hive -e "SELECT * FROM db.table1;" | tr "\t" "," > example1.csv &
    hive -e "SELECT * FROM db.table2;" | tr "\t" "," > example2.csv &
    hive -e "SELECT * FROM db.table3;" | tr "\t" "," > example3.csv &
    
    #Note: the trailing ampersand runs each command as a background (parallel) process
    #You can wrap the hive call in a function and do some logging in it, etc.
    #(see the sketch after this script), and launch that function in the background the same way
    #Modify this script to fit your needs
    
    #Now wait for all processes to complete
    
    #Failed processes count
    FAILED=0
    
    for job in $(jobs -p)
    do
       echo "job=$job"
       wait "$job" || let "FAILED+=1"
    done
    
    #Final status check
    if [ "$FAILED" != "0" ]; then
        echo "Execution FAILED!  ($FAILED)"
        #Do something here, log or send a message, etc.
        exit 1
    fi
    
    #Normal exit
    #Do something else here
    exit 0
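
    As the comment above suggests, you could wrap the hive call in a small helper function that also logs progress. Below is a minimal sketch; the function name run_query, the log file run.log, and the table/file names are illustrative, not part of the original answer:

    #!/bin/bash
    set -o pipefail   #so a failing hive command is not masked by tr

    #Hypothetical helper: run one query, convert tabs to commas, log the outcome
    run_query() {
        local query="$1"
        local outfile="$2"
        echo "$(date '+%F %T') starting $outfile" >> run.log
        if hive -e "$query" | tr "\t" "," > "$outfile"; then
            echo "$(date '+%F %T') finished $outfile" >> run.log
        else
            echo "$(date '+%F %T') FAILED   $outfile" >> run.log
            return 1
        fi
    }

    #Launch each query in the background, then wait as in the script above
    run_query "SELECT * FROM db.table1;" example1.csv &
    run_query "SELECT * FROM db.table2;" example2.csv &
    wait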
    

    There are other ways (xargs, GNU parallel) to run parallel processes in a shell, and there are plenty of resources on the topic; a minimal sketch using xargs follows below. See also https://www.slashroot.in/how-run-multiple-commands-parallel-linux and https://thoughtsimproved.wordpress.com/2015/05/18/parellel-processing-in-bash/
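
    For example, here is a hedged sketch using xargs -P to run up to three exports at once; the table names and the derived .csv file names are placeholders, not from the original answer:

    #!/bin/bash
    #Each input line is one table name; -P 3 keeps up to 3 hive exports running in parallel
    printf '%s\n' db.table1 db.table2 db.table3 |
        xargs -P 3 -I {} sh -c 'hive -e "SELECT * FROM {};" | tr "\t" "," > "{}.csv"'

    #GNU parallel offers the same pattern:
    #parallel 'hive -e "SELECT * FROM {};" | tr "\t" "," > {}.csv' ::: db.table1 db.table2 db.table3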
