I would like to run multiple Hive queries, preferably in parallel rather than sequentially, and store the output of each query into a csv file. For example, query1
You can run and monitor parallel jobs in a shell script:
#!/bin/bash
#Run parallel processes and wait for their completion
#Add loop here or add more calls
hive -e "SELECT * FROM db.table1;" | tr "\t" "," > example1.csv &
hive -e "SELECT * FROM db.table2;" | tr "\t" "," > example2.csv &
hive -e "SELECT * FROM db.table3;" | tr "\t" "," > example3.csv &
#Note the ampersand in above commands says to create parallel process
#You can wrap hive call in a function an do some logging in it, etc
#And call a function as parallel process in the same way
#Modify this script to fit your needs
#Now wait for all processes to complete
#Failed processes count
FAILED=0
for job in `jobs -p`
do
echo "job=$job"
wait $job || let "FAILED+=1"
done
#Final status check
if [ "$FAILED" != "0" ]; then
echo "Execution FAILED! ($FAILED)"
#Do something here, log or send messege, etc
exit 1
fi
#Normal exit
#Do something else here
exit 0
There are other ways (using XARGS, GNU parallel) to run parallel processes in shell and a lot of resources on it. Read also https://www.slashroot.in/how-run-multiple-commands-parallel-linux and https://thoughtsimproved.wordpress.com/2015/05/18/parellel-processing-in-bash/