How to wait in a bash script for several subprocesses spawned from that script to finish and return exit code !=0 when any of the subprocesses ends with code !=0 ?
S
To parallelize this...
for i in $(whatever_list) ; do
do_something $i
done
Translate it to this...
for i in $(whatever_list) ; do echo $i ; done | ## execute in parallel...
(
export -f do_something ## export functions (if needed)
export PATH ## export any variables that are required
xargs -I{} --max-procs 0 bash -c ' ## process in batches...
{
echo "processing {}" ## optional
do_something {}
}'
)
--max-procs
based on how much parallelism you want (0
means "all at once").xargs
-- but it isn't always installed by default.for
loop isn't strictly necessary in this example since echo $i
is basically just regenerating the output of $(whatever_list
). I just think the use of the for
keyword makes it a little easier to see what is going on.Here's a simplified working example...
for i in {0..5} ; do echo $i ; done |xargs -I{} --max-procs 2 bash -c '
{
echo sleep {}
sleep 2s
}'
Here's what I've come up with so far. I would like to see how to interrupt the sleep command if a child terminates, so that one would not have to tune WAITALL_DELAY
to one's usage.
waitall() { # PID...
## Wait for children to exit and indicate whether all exited with 0 status.
local errors=0
while :; do
debug "Processes remaining: $*"
for pid in "$@"; do
shift
if kill -0 "$pid" 2>/dev/null; then
debug "$pid is still alive."
set -- "$@" "$pid"
elif wait "$pid"; then
debug "$pid exited with zero exit status."
else
debug "$pid exited with non-zero exit status."
((++errors))
fi
done
(("$#" > 0)) || break
# TODO: how to interrupt this sleep when a child terminates?
sleep ${WAITALL_DELAY:-1}
done
((errors == 0))
}
debug() { echo "DEBUG: $*" >&2; }
pids=""
for t in 3 5 4; do
sleep "$t" &
pids="$pids $!"
done
waitall $pids
set -e
fail () {
touch .failure
}
expect () {
wait
if [ -f .failure ]; then
rm -f .failure
exit 1
fi
}
sleep 2 || fail &
sleep 2 && false || fail &
sleep 2 || fail
expect
The set -e
at top makes your script stop on failure.
expect
will return 1
if any subjob failed.
#!/bin/bash
set -m
for i in `seq 0 9`; do
doCalculations $i &
done
while fg; do true; done
set -m
allows you to use fg & bg in a scriptfg
, in addition to putting the last process in the foreground, has the same exit status as the process it foregroundswhile fg
will stop looping when any fg
exits with a non-zero exit statusunfortunately this won't handle the case when a process in the background exits with a non-zero exit status. (the loop won't terminate immediately. it will wait for the previous processes to complete.)
There can be a case where the process is complete before waiting for the process. If we trigger wait for a process that is already finished, it will trigger an error like pid is not a child of this shell. To avoid such cases, the following function can be used to find whether the process is complete or not:
isProcessComplete(){
PID=$1
while [ -e /proc/$PID ]
do
echo "Process: $PID is still running"
sleep 5
done
echo "Process $PID has finished"
}
I used this recently (thanks to Alnitak):
#!/bin/bash
# activate child monitoring
set -o monitor
# locking subprocess
(while true; do sleep 0.001; done) &
pid=$!
# count, and kill when all done
c=0
function kill_on_count() {
# you could kill on whatever criterion you wish for
# I just counted to simulate bash's wait with no args
[ $c -eq 9 ] && kill $pid
c=$((c+1))
echo -n '.' # async feedback (but you don't know which one)
}
trap "kill_on_count" CHLD
function save_status() {
local i=$1;
local rc=$2;
# do whatever, and here you know which one stopped
# but remember, you're called from a subshell
# so vars have their values at fork time
}
# care must be taken not to spawn more than one child per loop
# e.g don't use `seq 0 9` here!
for i in {0..9}; do
(doCalculations $i; save_status $i $?) &
done
# wait for locking subprocess to be killed
wait $pid
echo
From there one can easily extrapolate, and have a trigger (touch a file, send a signal) and change the counting criteria (count files touched, or whatever) to respond to that trigger. Or if you just want 'any' non zero rc, just kill the lock from save_status.