I am running multiple instances of a worker as described in this answer: Starting multiple upstart instances automatically
Question: Can I restart all instan
I tried it with the example above and SpamapS's answer, and received:
init: my-workers pre-start process (22955) terminated with status 127
In /var/log/upstart/my-workers.log I found the problem:
/proc/self/fd/9: 6: /proc/self/fd/9: end: not found
The "end" line after the for-loop in my-workers.conf was invalid syntax.
I replaced
script
for i in `seq 1 $NUM_WORKERS`
do
start worker N=$i
done
end
end script
with
script
for i in `seq 1 $NUM_WORKERS`
do
start worker N=$i
done
end script
and it worked!
Consider adding to the worker.conf one more event:
stop on shutdown or workers-stop
Then you can call from the command line
sudo initctl emit workers-stop
You can add a similar event to start the workers. To restart all workers, create a task that emits workers-stop and then workers-start.
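A minimal sketch of such a task, assuming worker.conf already has the stop on line above plus a matching start on workers-start. The file name workers-restart.conf is hypothetical, and because the worker job is started per instance (N=1, N=2, ...), the start event may need to carry N for each instance; treat this as a starting point rather than a tested job:

```
# /etc/init/workers-restart.conf (hypothetical file name)
description "Restart all workers by emitting stop and start events"
task
script
    # initctl emit waits for the event to be handled unless --no-wait is given
    initctl emit workers-stop
    # assumes worker.conf contains: start on workers-start
    initctl emit workers-start
end script
```

It would then be run with sudo start workers-restart.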
Essentially you need a process that executes stop and start commands for each of your N=1, N=2, etc. instances.
A simple way to do this is a couple of bash for loops inside an exec script stanza. However, if the processes take some time to stop (e.g. because they are working on something and only act on SIGTERM after finishing their current job), this is inefficient: you have to wait for one to stop before sending the signal to the next one.
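For comparison, the simple sequential approach could look roughly like the function below. This is only a sketch: the function name restart_workers_sequentially and the default job name and count are made up for illustration, and stop and start are the usual Upstart commands.

```shell
#!/bin/sh
# Sequential restart sketch: stop every instance, then start every instance.
# The defaults ("worker", 3 instances) are illustrative, not from a real setup.
restart_workers_sequentially() {
    job="${1:-worker}"
    count="${2:-3}"

    for i in $(seq 1 "$count"); do
        # stop blocks until this instance is down, so a slow worker delays
        # all the ones after it; ignore instances that are not running
        stop "$job" N="$i" || true
    done

    for i in $(seq 1 "$count"); do
        start "$job" N="$i"
    done
}
```

Calling restart_workers_sequentially worker 5 as root would restart five instances one after another, which is exactly where the waiting adds up.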
Therefore, I built an Upstart script that stops them in parallel at https://github.com/elifesciences/builder-base-formula/blob/master/elife/config/etc-init-multiple-processes-parallel.conf
The script is generated by Salt from a map of process names to the number of instances of each. Here is a sample result:
description "(Re)starts all instances, in parallel"
# http://upstart.ubuntu.com/cookbook/#start-on
start on (local-filesystems and net-device-up IFACE!=lo)
task
script
timeout=300
echo "--------"
echo "Current status of 5 elife-bot-worker processes"
echo "Now is" $(date -Iseconds)
for i in `seq 1 5`
do
status elife-bot-worker ID=$i || true
done
echo "Stopping asynchronously 5 elife-bot-worker processes"
echo "Now is" $(date -Iseconds)
for i in `seq 1 5`
do
(stop elife-bot-worker ID=$i &) || true
done
for i in `seq 1 5`
do
echo "Waiting for elife-bot-worker $i to stop"
echo "Now is" $(date -Iseconds)
counter=0
while true
do
if [ "$counter" -gt "$timeout" ]
then
echo "It shouldn't take more than $timeout seconds to kill all the elife-bot-worker processes"
exit 1
fi
status elife-bot-worker ID=$i 2>&1 | grep "Unknown instance" && break
sleep 1
counter=$((counter + 1))
done
done
echo "Stopped all elife-bot-worker processes"
echo "Starting 5 elife-bot-worker processes"
for i in `seq 1 5`
do
start elife-bot-worker ID=$i
done
echo "Started 5 elife-bot-worker processes"
end script
In worker.conf you just need to change this line:
stop on shutdown
To:
stop on stopping my-workers
And change my-workers.conf to use pre-start instead of script:
pre-start script
for i in `seq 1 $NUM_WORKERS`
do
start worker N=$i
done
end script
Now my-workers will keep state: since the work happens in pre-start, the my-workers main process won't exist and so won't exit. stop on stopping my-workers causes the workers to stop whenever my-workers is stopped. Then of course when it starts up again it will start the workers again.
(FYI, stop on shutdown does nothing, as shutdown is not a system event; see man upstart-events for all the defined events. You should therefore also change my-workers to stop on runlevel [06].)
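Put together, the two jobs might look roughly like this. It is a sketch under assumptions: the instance $N stanza is inferred from the start worker N=$i calls, the NUM_WORKERS value is made up, and the stanza that actually runs the worker is left out because it is not shown here.

```
# /etc/init/worker.conf (relevant stanzas only)
# assumed: the job is instantiated by N
instance $N
stop on stopping my-workers
# the exec or script stanza that runs the worker goes here

# /etc/init/my-workers.conf
stop on runlevel [06]
# hypothetical instance count
env NUM_WORKERS=4

pre-start script
    for i in `seq 1 $NUM_WORKERS`
    do
        start worker N=$i
    done
end script
```

With this in place, sudo stop my-workers followed by sudo start my-workers should stop and then start every worker instance.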