OS: CentOS
I have some 30,000 jobs (or scripts) to run. Each job takes 3-5 minutes. I have 48 CPUs (nproc = 48). I can use 40 CPUs to run these jobs.
Just execute those scripts; Linux will distribute the tasks across the available CPUs internally. This is handled by the Linux task scheduler. But, if you want, you can also execute a task on a particular CPU by using taskset
(see man taskset
). You can do this from a script to execute your 30K tasks. If you take this manual route, be sure you know what you are doing.
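A minimal sketch of that manual route, assuming taskset from util-linux is installed and using placeholder script names (job_0.sh and so on, not real file names from the question):

```shell
#!/bin/bash
# Sketch: launch 40 workers, each pinned to its own CPU with taskset.
# The job_$i.sh names are placeholders for your real scripts.
for ((i=0; i<40; i++)); do
    cpu=$(( i % 40 ))            # spread workers over CPUs 0-39
    taskset -c "$cpu" "./job_$i.sh" &
done
wait                             # block until all background jobs finish
```

With 30,000 jobs you would loop over batches rather than launch them all, but the pinning call itself is just taskset -c CPU COMMAND.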
I have used Redis
to do this sort of thing - it is very simple to install and the CLI is easy to use.
I mainly used LPUSH
to push all the jobs onto a "queue" in Redis and BRPOP
to do a blocking remove of a job from the queue (pushing on the left and popping from the right gives FIFO order). So you would LPUSH
30,000 jobs (or script names or parameters) at the start, then start 40 processes in the background (1 per CPU), and each process would sit in a loop doing BRPOP
to get a job, run it, and fetch the next.
You can add layers of sophistication to log completed jobs in another "queue".
Here is a little demonstration of what to do...
First, start a Redis server on any machine in your network:
./redis-server & # start REDIS server in background
Or, you could put this in your system startup if you use it always.
Now push 3 jobs onto queue called jobs:
./redis-cli # start REDIS command line interface
redis 127.0.0.1:6379> lpush jobs "job1"
(integer) 1
redis 127.0.0.1:6379> lpush jobs "job2"
(integer) 2
redis 127.0.0.1:6379> lpush jobs "job3"
(integer) 3
See how many jobs there are in queue:
redis 127.0.0.1:6379> llen jobs
(integer) 3
Wait with an infinite timeout for a job:
redis 127.0.0.1:6379> brpop jobs 0
1) "jobs"
2) "job1"
redis 127.0.0.1:6379> brpop jobs 0
1) "jobs"
2) "job2"
redis 127.0.0.1:6379> brpop jobs 0
1) "jobs"
2) "job3"
This last one will wait a LONG time as there are no jobs in queue:
redis 127.0.0.1:6379> brpop jobs 0
Of course, this is readily scriptable:
Put 30,000 jobs in queue:
for ((i=0;i<30000;i++)) ; do
    echo "lpush jobs job$i" | redis-cli
done
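Forking a redis-cli process per job is slow; a likely faster variant (same idea, my own tweak) is to generate all the LPUSH commands and pipe them through a single redis-cli invocation:

```shell
# Feed all 30,000 LPUSH commands to one redis-cli process
for ((i=0; i<30000; i++)); do
    echo "lpush jobs job$i"
done | redis-cli > /dev/null
```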
If your Redis server is on a remote host, just use:
redis-cli -h <HOSTNAME>
Here's how to check progress:
echo "llen jobs" | redis-cli
(integer) 30000
And you could start 40 jobs like this:
#!/bin/bash
for ((i=0;i<40;i++)) ; do
    ./Keep1ProcessorBusy $i &
done
And then Keep1ProcessorBusy
would be something like this:
#!/bin/bash
# Endless loop picking up jobs and processing them
while :
do
    # BRPOP returns two lines (queue name, then the job) - keep just the job
    job=$(redis-cli brpop jobs 0 | tail -1)
    # Set processor affinity here too if you want to force it, using the $1 parameter we were called with
    "$job"
done
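The "completed jobs" queue mentioned earlier can be sketched by having each worker push the finished job name onto a second list (the queue name done is my own choice, purely illustrative):

```shell
#!/bin/bash
# Worker loop that also records each finished job in a "done" queue
while :
do
    job=$(redis-cli brpop jobs 0 | tail -1)   # strip the queue name, keep the job
    "$job"
    redis-cli lpush done "$job" > /dev/null   # log completion
done
```

Then llen done tells you how many jobs have finished, complementing llen jobs for how many remain.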
Of course, the actual script or job you want run could also be stored in Redis.
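For instance, a sketch of that idea (the script:job1 key name is my own, not from the original answer):

```shell
# Store a script body under a Redis key, then fetch and execute it on a worker
redis-cli set script:job1 'echo "doing job1"'
redis-cli get script:job1 | bash
```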
As a totally different option, you could look at GNU Parallel
. And also remember that you can run the output of find
through xargs
with the -P
option to parallelise stuff.
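A sketch of that xargs approach, assuming the scripts live under a scripts/ directory and 40-way parallelism:

```shell
# Run every script under scripts/, keeping up to 40 in flight at once.
# -P 40 sets the parallelism; -n 1 passes one script per bash invocation;
# -print0/-0 keeps file names with spaces intact.
find scripts/ -type f -print0 | xargs -0 -n 1 -P 40 bash
```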
As Mark Setchell says: GNU Parallel.
find scripts/ -type f | parallel
If you insist on keeping 8 CPUs free:
find scripts/ -type f | parallel -j-8
But usually it is more efficient simply to use nice
as that will give you all 48 cores when no one else needs them:
find scripts/ -type f | nice -n 15 parallel
To learn more: