I just figured out how to send jobs to machines on the cluster using Condor. Since we have a lot of machines and not all of them are configured the same way, I was wondering:
Is it possible to tell Condor to dispatch my jobs (Python scripts) only to machines that have numpy installed, since my script depends on this package?
Like any other machine attribute, you just need to advertise it in the machine classad, and then have your jobs require it.
To advertise it in the machine classad, you can either hard-code it into each machine's condor config file by adding something like this:
has_numpy = True
STARTD_EXPRS = $(STARTD_EXPRS) HAS_NUMPY
... or better yet, you can tell Condor to dynamically discover it at runtime with a script and advertise the result via a startd classad hook. To do that, install a simple has_numpy script on each machine like so:
#!/usr/bin/env python
# Startd cron hook: print a classad attribute saying whether numpy imports.
try:
    import numpy
except ImportError:
    print("has_numpy = False")
else:
    print("has_numpy = True")
... and then tell Condor to run it every five minutes and stick the results in the startd classad, by adding the following to the machine's condor config file:
HASNUMPY = /usr/libexec/condor/has_numpy
STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) HASNUMPY
STARTD_CRON_HASNUMPY_EXECUTABLE = $(HASNUMPY)
STARTD_CRON_HASNUMPY_PERIOD = 300
...and then ta-da (after a reconfig) your machines will dynamically detect and report whether numpy is installed and available to python scripts.
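If your jobs depend on more than one package, the has_numpy script above generalizes easily. Here is a minimal sketch, assuming a hypothetical helper called classad_lines and an illustrative module list (neither is part of Condor); it prints one has_<module> line per package in the same format:

```python
#!/usr/bin/env python
import importlib


def classad_lines(modules):
    """Return one 'has_<module> = True/False' line per module name."""
    lines = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            found = False
        else:
            found = True
        # Plain module names only: a dotted name would not make a
        # valid attribute name.
        lines.append("has_%s = %s" % (name, found))
    return lines


if __name__ == "__main__":
    # Illustrative list; edit it to match what your jobs actually import.
    for line in classad_lines(["numpy", "scipy"]):
        print(line)
```

Each attribute it prints can then be used in a job's Requirements expression the same way as has_numpy.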
Then you just need to add a corresponding requirement to your job submit files, like so:
Requirements = (has_numpy == True)
...and your job will only run on machines where numpy is installed.
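One detail worth spelling out: as far as I understand classad matching, a machine that never advertises has_numpy does not match either, because comparing an undefined attribute to True yields undefined rather than true. The sketch below is a plain Python model of that behaviour, not Condor's API; the requirement_matches helper and the machine names are made up for illustration, and None stands in for an undefined attribute:

```python
def requirement_matches(machine_ad):
    """Model Requirements = (has_numpy == True) against one machine ad."""
    value = machine_ad.get("has_numpy")  # None stands in for UNDEFINED
    # Only a literal True matches; False and a missing attribute both fail.
    return value is True


# Hypothetical machine ads, keyed by made-up host names.
machines = {
    "node01": {"has_numpy": True},
    "node02": {"has_numpy": False},
    "node03": {},  # hook not installed, attribute never advertised
}

eligible = sorted(
    name for name, ad in machines.items() if requirement_matches(ad)
)
print(eligible)
```

So machines where the hook has not been installed yet are simply skipped, which is usually what you want during a gradual rollout.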
Do you need to? According to the condor manual:
Condor does not require an account (login) on machines where it runs a job. Condor can do this because of its remote system call technology, which traps library calls for such operations as reading or writing from disk files. The calls are transmitted over the network to be performed on the machine where the job was submitted.
To me this implies that if the machine submitting the job has numpy installed, it should work.
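This is easy to check empirically rather than infer from the manual. Here is a minimal sketch of a test job, assuming a hypothetical helper named import_report: submit it without any has_numpy requirement and look at the job's output file to see which host it ran on and whether the import worked there.

```python
#!/usr/bin/env python
import socket
import sys


def import_report(module_name="numpy"):
    """Describe whether module_name can be imported on the current host."""
    host = socket.gethostname()
    try:
        __import__(module_name)
    except ImportError:
        status = "NOT importable"
    else:
        status = "importable"
    version = sys.version.split()[0]
    return "%s: %s is %s (python %s)" % (host, module_name, status, version)


if __name__ == "__main__":
    print(import_report())
```

If the output names an execute machine and reports a failed import, the package has to be present on that machine and the requirement-based approach is needed.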
Source: https://stackoverflow.com/questions/9864766/how-to-tell-condor-to-dispatch-jobs-only-to-machines-on-the-cluster-that-have