How do I tell Condor to dispatch jobs only to machines in the cluster that have “numpy” installed?

谁都会走 posted on 2019-12-05 00:42:34

Like any other machine attribute, you just need to advertise it in the machine classad, and then have your jobs require it.

To advertise it in the machine classad, you can either hard-code it into each machine's condor config file by adding something like this:

has_numpy = True
STARTD_EXPRS = $(STARTD_EXPRS) HAS_NUMPY

... or better yet, you can tell Condor to dynamically discover it at runtime with a script and advertise the result via a startd classad hook. To do that, install a simple has_numpy script on each machine like so:

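#!/usr/bin/env python
# Print a ClassAd attribute saying whether numpy can be imported here.
# print() with a single string works under both Python 2 and Python 3.
try:
    import numpy
except ImportError:
    print("has_numpy = False")
else:
    print("has_numpy = True")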

... and then tell Condor to run it every five minutes and stick the results in the startd classad, by adding the following to the machine's condor config file:

HASNUMPY = /usr/libexec/condor/has_numpy
STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) HASNUMPY
STARTD_CRON_HASNUMPY_EXECUTABLE = $(HASNUMPY)
STARTD_CRON_HASNUMPY_PERIOD = 300

...and then ta-da (after a reconfig) your machines will dynamically detect and report whether numpy is installed and available to python scripts.
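If you care about more than one module, the same hook idea generalizes. The following is only a sketch, not part of the original answer: the `MODULES` list and the `classad_line` helper are hypothetical names, and it assumes the startd cron hook accepts several `attribute = value` lines on stdout in the same way it accepts the single line above.

```python
#!/usr/bin/env python
"""Sketch of a hook script that probes several modules (hypothetical)."""
import importlib
import re

# Hypothetical list of modules to probe; extend as needed.
MODULES = ["numpy"]


def classad_line(module_name):
    """Return a 'has_<module> = True/False' line for one module."""
    try:
        importlib.import_module(module_name)
        available = True
    except ImportError:
        available = False
    # Keep the attribute name to letters, digits, and underscores.
    attr = "has_" + re.sub(r"\W", "_", module_name)
    return "%s = %s" % (attr, available)


if __name__ == "__main__":
    # One attribute per line, same format as the has_numpy script.
    for name in MODULES:
        print(classad_line(name))
```

With `MODULES = ["numpy"]` this prints the same `has_numpy` attribute as the simpler script, so the rest of the setup stays unchanged.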

Then you just need to add a corresponding requirement to your job submit files, like so:

Requirements = (has_numpy == True)

...and your job will only run on machines where numpy is installed.
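To show where the requirement sits in a complete submit file, here is a minimal sketch that generates one from Python. The file names (`analyze.py`, `job.out`, `job.err`, `job.log`) and the `submit_description` helper are made up for illustration; only the `requirements` line comes from the answer above, and the vanilla universe is an assumption.

```python
def submit_description(executable, requirements="(has_numpy == True)"):
    """Build the text of a minimal submit file (illustrative only)."""
    lines = [
        "universe = vanilla",            # assumption: a plain vanilla job
        "executable = %s" % executable,
        "requirements = %s" % requirements,
        "output = job.out",              # hypothetical file names
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # analyze.py is a hypothetical script with a python shebang line.
    print(submit_description("analyze.py"))
```

You would save the output to a file and pass it to `condor_submit`. To see which machines currently match, something like `condor_status -constraint 'has_numpy == True'` should list them.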

Do you need to? According to the Condor manual:

Condor does not require an account (login) on machines where it runs a job. Condor can do this because of its remote system call technology, which traps library calls for such operations as reading or writing from disk files. The calls are transmitted over the network to be performed on the machine where the job was submitted.

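To me this implies that if the machine submitting the job has numpy installed, it should work. Note, though, that remote system calls are a feature of standard universe jobs (those relinked with condor_compile), so a Python script run in the vanilla universe would generally still need numpy on the execute machine.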
