I am trying to run a job, but condor can't seem to find my file.
I've checked the following:
- the file exists (I ran ls and cat on its absolute path)
- the script runs from a condor interactive session
- the file has the right permissions to be executed
Despite that, I still get this error:
(automl-meta-learning) miranda9~/automl-meta-learning/automl-proj/experiments/meta_learning $ cat condor_job_log_69.out
000 (069.000.000) 10/21 11:06:06 Job submitted from host: <[--1]-9618&noUDP&sock=3715279_f2e6_4>
001 (069.000.000) 10/21 11:06:07 Job executing on host: <[--1]-9618&noUDP&sock=807_1d04_3>
007 (069.000.000) 10/21 11:06:07 Shadow exception!
Error from slot1_3@vision-01.cs.illinois.edu: Failed to execute '/home/miranda9/automl-meta-learning/automl-proj/experiments/meta_learning/meta_learning_experiments_submission.py': (errno=2: 'No such file or directory')
0 - Run Bytes Sent By Job
0 - Run Bytes Received By Job
012 (069.000.000) 10/21 11:06:07 Job was held.
Error from slot1_3@vision-01.cs.illinois.edu: Failed to execute '/home/miranda9/automl-meta-learning/automl-proj/experiments/meta_learning/meta_learning_experiments_submission.py': (errno=2: 'No such file or directory')
Code 6 Subcode 2
but the file is clearly there:
(automl-meta-learning) miranda9~/automl-meta-learning/automl-proj/experiments/meta_learning $ ls -lah /home/miranda9/automl-meta-learning/automl-proj/experiments/meta_learning/meta_learning_experiments_submission.py
-rwxrwxr-x. 1 miranda9 miranda9 22K Oct 20 14:54 /home/miranda9/automl-meta-learning/automl-proj/experiments/meta_learning/meta_learning_experiments_submission.py
I don't understand why condor can't find it. Any ideas? I'm not the sys admin so I don't even know how to start debugging this.
btw my submission script:
# Experiments script
# Simple HTCondor submit description file
# reference: https://gitlab.engr.illinois.edu/Vision/vision-gpu-servers/-/wikis/HTCondor-user-guide#submit-jobs
# chmod a+x test_condor.py
# chmod a+x experiments_meta_model_optimization.py
# chmod a+x meta_learning_experiments_submission.py
# chmod a+x download_miniImagenet.py
# condor_submit -i
# condor_submit job.sub
# Executable = meta_learning_experiments_submission.py
# Executable = automl-proj/experiments/meta_learning/meta_learning_experiments_submission.py
# Executable = ~/automl-meta-learning/automl-proj/experiments/meta_learning/meta_learning_experiments_submission.py
Executable = /home/miranda9/automl-meta-learning/automl-proj/experiments/meta_learning/meta_learning_experiments_submission.py
## Output Files
Log = condor_job.$(CLUSTER).log.out
Output = condor_job.$(CLUSTER).stdout.out
Error = condor_job.$(CLUSTER).err.out
# Use this to make sure 1 gpu is available. The key words are case insensitive.
REquest_gpus = 1
# requirements = ((CUDADeviceName = "Tesla K40m")) && (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) && (TARGET.Cpus >= RequestCpus) && (TARGET.gpus >= Requestgpus) && ((TARGET.FileSystemDomain == MY.FileSystemDomain) || (TARGET.HasFileTransfer))
# requirements = (CUDADeviceName == "Tesla K40m")
# requirements = (CUDADeviceName == "Quadro RTX 6000")
requirements = (CUDADeviceName != "Tesla K40m")
# Note: to use multiple CPUs instead of the default (one CPU), use request_cpus as well
Request_cpus = 8
# E-mail option
Notify_user = me@gmail.com
Notification = always
Environment = MY_CONDOR_JOB_ID= $(CLUSTER)
# "Queue" means add the setup until this line to the queue (needs to be at the end of script).
Queue
It looks like your executable is a Python script. Linux reports "No such file or directory" when the script itself exists but the interpreter named on its "#!" line doesn't exist on the system. Could that be what is happening here? What does the first line of this script look like?
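One way to check this, as a rough sketch: the snippet below reads a script's first line and tests whether the interpreter named after "#!" exists and is executable. The function names `shebang_interpreter` and `interpreter_exists` are made up for illustration. It has to be run on the machine where the job actually executes, because an interpreter path can exist on the submit host and still be missing on the execute node.

```python
import os
import sys


def shebang_interpreter(script_path):
    """Return the interpreter path named on the script's '#!' line, or None."""
    with open(script_path, "rb") as f:
        first_line = f.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    # The interpreter is the first whitespace-separated token after '#!'.
    parts = first_line[2:].split()
    return parts[0] if parts else None


def interpreter_exists(script_path):
    """True/False if the shebang interpreter is (not) an executable file.

    Returns None when the script has no '#!' line at all.
    """
    interp = shebang_interpreter(script_path)
    if interp is None:
        return None
    return os.path.isfile(interp) and os.access(interp, os.X_OK)


if __name__ == "__main__":
    # Check every script path given on the command line.
    for path in sys.argv[1:]:
        print(path, shebang_interpreter(path), interpreter_exists(path))
```

Saved under a name of your choice (say `check_shebang.py`, a hypothetical filename), it can be run as `python check_shebang.py /path/to/script.py`. A result of `False` means the kernel would fail with errno 2 when asked to execute the script directly, even though the script file itself is present.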
The issue was that the top of my Python submission script still had arguments for other clusters unrelated to condor, so the path to the Python executable there was wrong. I fixed it by removing them and adding this line to my Python submission script:
In fact, to find the Python path for your current env, run:
which python
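The same lookup can be done from inside Python, which avoids depending on how the shell's PATH happens to be set up. This is a minimal sketch, assuming it is run with the environment's interpreter active; `shebang_for_current_env` is a hypothetical helper name, and the line it prints is only a suggestion for what could go at the top of the script.

```python
import shutil
import sys


def shebang_for_current_env():
    """Build a '#!' line that points at the interpreter running this code."""
    return "#!" + sys.executable


if __name__ == "__main__":
    # Absolute path of the interpreter that is running right now.
    print(sys.executable)
    # First "python" found on PATH, similar to `which python`; may be None.
    print(shutil.which("python"))
    # Candidate first line for a script that should use this environment.
    print(shebang_for_current_env())
```

Whichever path you use, it must also exist on the node where condor runs the job, not only on the machine you submit from.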