Question
I'm using Sun Grid Engine (SGE) on Ubuntu 14.04 to queue jobs to run on my multicore CPU. I've installed and set up SGE, but I have a problem when testing it. I created a "hello_world" directory containing two shell scripts, "hello_world.sh" and "hello_world_qsub.sh". The first contains a simple command; the second uses the qsub command to submit the first script as a job. Here's what "hello_world.sh" contains:
#!/bin/bash
echo "Hello world" > /home/theodore/tmp/hello_world/hello_world_output.txt
And here's what "hello_world_qsub.sh" includes:
#!/bin/bash
qsub \
-e /home/hello_world/hello_world_qsub.error \
-o /home/hello_world/hello_world_qsub.log \
./hello_world.sh
After making the second script executable and running it with "./hello_world_qsub.sh" from that directory, the output looks reasonable:
Your job 1 ("hello_world.sh") has been submitted
But the output of the "qstat" command is frustrating:
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
      1 0.50000 hello_worl mhr          qw    05/16/2016 20:26:23                                    1
The "state" column always stays at "qw" and never changes to "r".
Here's the output of the "qstat -j 1" command:
==============================================================
job_number: 1
exec_file: job_scripts/1
submission_time: Mon May 16 20:26:23 2016
owner: mhr
uid: 1000
group: mhr
gid: 1000
sge_o_home: /home/mhr
sge_o_log_name: mhr
sge_o_path: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
sge_o_shell: /bin/bash
sge_o_workdir: /home/mhr/hello_world
sge_o_host: localhost
account: sge
stderr_path_list: NONE:NONE:/home/hello_world/hello_world_qsub.error
mail_list: mhr@localhost
notify: FALSE
job_name: hello_world.sh
stdout_path_list: NONE:NONE:/home/hello_world/hello_world_qsub.log
jobshare: 0
env_list:
script_file: ./hello_world.sh
scheduling info: queue instance "mainqueue@localhost" dropped because it is temporarily not available
All queues dropped because of overload or full
And here's the output of the "qhost" command:
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
localhost               -               -     -       -       -       -       -
What should I do to make my jobs run and finish their task?
Answer 1:
From your qhost output, it looks like your machine "localhost" is properly registered as a host in SGE. However, sge_execd on "localhost" is either not running or not configured properly. If it were running correctly, qhost would report statistics (architecture, CPU count, load, memory) for "localhost" instead of dashes.
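A rough way to spot this condition from the shell is sketched below. The helper name `hosts_without_load` is made up for illustration, and it assumes the default qhost column layout shown in the question (two header lines, host name in the first column, ARCH in the second). It prints every real host whose ARCH column is still "-".

```shell
#!/bin/bash
# Hypothetical helper: list hosts for which qhost shows no statistics.
# Reads qhost output on stdin, skips the two header lines and the
# "global" pseudo-host, and prints hosts whose ARCH column is "-".
hosts_without_load() {
    awk 'NR > 2 && $1 != "global" && $2 == "-" { print $1 }'
}

# Example use on a machine with SGE installed:
#   qhost | hosts_without_load
#   pgrep -x sge_execd || echo "sge_execd is not running"
```

If the helper prints "localhost", the execution daemon on that host has not reported to the master, which matches the symptom described here.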
Answer 2:
My problem is solved. As @Finch_Powers stated, the problem was with sge_execd: the gridengine-exec package was not installed properly. Reinstalling it fixed the issue.
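For anyone hitting the same thing, the reinstall could look roughly like the sketch below. The exact commands I ran are not recorded here, so treat these as assumptions: it assumes an apt-based Ubuntu system and that the package provides an init script named gridengine-exec. The `run` and `reinstall_execd` names are just illustrative, and the function only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Sketch only: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reinstall_execd() {
    # Reinstall the execution daemon package named in this answer.
    run sudo apt-get install --reinstall gridengine-exec
    # Assumption: the service is named after the package.
    run sudo service gridengine-exec restart
    # Afterwards qhost should show real values instead of "-" for the host.
    run qhost
}
```

Calling `reinstall_execd` previews the steps; `DRY_RUN=0 reinstall_execd` actually runs them.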
Source: https://stackoverflow.com/questions/37258350/sge-submitted-job-doesnt-run