Question
On my fully local Torque installation (torque-6.1.1), all my submitted jobs are stuck in the 'Q' (queued) state, and I have to force their execution using qrun.
>qstat -f 141
Job Id: 141.localhost
Job_Name = script.pbs
Job_Owner = michael@localhost
job_state = Q
queue = batch
server = localhost
Checkpoint = u
ctime = Wed Aug 23 16:45:25 2017
Error_Path = localhost:/var/spool/torque/script.pbs.e141
Hold_Types = n
Join_Path = n
Keep_Files = n
Mail_Points = bae
mtime = Wed Aug 23 16:45:25 2017
Output_Path = localhost:/var/spool/torque/script.pbs.o141
Priority = 0
qtime = Wed Aug 23 16:45:25 2017
Rerunable = True
Resource_List.walltime = 01:00:00
Resource_List.nodes = 1
Resource_List.nodect = 1
Variable_List = PBS_O_QUEUE=batch,PBS_O_HOME=/home/michael,
PBS_O_LOGNAME=michael,
PBS_O_PATH=/home/michael/bin:/home/michael/.local/bin:/usr/local/bin:
/usr/local/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbi
n:/bin:/usr/games:/usr/local/games:/snap/bin,PBS_O_SHELL=/bin/bash,
PBS_O_LANG=fr_FR.UTF-8,PBS_O_WORKDIR=/var/spool/torque,
PBS_O_HOST=localhost,PBS_O_SERVER=localhost
euser = michael
egroup = michael
queue_type = E
etime = Wed Aug 23 16:45:25 2017
submit_args = /home/michael/cnes-sowt/script.pbs
fault_tolerant = False
job_radix = 0
submit_host = localhost
init_work_dir = /var/spool/torque
request_version = 1
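To spot stuck jobs without reading every `qstat -f` record by hand, here is a minimal sketch. It assumes the plain-text layout shown above: a `Job Id:` header, then `name = value` attributes, with long values wrapped onto continuation lines mid-token (as in the `/sbi` / `n:/bin` split of `Variable_List`). It collects each job's attributes and lists the jobs whose `job_state` is `Q`. The function names are hypothetical helpers and are not part of Torque.

```python
import re
import shutil
import subprocess
from typing import Dict, List, Optional

# An attribute line looks like "job_state = Q", possibly indented.
ATTR_RE = re.compile(r"^\s*([A-Za-z_][\w.]*) = (.*)$")


def parse_qstat_f(text: str) -> Dict[str, Dict[str, str]]:
    """Parse `qstat -f` output into {job_id: {attribute: value}}.

    Any non-blank line that is neither a "Job Id:" header nor a
    "name = value" attribute is treated as a continuation of the previous
    value. qstat wraps long values mid-token (e.g. "/sbi" + "n:/bin"), so
    continuations are appended without a separator.
    """
    jobs: Dict[str, Dict[str, str]] = {}
    current: Dict[str, str] = {}
    last_key: Optional[str] = None
    for line in text.splitlines():
        if line.startswith("Job Id:"):
            job_id = line.split(":", 1)[1].strip()
            current = jobs.setdefault(job_id, {})
            last_key = None
            continue
        match = ATTR_RE.match(line)
        if match:
            last_key = match.group(1)
            current[last_key] = match.group(2).strip()
        elif last_key is not None and line.strip():
            current[last_key] += line.strip()
    return jobs


def queued_jobs(jobs: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the IDs of jobs whose job_state is Q (queued)."""
    return [job_id for job_id, attrs in jobs.items() if attrs.get("job_state") == "Q"]


if __name__ == "__main__":
    if shutil.which("qstat") is None:
        print("qstat not found on PATH")
    else:
        out = subprocess.run(["qstat", "-f"], capture_output=True, text=True).stdout
        for job_id in queued_jobs(parse_qstat_f(out)):
            print(job_id, "is still queued")
```

The script only reads `qstat -f` output and does not modify any job; forcing execution still requires qrun.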
>sudo tracejob 141
/var/spool/torque/mom_logs/20170823: No matching job records located
/var/spool/torque/sched_logs/20170823: No matching job records located
Job: 141.localhost
08/23/2017 16:45:25.323 S enqueuing into batch, state 1 hop 1
08/23/2017 16:45:25 A queue=batch
Could this be because I can run qsub as a regular user, but have to run qrun with sudo?
Thanks a lot for your help.
Answer 1:
The solution is described at https://cmayes.wordpress.com/2012/12/15/single-host-torque-pbs/ and consists of adding an entry to /etc/hosts.
Source: https://stackoverflow.com/questions/45843568/pbs-jobs-stay-queued-q-state-but-run-with-qrun
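The answer does not say which entry to add. One way to check name resolution before editing anything is to compare what /etc/hosts maps `localhost`, the machine's hostname and the Torque server name to. Torque keeps the server name in the `server_name` file of its home directory; the path below assumes that directory is /var/spool/torque, as in the question's paths. On Debian/Ubuntu, the installer typically maps the hostname to 127.0.1.1 rather than 127.0.0.1. Whether that mismatch is the cause here is an assumption, not something the answer confirms. The helper is hypothetical and only prints the mappings.

```python
import socket
from typing import Dict, List

# Assumes TORQUE_HOME is /var/spool/torque, matching the paths in the question.
SERVER_NAME_FILE = "/var/spool/torque/server_name"


def parse_hosts(text: str) -> Dict[str, List[str]]:
    """Map every name in /etc/hosts-style text to the addresses that list it."""
    mapping: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line, or an address with no names
        address, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name, []).append(address)
    return mapping


def main() -> None:
    try:
        with open("/etc/hosts") as fh:
            hosts = parse_hosts(fh.read())
    except OSError as exc:
        print(f"cannot read /etc/hosts: {exc}")
        return
    names = ["localhost", socket.gethostname()]
    try:
        with open(SERVER_NAME_FILE) as fh:
            server_name = fh.read().strip()
        if server_name:
            names.append(server_name)
    except OSError:
        pass  # no readable server_name file; check only the two names above
    for name in dict.fromkeys(names):  # de-duplicate while keeping order
        print(f"{name} -> {hosts.get(name, 'not listed in /etc/hosts')}")


if __name__ == "__main__":
    main()
```

The script only reads files. The actual /etc/hosts change should follow the linked post.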