Question
I am trying to start a job on a cluster via Torque PBS with the command
qsub -o a.txt a.sh
The file a.sh contains a single line:
hostname
After running qsub I run qstat, which gives the following output:
Job ID                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
302937.voms               a.sh             user            00:00:00 E long
After about 5 seconds, qstat returns empty output (no jobs in the queue). The command
qsub --version
gives the output: version: 2.5.13
The command
which qsub
gives the output: /usr/bin/qsub
The problem is that the file a.txt (from the command qsub -o a.txt a.sh) is not created. The terminal shows only the job ID and no errors. The command
qsub a.sh
behaves the same way. How can I fix this? Where are the qsub log files with errors?
If I use the command
qsub -l nodes=node36:ppn=1 -o a.txt a.sh
then I can find the output files in the folder
/var/spool/pbs/undelivered
on node36 (after logging in to it via ssh). The output file contains the string "node36" and the error file is empty. Why are my files "undelivered"?
Answer 1:
The output and error log files are kept in a spool directory on the execution node and copied back to the head node after the job has completed. The location of the spool directory may vary, but you should look for it under
/var/torque/spool
on the first node in the list of nodes allocated to the job.
There are several reasons why Torque might fail to deliver the output files:
- The user submitting the job might not exist on the node or their home directory might not be accessible, or there is a user ID mismatch between the nodes of the cluster.
- Torque is using ssh to copy files to the head node, but passwordless public key authentication for the user to ssh across the cluster has not been set up consistently on all the nodes.
- A node failed during the execution of the job.
This list is by no means complete. Stack Overflow already has a number of questions dealing with this kind of failure. Check whether any of the above applies to your case.
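The three causes listed above can be checked from the head node with a few commands. The following is only a rough sketch, not an official Torque diagnostic: the function name check_delivery is made up for illustration, the host names are placeholders taken from the question, and it assumes you can already ssh from the head node to the compute node.

```shell
#!/bin/sh
# Sketch: check common causes of undelivered Torque output for one node.
# check_delivery is a hypothetical helper name, not part of Torque.

check_delivery() {
    node=$1   # execution node, e.g. node36 from the question
    head=$2   # host the job was submitted from, e.g. voms

    # 1. Does the submitting user have the same UID on the node as here?
    local_uid=$(id -u)
    remote_uid=$(ssh -o BatchMode=yes "$node" id -u) || remote_uid=unknown
    if [ "$local_uid" = "$remote_uid" ]; then
        echo "uid: ok ($local_uid)"
    else
        echo "uid: MISMATCH (local $local_uid, $node $remote_uid)"
    fi

    # 2. Can the node reach the head node over ssh without a password?
    #    BatchMode=yes makes ssh fail instead of prompting.
    if ssh -o BatchMode=yes "$node" ssh -o BatchMode=yes "$head" true; then
        echo "ssh $node -> $head: ok"
    else
        echo "ssh $node -> $head: FAILED"
    fi

    # 3. Is the node reported as down, offline, or unknown?
    #    pbsnodes -l lists only nodes in those states.
    if pbsnodes -l | grep -q "^$node[[:space:]]"; then
        echo "node state: $node is listed as down/offline"
    else
        echo "node state: ok"
    fi
}
```

After sourcing the file, it could be called as check_delivery node36 voms, run as the same user who submitted the job. A FAILED or MISMATCH line only points at a likely cause; it does not prove that this is why the files ended up undelivered.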
Answer 2:
You (or anyone else finding this thread) should also check out the solution given in the question "PBS, refresh stdout".
If you have admin access, you can set
$spool_as_final_name true
which causes the output to be written directly to the final destination.
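For context, this is a pbs_mom setting, so it would normally go into the MOM configuration file on each compute node. The fragment below is a sketch under assumptions: the file is usually mom_priv/config under the Torque home directory, and that home directory differs between installations (the question suggests /var/spool/pbs, the first answer /var/torque), so check your own layout.

```
# mom_priv/config on each compute node (location under the Torque home directory is an assumption)
$spool_as_final_name true
```

pbs_mom has to re-read its configuration before the change takes effect, for example by restarting it. Since the job then writes straight to the final path from the compute node, this presumably only helps when that path (for example a shared home directory) is reachable from the node.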
Source: https://stackoverflow.com/questions/46759079/why-torque-qsub-dont-create-output-file