Question
Slurm sbatch directs stdout and stderr to the files specified by the -o and -e flags, but fails to do so if the filepath contains directories that don't exist. Is there some way to automatically make the directories for my log files? (A minimal sketch of the failure appears after the list below.)
- Manually creating these directories each time is inefficient because I'm running each sbatch submission dozens of times.
- Letting the variation across job names live in filenames rather than directories produces a huge, poorly organized mess of logs to sort through whenever I need to check how my jobs did.
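For concreteness, a minimal sketch of the failure mode (my_job.sh and the log path are hypothetical names for illustration):

# Hypothetical reproduction: log/does_not_exist/ was never created.
# Slurm accepts the submission but cannot open the output file, so the
# job produces no log (and on most setups fails outright).
sbatch -o log/does_not_exist/%j.out my_job.sh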
The only way I've found to do this is to wrap my calls to sbatch inside bash scripts that are many times longer than seems necessary for such a small thing. I've included a shortened example below.
#!/bin/bash
# Set up and run job array for my_script.py, which takes as positional
# arguments a config file (passed via $1) and an array index.
#SBATCH --array=1-100
#SBATCH -n 1
#SBATCH -t 12:00:00
#SBATCH -p short
#SBATCH -J sim_sumstats
#SBATCH --mem=1600

# Initialize variables used for script control flow
sub_or_main='sub'

# Parse options
while getopts ":A" opt; do
    case $opt in
        A)
            sub_or_main='main'
            ;;
        \?)
            # Capture invalid options
            echo "Invalid option: -$OPTARG" >&2
            exit 1
            ;;
    esac
done
shift $((OPTIND - 1))

# Either run the submit script or the main array
if [ "$sub_or_main" = 'sub' ]; then
    # Submit script creates folders for log files, then calls sbatch on this
    # script in main mode.
    now=$(date +"%y%m%d-%H%M")
    name=$(basename "$1" .json)
    logpath="log/my_script_name/$name/$now"
    mkdir -p "$logpath"
    sbatch \
        -o "$logpath/%a.out" \
        -e "$logpath/%a.out" \
        "$0" -A "$1"
else
    # Main loop. Just calls my_script.py with the array ID.
    python ./my_script.py "$1" "${SLURM_ARRAY_TASK_ID}"
fi
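For reference, here's how that wrapper gets used; the names submit.sh and params.json are hypothetical:

# Invoked directly, not via sbatch. In 'sub' mode it makes the log
# directory, then re-submits itself with -A so the queued copy runs
# the array tasks in 'main' mode.
./submit.sh params.json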
Having a script like this works, but seems awfully wasteful: I've more than doubled the length of my sbatch submit script just to organize my log files. Moreover, most of that added code will be nearly identical across batch submit scripts for other jobs (e.g. one calling my_script2.py), so it makes for a lot of duplication. I can't help but think there has to be a better way.
Answer 1:
You can redirect the output of your Python script yourself in the submission script, and either discard the Slurm log, or use it to record interesting information about the job for provenance tracking and reproducibility purposes.
You could have a submission script go like this:
#!/bin/bash
# Set up and run job array for my_script.py, which takes as positional
# arguments a config file (passed via $1) and an array index.
#SBATCH --array=1-100
#SBATCH -n 1
#SBATCH -t 12:00:00
#SBATCH -p short
#SBATCH -J sim_sumstats
#SBATCH --mem=1600

now=$(date +"%y%m%d-%H%M")
name=$(basename "$1" .json)
logpath="log/my_script_name/$name/$now"
mkdir -p "$logpath"
logfile="$logpath/${SLURM_ARRAY_TASK_ID}.out"
echo "Writing to ${logfile}"

# Record job details and the environment in Slurm's default log,
# for provenance tracking and reproducibility.
scontrol show -dd job "$SLURM_JOB_ID"
printenv

# Redirect the Python script's stdout to the per-task log file;
# stderr still goes to Slurm's default output file.
python ./my_script.py "$1" "${SLURM_ARRAY_TASK_ID}" > "${logfile}"
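Assuming the script above is saved as job.sh (the name is hypothetical), submission stays a one-liner:

sbatch job.sh my_config.json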
This way, the output from the Python script ends up where you want it, and the parent directory is created before the log file is written.
Additionally, you will have the standard output file created by Slurm, with the default naming scheme, holding information about the job (from scontrol) and about the environment (from printenv).
But if you want to prevent Slurm from attempting to create its own output file at all, set --output=/dev/null.
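For example, as a directive in the submission script; note that this also discards the scontrol and printenv output, leaving only the per-task logfile:

#SBATCH --output=/dev/null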
Source: https://stackoverflow.com/questions/54370203/create-directory-for-log-file-before-calling-slurm-sbatch