Question
There's a nice question (Find out the CPU time and memory usage of a slurm job) about how to retrieve the CPU time and memory usage of a Slurm job, and spinup has a nice answer (https://stackoverflow.com/a/56555505/4570472). However, if I understand correctly, seff <job id> returns Memory Efficiency, which corresponds to MaxRSS over the entire life of the job.
How do I retrieve the time series of memory (and perhaps CPU) usage?
I'd like this in order to understand why my Slurm jobs run out of memory after running fine for 6+ hours.
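To make concrete what I mean by a time series, here is a rough sketch of the kind of polling I have in mind. It assumes that calling sstat periodically on the running job is acceptable on the cluster, that the output has exactly the three requested fields separated by "|", and that memory suffixes are 1024-based. The helper names (parse_rss, parse_sstat, sample, poll) and the log file name are my own, not part of Slurm. Note that MaxRSS from sstat is still a running peak per step, so this would only show when the peak grows, not the instantaneous usage.

```python
import subprocess
import time

# Assumption: sstat memory suffixes are 1024-based (K, M, G, T).
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_rss(value):
    """Convert an sstat memory string such as '1234K' or '1.5G' to bytes."""
    value = value.strip()
    if not value:
        return 0
    suffix = value[-1].upper()
    if suffix in _UNITS:
        return int(float(value[:-1]) * _UNITS[suffix])
    return int(float(value))  # no suffix: treat the number as bytes


def parse_sstat(output):
    """Parse 'JobID|MaxRSS|AveCPU' lines into (step, rss_bytes, ave_cpu) tuples."""
    rows = []
    for line in output.splitlines():
        fields = line.strip().split("|")
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        step, max_rss, ave_cpu = fields
        rows.append((step, parse_rss(max_rss), ave_cpu))
    return rows


def sample(job_id):
    """Run sstat once for a running job and return the parsed rows."""
    cmd = [
        "sstat", "--jobs", str(job_id),
        "--format", "JobID,MaxRSS,AveCPU",
        "--noheader", "--parsable2",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_sstat(result.stdout)


def poll(job_id, interval_s=60, log_path="usage.tsv"):
    """Append one timestamped line per job step every interval_s seconds.

    Stops when sstat fails or returns nothing, which is assumed here to
    mean the job has finished.
    """
    with open(log_path, "a") as log:
        while True:
            try:
                rows = sample(job_id)
            except subprocess.CalledProcessError:
                break
            if not rows:
                break
            now = time.strftime("%Y-%m-%dT%H:%M:%S")
            for step, rss_bytes, ave_cpu in rows:
                log.write(f"{now}\t{step}\t{rss_bytes}\t{ave_cpu}\n")
            log.flush()
            time.sleep(interval_s)
```

Something like poll(12345678) run from a login node (with a made-up job id here) would then leave a tab-separated file that can be plotted afterwards. Depending on the setup, the step may have to be named explicitly, e.g. passing "12345678.batch" as the job id.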
Source: https://stackoverflow.com/questions/63250581/find-cpu-and-memory-time-series-of-slurm-job