GNU parallel --jobs option using multiple nodes on cluster with multiple cpus per node

一向 2020-12-31 18:24

I am using GNU parallel to launch code on a high-performance computing (HPC) cluster that has 2 CPUs per node. The cluster uses the TORQUE Portable Batch System (PBS). My questi

2 Answers
  • 2020-12-31 18:46

    This is not an answer to the 3 primary questions, but I'd like to point out some other problems with the parallel statement in the first code block.

    parallel --env $PBS_O_WORKDIR --sshloginfile $PBS_NODEFILE \
      matlab -nodisplay -r "\"cd $PBS_O_WORKDIR,primes1({})\"" ::: 10 20 30 40
    

    The shell expands $PBS_O_WORKDIR before it executes parallel. This has two consequences: (1) --env sees a path rather than the name of an environment variable, so it essentially does nothing, and (2) the path is already expanded inside the command string, so there is no need to pass $PBS_O_WORKDIR to the remote host, which is why there wasn't an error.
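    The expansion can be seen without parallel at all. The sketch below is only an illustration: show_args is a hypothetical helper that prints each argument it receives, and /tmp/demo_workdir is a made-up value standing in for whatever PBS sets.

```shell
# Hypothetical stand-in for the value PBS would set in a real job.
PBS_O_WORKDIR=/tmp/demo_workdir

# Hypothetical helper: print every argument the command actually receives.
show_args() { printf '[%s]\n' "$@"; }

# With the dollar sign, the shell substitutes the value first,
# so the command sees a path, not a variable name.
show_args --env $PBS_O_WORKDIR

# Without the dollar sign, the command sees the variable's name,
# which is what --env expects.
show_args --env PBS_O_WORKDIR
```

    The first call receives the path as its second argument; the second call receives the literal name PBS_O_WORKDIR.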

    The latest version of parallel (20151022) has a --workdir option (although the tutorial lists it as alpha testing), which is probably the easiest solution. The parallel command line would look something like:

    parallel --workdir $PBS_O_WORKDIR --sshloginfile $PBS_NODEFILE \
      matlab -nodisplay -r "\"primes1({})\"" ::: 10 20 30 40
    

    As a final note, PBS_NODEFILE may list a host multiple times if more than one processor per node is requested through qsub. This may have implications for the number of jobs run, etc.
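    One possible way to handle the repeated hosts, sketched under the assumption that the node file has one hostname per line, is to collapse it into count/host lines, the form GNU Parallel accepts for an sshlogin with an explicit core count. The function name nodefile_to_sshlogins and the demo host names are hypothetical.

```shell
# Collapse a node file (one line per allocated processor) into
# "count/host" lines, one per distinct host.
nodefile_to_sshlogins() {
  sort "$1" | uniq -c | awk '{ print $1 "/" $2 }'
}

# Demo with a made-up node file: two processors on each of two nodes.
demo_nodefile=$(mktemp)
printf '%s\n' node01 node01 node02 node02 > "$demo_nodefile"
nodefile_to_sshlogins "$demo_nodefile"
```

    In a real job the output could be written to a file and passed to --sshloginfile in place of $PBS_NODEFILE; whether that suits a given cluster is something to check locally.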

  • 2020-12-31 19:01
    1. Yes: -j is the number of jobs per node.
    2. Yes: Install 'parallel' in your $PATH on the remote hosts.
    3. Yes: It is a consequence of parallel missing from the $PATH.

    GNU Parallel logs into the remote machine and tries to determine the number of cores (using parallel --number-of-cores). That fails when parallel is not in the remote $PATH, so it defaults to 1 CPU core per host. If you give -j2, GNU Parallel will not try to determine the number of cores.
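    A quick way to check whether parallel is visible on each node is to ask every host for it over ssh. This is only a sketch: check_parallel_on_hosts and the REMOTE_SHELL override are hypothetical names, and it assumes passwordless ssh to the hosts in the node file.

```shell
# Report, for each distinct host in a node file, whether `parallel`
# can be found in the $PATH of a non-interactive remote shell.
# REMOTE_SHELL is a hypothetical override (default: ssh) so the loop
# can be tried out without a cluster.
check_parallel_on_hosts() {
  sort -u "$1" | while read -r host; do
    # </dev/null keeps the remote command from swallowing the host list.
    if ${REMOTE_SHELL:-ssh} "$host" 'command -v parallel' </dev/null >/dev/null 2>&1; then
      echo "$host: parallel found"
    else
      echo "$host: parallel missing"
    fi
  done
}

# Typical use inside a PBS job:
#   check_parallel_on_hosts "$PBS_NODEFILE"
```

    Any host reported as missing is one where GNU Parallel would fall back to 1 core unless -j is given.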

    Did you know that you can also give the number of cores in --sshlogin, as in 4/myserver? This is useful if you have a mix of machines with different numbers of cores.
