mpirun: Unrecognized argument mca

Submitted by 房东的猫 on 2020-01-16 00:55:41

Question


I have a C++ solver which I need to run in parallel using the following command:

nohup mpirun -np 16 ./my_exec > log.txt &

This command will run my_exec independently on the 16 processors available on my node. This used to work perfectly.

Last week, the HPC department performed an OS upgrade and now, when launching the same command, I get two warning messages (for each processor). The first one is:

--------------------------------------------------------------------------
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory.  This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered.  You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module
parameters:

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages

  Local host:              tamnun
  Registerable memory:     32768 MiB
  Total memory:            98294 MiB

Your MPI job will continue, but may be behave poorly and/or hang.
--------------------------------------------------------------------------

I then get an output from my code, which tells me it thinks I am launching only 1 realization of the code (Nprocs = 1 instead of 16).

# MPI IS ON; Nprocs = 1
Filename = ../input/odtParam.inp

# MPI IS ON; Nprocs = 1

***** Error, process 0 failed to create ../data/data_0/, or it was already there

Finally, the second warning message is:

--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host:          tamnun (PID 17446)
  MPI_COMM_WORLD rank: 0

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------

After looking around online, I tried following the warning messages' advice by setting the MCA parameter mpi_warn_on_fork to 0 with the command:

nohup mpirun --mca mpi_warn_on_fork 0 -np 16 ./my_exec > log.txt &

which yielded the following error message:

[mpiexec@tamnun] match_arg (./utils/args/args.c:194): unrecognized argument mca
[mpiexec@tamnun] HYDU_parse_array (./utils/args/args.c:214): argument matching returned error
[mpiexec@tamnun] parse_args (./ui/mpich/utils.c:2964): error parsing input array
[mpiexec@tamnun] HYD_uii_mpx_get_parameters (./ui/mpich/utils.c:3238): unable to parse user arguments

I am using Red Hat 6.7 (Santiago). I have contacted the HPC department, but since I am at a university, it may take them a day or two to respond. Any help or guidance would be appreciated.

EDIT in response to answer:

Indeed, I was compiling my code with Open MPI's mpic++ while running the executable with Intel's mpirun command, hence the error (after the OS upgrade, Intel's mpirun was set as the default). I had to put the path to Open MPI's mpirun at the beginning of the $PATH environment variable.
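A minimal sketch of that PATH fix, for anyone hitting the same problem. The directory /opt/openmpi/bin is a hypothetical install location and prepend_to_path is an illustrative helper name; substitute the actual Open MPI bin directory on your cluster.

```shell
#!/bin/sh
# Put a directory at the front of PATH so its mpirun and mpic++ are found first.
# If the directory is already the first entry, PATH is left unchanged.
prepend_to_path() {
    case "$PATH" in
        "$1":*|"$1") ;;                 # already first: nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# Example (hypothetical location, adjust to your installation):
#   prepend_to_path /opt/openmpi/bin
#   command -v mpirun    # should now resolve inside /opt/openmpi/bin
#   command -v mpic++    # should resolve to the same directory
```

Checking that mpirun and mpic++ resolve to the same directory is a quick way to confirm that the compiler wrapper and the launcher come from the same MPI installation.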

The code now runs as expected, but I still get the first warning message above (it no longer advises me to use the MCA parameter mpi_warn_on_fork). I think, though I am not sure, that this is an issue I need to resolve with the HPC department.


Answer 1:


[mpiexec@tamnun] match_arg (./utils/args/args.c:194): unrecognized argument mca
[mpiexec@tamnun] HYDU_parse_array (./utils/args/args.c:214): argument matching returned error
[mpiexec@tamnun] parse_args (./ui/mpich/utils.c:2964): error parsing input array
                                  ^^^^^
[mpiexec@tamnun] HYD_uii_mpx_get_parameters (./ui/mpich/utils.c:3238): unable to parse user arguments
                                                  ^^^^^

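You are using MPICH in the last case, as the mpich path components marked above show. MPICH is not Open MPI, and its process launcher does not recognize the --mca parameter, which is specific to Open MPI (MCA stands for Modular Component Architecture - the basic framework that Open MPI is built upon). This is a typical case of a mix-up of multiple MPI implementations. It also explains the Nprocs = 1 output: an executable built with Open MPI but started by another implementation's launcher runs each copy as an independent single-process job.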
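One way to catch such a mix-up before submitting a job is to look at the banner the launcher prints for --version. The sketch below is only an illustration: the function names are made up for this example, and the matched strings are assumptions based on typical banners ("mpirun (Open MPI) ..." for Open MPI, "HYDRA build details" for MPICH, "Intel(R) MPI Library" for Intel MPI), not a documented interface.

```shell
#!/bin/sh
# Classify an MPI launcher from the text printed by "<launcher> --version".
# The patterns are assumptions about typical banners and may need adjusting.
classify_mpi_launcher() {
    case "$1" in
        *"Open MPI"*|*OpenRTE*)      echo "openmpi" ;;
        *"Intel(R) MPI"*)            echo "intel-mpi" ;;
        *HYDRA*|*Hydra*|*MPICH*)     echo "mpich" ;;
        *)                           echo "unknown" ;;
    esac
}

# Emit the Open MPI-only --mca option only when the launcher is Open MPI's.
mca_args_for() {
    if [ "$(classify_mpi_launcher "$1")" = "openmpi" ]; then
        echo "--mca mpi_warn_on_fork 0"
    fi
}

# Example usage (requires an installed mpirun, so it is left commented out):
#   banner=$(mpirun --version 2>&1)
#   echo "launcher family: $(classify_mpi_launcher "$banner")"
#   mpirun $(mca_args_for "$banner") -np 16 ./my_exec
```

If the reported family does not match the implementation whose mpic++ built the executable, the launcher on $PATH is the wrong one.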



Source: https://stackoverflow.com/questions/33780992/mpirun-unrecognized-argument-mca
