Segmentation faults occur when I run a parallel program with Open MPI

感动是毒 2021-02-07 12:15

In my previous post I needed to distribute the data of pgm files among 10 computers. With help from Jonathan Dursi and Shawn Chin, I have integrated the code. I can compile my progra

1 answer
  •  渐次进展
    2021-02-07 12:49

    Congratulations: the code ran almost perfectly and only died on nearly the last lines.

    The issue would have been a little clearer with valgrind, but running valgrind under MPI (or under anything that involves a program launcher) takes a little care. Instead of:

    valgrind mpirun -np 10 ./exmpi_2 balloons.pgm output.pgm

    which does a valgrind of mpirun, which you don't really care about, you want to do

    mpirun -np 10 valgrind ./exmpi_2 balloons.pgm output.pgm

    -- that is, you want to launch 10 valgrinds, each running one process's worth of exmpi_2. If you do that (and you've compiled with -g), you'll find, towards the end, valgrind output like the following:

    ==6303==  Access not within mapped region at address 0x1
    ==6303==    at 0x387FA60C17: fclose@@GLIBC_2.2.5 (in /lib64/libc-2.5.so)
    ==6303==    by 0x401222: main (pgm.c:124)
    

    ... and that's all there is to it: every process calls fclose(), but only one process ever fopen()ed the files and so holds valid handles. On the other ranks, FR and FW do not point at open files, hence the invalid access inside fclose(). Simply replacing

    fclose(FR);
    fclose(FW);
    

    with

    if (rank == IONODE) {
        fclose(FR);
        fclose(FW);
    }
    

    seems to work for me.
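
The same guard can be made a little more defensive. The following is only a minimal sketch, not the original program: the helper names open_on_io_rank and close_on_io_rank are hypothetical, IONODE is assumed to be rank 0, and rank is passed in as a plain argument so the snippet compiles without MPI (in the real code it would come from MPI_Comm_rank). Non-I/O ranks get a NULL handle instead of an uninitialized one, and the close helper refuses to touch anything that is not an open file on the I/O rank.

```c
#include <assert.h>
#include <stdio.h>

#define IONODE 0  /* assumption: rank 0 owns the files */

/* Open `path` only on the I/O rank; every other rank gets NULL. */
FILE *open_on_io_rank(int rank, const char *path, const char *mode)
{
    if (rank != IONODE)
        return NULL;
    return fopen(path, mode);  /* may still be NULL if fopen fails */
}

/*
 * Close *fp only if this rank is the I/O rank and holds an open handle.
 * Resets *fp to NULL so a second call is harmless.
 * Returns 1 if a file was closed, 0 otherwise.
 */
int close_on_io_rank(int rank, FILE **fp)
{
    assert(fp != NULL);
    if (rank != IONODE || *fp == NULL)
        return 0;
    fclose(*fp);
    *fp = NULL;
    return 1;
}
```

In the program above this would be used as FR = open_on_io_rank(rank, argv[1], "r") near the start and close_on_io_rank(rank, &FR) at the end, called unconditionally on every rank.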
