In my parallel program, there was a big matrix. Each process computed and stored a part of it. Then the program wrote the matrix to a file by letting each process wrote its
If you are writing MPI, why not MPI-IO? Each process will call MPI_File_set_view to set a subarray view of the file, then each process can collectively write the data with MPI_FILE_WRITE_ALL . This approach is likely to scale really well on big machines (though your approach will be fine up to oh, maybe 100 processors.)