Question
So... My question is simple.
Let's assume we have a master MPI process with a master_array of 6*6 cells:
Master
-----------
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
And that we have 4 worker MPI processes, each with a worker_array of 3*3 cells.
Worker 1 | Worker 2 | Worker 3 | Worker 4 |
------- | ------- | ------- | ------- |
1 1 1 | 2 2 2 | 3 3 3 | 4 4 4 |
1 1 1 | 2 2 2 | 3 3 3 | 4 4 4 |
1 1 1 | 2 2 2 | 3 3 3 | 4 4 4 |
Now, I want to send the worker arrays to the master array like this:
Master
-----------
1 1 1 2 2 2
1 1 1 2 2 2
1 1 1 2 2 2
3 3 3 4 4 4
3 3 3 4 4 4
3 3 3 4 4 4
How do I end up with this using some-kind-of-MPI-send/receive, MPI_datatypes or MPI_vectors or MPI_subarrays or MPI_whatever-does-the-trick?
I hope you get my point.
Answers with detailed and working code will be deeply appreciated.
Answer 1:
Here is a working code that uses both point-to-point and collectives (the collective version is commented out below, but works OK). You need to define a vector type corresponding to the non-contiguous data on the receive side at the master. To use a collective gather, you also need to resize this vector type so that the gather puts each piece in the correct place, and you need to use the gatherv variant.
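To make the vector type concrete for the 6x6 example in the question: each 3x3 block on the master is 3 rows (the count) of 3 contiguous ints (the blocklength), with consecutive rows of a block starting 6 ints apart (the stride, i.e. the full row length of the master). A minimal sketch of just that type, assuming the 6x6 master / 3x3 blocks from the question:

// One 3x3 block inside a 6x6 master: 3 rows of 3 contiguous ints,
// successive rows starting 6 ints apart
MPI_Datatype block;
MPI_Type_vector(3, 3, 6, MPI_INT, &block);
MPI_Type_commit(&block);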
It's easy to get the array indices mixed up, so for generality I have used a 2x3 grid of processes on a 6x12 matrix; that way things are deliberately not square.
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define M 6
#define N 12

#define MP 2
#define NP 3

#define MLOCAL (M/MP)
#define NLOCAL (N/NP)

#define TAG 0

int main(void)
{
    int master[M][N];
    int local[MLOCAL][NLOCAL];

    MPI_Comm comm = MPI_COMM_WORLD;
    int rank, size, src;
    int i, j;
    int istart, jstart;
    int displs[MP*NP], counts[MP*NP];

    MPI_Status status;
    MPI_Request request;
    MPI_Datatype block, blockresized;

    MPI_Init(NULL, NULL);

    MPI_Comm_size(comm, &size);
    MPI_Comm_rank(comm, &rank);

    if (size != MP*NP)
    {
        if (rank == 0) printf("Size %d not equal to MP*NP = %d\n", size, MP*NP);

        MPI_Finalize();
        return 1;
    }

    for (i=0; i < M; i++)
    {
        for (j=0; j < N; j++)
        {
            master[i][j] = rank;
        }
    }

    for (i=0; i < MLOCAL; i++)
    {
        for (j=0; j < NLOCAL; j++)
        {
            local[i][j] = rank+1;
        }
    }

    // Define vector type appropriate for subsections of master array:
    // MLOCAL rows of NLOCAL contiguous ints, rows spaced N ints apart

    MPI_Type_vector(MLOCAL, NLOCAL, N, MPI_INT, &block);
    MPI_Type_commit(&block);

    // Non-blocking send to avoid deadlock with rank 0 sending to itself

    MPI_Isend(local, MLOCAL*NLOCAL, MPI_INT, 0, TAG, comm, &request);

    // Receive from all the workers

    if (rank == 0)
    {
        for (src=0; src < size; src++)
        {
            // Find out where this block should go

            istart = (src/NP) * MLOCAL;
            jstart = (src%NP) * NLOCAL;

            // Receive a single block

            MPI_Recv(&master[istart][jstart], 1, block, src, TAG, comm, &status);
        }
    }

    // Wait for send to complete

    MPI_Wait(&request, &status);

    /* comment out collective

    // Using collectives -- currently commented out!
    // Resize the block type so its extent is one int; gatherv can then
    // place each block using displacements measured in ints

    MPI_Type_create_resized(block, 0, sizeof(int), &blockresized);
    MPI_Type_commit(&blockresized);

    // Work out displacements in master in counts of integers

    for (src=0; src < size; src++)
    {
        istart = (src/NP) * MLOCAL;
        jstart = (src%NP) * NLOCAL;

        displs[src] = istart*N + jstart;
        counts[src] = 1;
    }

    // Call collective

    MPI_Gatherv(local, MLOCAL*NLOCAL, MPI_INT,
                master, counts, displs, blockresized,
                0, comm);

    */

    // Print out

    if (rank == 0)
    {
        for (i=0; i < M; i++)
        {
            for (j=0; j < N; j++)
            {
                printf("%d ", master[i][j]);
            }
            printf("\n");
        }
    }

    MPI_Finalize();

    return 0;
}
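Assuming the source file is saved as arraygather.c (the name is my assumption, chosen to match the binary below), it can be compiled with the usual MPI wrapper compiler, e.g.:

mpicc arraygather.c -o arraygather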
It seems to work OK on 6 processes:
mpiexec -n 6 ./arraygather
1 1 1 1 2 2 2 2 3 3 3 3
1 1 1 1 2 2 2 2 3 3 3 3
1 1 1 1 2 2 2 2 3 3 3 3
4 4 4 4 5 5 5 5 6 6 6 6
4 4 4 4 5 5 5 5 6 6 6 6
4 4 4 4 5 5 5 5 6 6 6 6
This should work in any situation where the matrix decomposes exactly onto the process grid. It'll be a bit more complicated if the processes do not all have exactly the same size of sub-matrix.
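Since the question also mentions MPI_subarrays: the same receive-side block type can equivalently be built with MPI_Type_create_subarray instead of MPI_Type_vector. A minimal sketch using the M, N, MLOCAL and NLOCAL constants from the code above; the per-rank offsets still come from the receive address (or the gatherv displacements), so starts stays at the origin:

int sizes[2]    = {M, N};            // extent of the full master array
int subsizes[2] = {MLOCAL, NLOCAL};  // extent of one block
int starts[2]   = {0, 0};            // block offsets are supplied at receive time instead
MPI_Datatype block2;

MPI_Type_create_subarray(2, sizes, subsizes, starts,
                         MPI_ORDER_C, MPI_INT, &block2);
MPI_Type_commit(&block2);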
Source: https://stackoverflow.com/questions/39927229/c-mpi-send-receive-subarrays-to-array