Question
I'm trying to do matrix multiplication using MPI in C, and we have to write one sequential version and one parallel version. My parallel version is not giving the correct answers and I'm not sure why. I think I'm not sending the right communications to the processes, but I can't be sure. The professor just went over the different send/receive/gather etc. calls but didn't really get into much detail... I've seen a lot of different examples, but none complete and none using scatter/gather. If anyone can take a look at my code and tell me if anything pops out at them, I'd appreciate it. I'm pretty sure my problem is in the scatter/gather calls or in the actual calculation of the c matrix.
#define N 512
#include <stdio.h>
#include <math.h>
#include <sys/time.h>
#include <stdlib.h>
#include <stddef.h>
#include "mpi.h"
print_results(char *prompt, float a[N][N]);
int main(int argc, char *argv[])
{
    int i, j, k, rank, size, tag = 99, blksz, sum = 0;
    float a[N][N], b[N][N], c[N][N];
    char *usage = "Usage: %s file\n";
    FILE *fd;
    double elapsed_time, start_time, end_time;
    struct timeval tv1, tv2;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (argc < 2) {
        fprintf (stderr, usage, argv[0]);
        return -1;
    }
    if ((fd = fopen (argv[1], "r")) == NULL) {
        fprintf (stderr, "%s: Cannot open file %s for reading.\n",
                 argv[0], argv[1]);
        fprintf (stderr, usage, argv[0]);
        return -1;
    }

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            fscanf (fd, "%f", &a[i][j]);
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            fscanf (fd, "%f", &b[i][j]);

    MPI_Barrier(MPI_COMM_WORLD);
    gettimeofday(&tv1, NULL);

    MPI_Scatter(a, N*N/size, MPI_INT, a, N*N/size, MPI_INT, 0,
                MPI_COMM_WORLD);
    MPI_Bcast(b, N*N, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank != 0) {
        for (i = 0; i < N; i++)
        {
            for (j = 0; j < N; j++)
            {
                for (k = 0; k < N; k++)
                {
                    sum = sum + a[i][k] * b[k][j];
                }
                c[i][j] = sum;
                sum = 0;
            }
        }
    }

    MPI_Gather(c, N*N/size, MPI_INT, c, N*N/size, MPI_INT, 0,
               MPI_COMM_WORLD);
    MPI_Finalize();

    gettimeofday(&tv2, NULL);
    elapsed_time = (tv2.tv_sec - tv1.tv_sec) + ((tv2.tv_usec - tv1.tv_usec)/1000000.0);
    printf ("elapsed_time=\t%lf (seconds)\n", elapsed_time);
    print_results("C = ", c);
}
print_results(char *prompt, float a[N][N])
{
    int i, j;

    printf ("\n\n%s\n", prompt);
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            printf(" %.2f", a[i][j]);
        }
        printf ("\n");
    }
    printf ("\n\n");
}
Updated part of the code:
for (i=0;i<size; i++)
{
    if (rank == i)
    {
        for (i = rank*(N/size); i < (rank*(N/size)+(N/size)); i++)
        {
            for (j = rank*(N/size); j < (rank*(N/size)+(N/size)); j++)
            {
                for (k = rank*N; k < rank*N+N; k++)
                {
                    sum = sum + a[i][k] * b[k][j];
                }
                c[i][j] = sum;
                sum = 0;
            }
        }
    }
}
Answer 1:
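A first problem in your code is that size might not divide N, which means that scattering size packets of length N*N/size does not necessarily send the whole matrix. For example, with N = 512 and 3 processes, the integer division N*N/size gives 87381, so only 3 × 87381 = 262143 of the 262144 elements are sent. This is probably the hardest point to get right.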
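One common way around this is to split the matrix by whole rows and switch to MPI_Scatterv/MPI_Gatherv, which take a per-rank count and displacement. The sketch below assumes that row-wise split and gives the first N % size ranks one extra row each. compute_row_partition is a hypothetical helper. It is plain C so it can be checked without an MPI installation.

```c
#include <assert.h>

/*
 * Row-wise partition of an n x n matrix over nprocs ranks, where nprocs
 * need not divide n. The first n % nprocs ranks get one extra row.
 * counts[] and displs[] are in elements (floats), which is the unit
 * MPI_Scatterv and MPI_Gatherv expect.
 */
void compute_row_partition(int n, int nprocs, int counts[], int displs[])
{
    assert(n >= 0 && nprocs > 0);

    int base = n / nprocs;   /* rows every rank gets */
    int extra = n % nprocs;  /* leftover rows, handed out one per rank */
    int offset = 0;          /* running element offset into the matrix */

    for (int r = 0; r < nprocs; r++) {
        int rows = base + (r < extra ? 1 : 0);
        counts[r] = rows * n;   /* elements, not rows */
        displs[r] = offset;
        offset += counts[r];
    }
}
```

With those arrays, the calls might look like MPI_Scatterv(a, counts, displs, MPI_FLOAT, local_a, counts[rank], MPI_FLOAT, 0, MPI_COMM_WORLD) and MPI_Gatherv(local_c, counts[rank], MPI_FLOAT, c, counts, displs, MPI_FLOAT, 0, MPI_COMM_WORLD). Here local_a and local_c are hypothetical per-rank buffers. MPI_FLOAT matches the float arrays, whereas the question passes MPI_INT. A separate receive buffer also avoids giving MPI the same array as both send and receive buffer on the root. MPI does not allow that; MPI_IN_PLACE is the standard alternative.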
As Greg Inozemtsev points out, a second problem is that you exclude process 0 from the computation, although it is responsible for a part of the matrix.
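Here is a minimal sketch of the per-rank computation. It assumes each rank holds its own rows of A as a contiguous row-major buffer, plus a full copy of B, as it would after the broadcast. multiply_local_rows is a hypothetical name. The loop runs on every rank, with no rank != 0 guard, and only over the rows that rank owns. It also accumulates in a float, because the question's int sum truncates each partial sum.

```c
#include <assert.h>

/*
 * Multiply the rows of A owned by this rank by the full matrix B.
 * a_local: rows x n, b: n x n, c_local: rows x n, all row-major.
 * Meant to be called on every rank, rank 0 included.
 */
void multiply_local_rows(const float *a_local, const float *b,
                         float *c_local, int rows, int n)
{
    assert(rows >= 0 && n > 0);

    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;   /* float accumulator: an int would truncate */
            for (int k = 0; k < n; k++)
                sum += a_local[i * n + k] * b[k * n + j];
            c_local[i * n + j] = sum;
        }
    }
}
```

On each rank, rows would be however many rows that rank received from the scatter. The resulting c_local block is what that rank contributes to the gather.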
And yet another problem is that all I/O operations (reading the coefficients at the beginning and outputting the results at the end) should be done only by process 0.
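This sketch keeps the input on rank 0. It assumes the two matrices are stored one after the other as whitespace-separated numbers, which is what the question's fscanf loops expect. read_matrices is a hypothetical helper, and unlike the original loops it checks fscanf's return value.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read two n x n matrices (A, then B) from an already opened file into
 * row-major buffers. Intended to be called on rank 0 only; the other
 * ranks get their data through the scatter and broadcast.
 * Returns 0 on success, -1 if a value is missing or not a float.
 */
int read_matrices(FILE *fd, float *a, float *b, int n)
{
    assert(fd != NULL && n > 0);

    for (int i = 0; i < n * n; i++)
        if (fscanf(fd, "%f", &a[i]) != 1)
            return -1;
    for (int i = 0; i < n * n; i++)
        if (fscanf(fd, "%f", &b[i]) != 1)
            return -1;
    return 0;
}
```

In main, this call would sit inside an if (rank == 0) block along with the fopen. The final timing printf and print_results call would sit inside a similar block. If the read fails, calling MPI_Abort(MPI_COMM_WORLD, 1) is generally safer than returning from main on one rank. A bare return can leave the other ranks blocked in the next collective call.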
On another note, you should specify the return type (void in this case) of your print_results function, both in the forward declaration and in the definition.
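Here is a sketch of that signature fix, with void on both the prototype and the definition. N is set to 4 here only to keep the printed output short; the question uses 512. Without an explicit type, pre-C99 compilers fall back to implicit int, a rule that C99 removed.

```c
#include <assert.h>
#include <stdio.h>

#define N 4   /* shrunk from the question's 512 so the output stays short */

/* Forward declaration with an explicit return type. */
void print_results(char *prompt, float a[N][N]);

/* Definition: same return type as the declaration. */
void print_results(char *prompt, float a[N][N])
{
    assert(prompt != NULL);

    printf("\n\n%s\n", prompt);
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++)
            printf(" %.2f", a[i][j]);
        printf("\n");
    }
    printf("\n\n");
}
```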
Source: https://stackoverflow.com/questions/12453134/mpi-matrix-multiplication-with-scatter-gather