How to share work roughly evenly between processes in MPI despite the array_size not being cleanly divisible by the number of processes?

借酒劲吻你 2021-01-04 10:25

Hi all, I have an array of length N, and I'd like to divide it as best as possible between 'size' processors. N/size has a remainder, e.g. 1000 array elements divided by 7 processes.

7 Answers
  • 2021-01-04 10:51

    I had a similar problem, and here is my non-optimal solution using Python and the mpi4py API. An optimal solution would take into account how the processors are laid out; here the extra work is distributed to the lower ranks. The uneven workload differs by at most one task, so it should not be a big deal in general.

    from mpi4py import MPI
    import sys
    def get_start_end(comm,N):
        """
        Distribute N consecutive things (rows of a matrix , blocks of a 1D array)
        as evenly as possible over a given communicator.
        Uneven workload (differs by 1 at most) is on the initial ranks.
    
        Parameters
        ----------
        comm: MPI communicator
        N:  int
        Total number of things to be distributed.
    
        Returns
        ----------
        rstart: index of first local row
        rend: 1 + index of last row
    
        Notes
        ----------
        Index is zero based.
        """
    
        P      = comm.size
        rank   = comm.rank
        rstart = 0
        rend   = N
        if P >= N:
            if rank < N:
                rstart = rank
                rend   = rank + 1
            else:
                rstart = 0
                rend   = 0
        else:
            n = N//P # Integer division PEP-238
            remainder = N%P
            rstart    = n * rank
            rend      = n * (rank+1)
            if remainder:
                if rank >= remainder:
                    rstart += remainder
                    rend   += remainder
                else:
                    rstart += rank
                    rend   += rank + 1
        return rstart, rend
    
    if __name__ == '__main__':
        comm = MPI.COMM_WORLD
        n = int(sys.argv[1])
        print(comm.rank,get_start_end(comm,n))
    
  • 2021-01-04 10:57

    Here's a closed-form solution.

    Let N = array length and P = number of processors.

    From j = 0 to P-1,

    Starting point of array on processor j = floor(N * j / P)

    Length of array on processor j = floor(N * (j + 1) / P) – floor(N * j / P)
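
    As a sketch (not part of the original answer), here is the same formula in C; the example values N = 1000 and P = 7 come from the question, and long long is used so that N * j cannot overflow for large arrays.

    /* Closed-form block decomposition: process j of P owns
       elements [floor(N*j/P), floor(N*(j+1)/P)). */
    #include <stdio.h>

    int main(void)
    {
        long long N = 1000, P = 7;   /* array length and number of processes */
        for (long long j = 0; j < P; j++) {
            long long start = N * j / P;                 /* starting index on process j */
            long long len   = N * (j + 1) / P - start;   /* length on process j */
            printf("process %lld: start %lld, length %lld\n", j, start, len);
        }
        return 0;
    }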

  • 2021-01-04 11:00

    I think that the best solution is to write yourself a little function for splitting work across processes evenly enough. Here's some pseudo-code; I'm sure you can write C (is that C in your question?) better than I can.

    function split_evenly_enough(num_steps, num_processes)
        return = repmat(0, num_processes)  ! pseudo-Matlab for an array of num_processes 0s
        steps_per_process = floor(num_steps/num_processes)
        return = steps_per_process  ! every process gets at least this many steps
        return(1:mod(num_steps, num_processes)) = steps_per_process + 1  ! the first mod(num_steps, num_processes) processes get 1 more step
    end
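
    A possible C rendering of that pseudo-code (my own sketch, not code from the answer; it fills a caller-supplied counts array instead of returning one):

    /* counts[i] = number of steps assigned to process i; the first
       (num_steps % num_processes) processes get one extra step */
    void split_evenly_enough(int num_steps, int num_processes, int *counts)
    {
        int base  = num_steps / num_processes;   /* integer division: floor */
        int extra = num_steps % num_processes;   /* leftover steps to spread around */
        for (int i = 0; i < num_processes; i++)
            counts[i] = base + (i < extra ? 1 : 0);
    }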
    
  • 2021-01-04 11:00

    How about this?

    int* distribute(int total, int processes) {
        // value-initialise so every count starts at zero
        int* distribution = new int[processes]();
        int last = processes - 1;

        int remaining = total;
        int process = 0;

        // deal the elements out one at a time, round-robin
        while (remaining != 0) {
            ++distribution[process];
            --remaining;

            if (process != last) {
                ++process;
            }
            else {
                process = 0;
            }
        }

        return distribution;
    }
    

    The idea is that you assign an element to the first process, then an element to the second process, then an element to the third process, and so on, jumping back to the first process whenever the last one is reached.

    This method works even when the number of processes is greater than the number of elements. It uses only very simple operations and should therefore be very fast.

  • 2021-01-04 11:02

    Consider your "1000 steps and 7 processes" example.

    • simple division won't work because integer division (in C) gives you the floor, and you are left with some remainder: i.e. 1000 / 7 is 142, and there will be 6 doodads hanging out

    • ceiling division has the opposite problem: ceil(1000/7) is 143, but then the last processor overruns the array, or ends up with less to do than the others.

    You are asking for a scheme to evenly distribute the remainder over processors. Some processes should have 142, others 143. There must be a more formal approach, but considering the attention this question has gotten in the last six months, maybe not.

    Here's my approach. Every process runs this algorithm and just picks out the answer it needs for itself.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main (int argc, char ** argv)
    {
    #define NR_ITEMS 1000
        int i, rank, nprocs;
        int remainder, buckets;
        int *bins;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        bins = calloc(nprocs, sizeof(int));

        /* repeatedly split what is still unassigned over the buckets that remain */
        int nr_alloced = 0;
        for (i=0; i<nprocs; i++) {
            remainder = NR_ITEMS - nr_alloced;
            buckets   = (nprocs - i);
            /* if you want the "big" buckets up front, do ceiling division */
            bins[i] = remainder / buckets;
            nr_alloced += bins[i];
        }

        if (rank == 0) {
            for (i=0; i<nprocs; i++) printf("%d ", bins[i]);
            printf("\n");
        }

        free(bins);
        MPI_Finalize();
        return 0;
    }
    
  • 2021-01-04 11:12

    I know this is long since gone, but a simple way to do this is to give each process floor((number of items) / (number of processes)), plus 1 if process_num < num_items mod num_procs. In Python, an array with the work counts:

    # Number of items
    NI=128
    # Number of processes
    NP=20
    
    # Items per process
    [NI//NP + (1 if P < NI%NP else 0) for P in range(0, NP)]
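
    For NI=128 and NP=20 this gives eight processes with 7 items and twelve with 6, which sums to 128.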
    