Using mpi4py to parallelize a 'for' loop on a compute cluster

前端 未结 1 1927
有刺的猬
有刺的猬 2021-02-10 18:20

I haven\'t worked with distributed computing before, but I\'m trying to integrate mpi4py into a program in order to parallelize a for loop on a compute cluster.

This is

1条回答
  •  栀梦
    栀梦 (楼主)
    2021-02-10 19:03

    In order to achieve parallelism of a for loop with MPI4Py check the code example below. Its just a for loop to add some numbers. The for loop will execute in every node. Every node will get a different chunk of data to work with (range in for loop). In the end Node with rank zero will add the results from all the nodes.

    #!/usr/bin/python
    
    import numpy
    from mpi4py import MPI
    import time
    
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()
    
    a = 1
    b = 1000000
    
    perrank = b//size
    summ = numpy.zeros(1)
    
    comm.Barrier()
    start_time = time.time()
    
    temp = 0
    for i in range(a + rank*perrank, a + (rank+1)*perrank):
        temp = temp + i
    
    summ[0] = temp
    
    if rank == 0:
        total = numpy.zeros(1)
    else:
        total = None
    
    comm.Barrier()
    #collect the partial results and add to the total sum
    comm.Reduce(summ, total, op=MPI.SUM, root=0)
    
    stop_time = time.time()
    
    if rank == 0:
        #add the rest numbers to 1 000 000
        for i in range(a + (size)*perrank, b+1):
            total[0] = total[0] + i
        print ("The sum of numbers from 1 to 1 000 000: ", int(total[0]))
        print ("time spent with ", size, " threads in milliseconds")
        print ("-----", int((time.time()-start_time)*1000), "-----")
    

    In order to execute the code above you should run it like this:

    $ qsub -q qexp -l select=4:ncpus=16:mpiprocs=16:ompthreads=1 -I # Salomon: ncpus=24:mpiprocs=24
    $ ml Python
    $ ml OpenMPI
    $ mpiexec -bycore -bind-to-core python hello_world.py

    In this example, we run MPI4Py enabled code on 4 nodes, 16 cores per node (total of 64 processes), each python process is bound to a different core.

    Sources that may help you:
    Submit job with python code (mpi4py) on HPC cluster
    https://github.com/JordiCorbilla/mpi4py-examples/tree/master/src/examples/matrix%20multiplication

    0 讨论(0)
提交回复
热议问题