Python - Multiprocessing to mount an array

故里飘歌 2021-01-26 10:58

I'm using griddata to "mount" an array with a great number of shapes, and I would like to know if I can calculate functions (on each slice) on each of my 4 cores in order to accelerate the computation.

3 Answers
  • 2021-01-26 11:30

    Three things:

    1. The most important question is: why are you doing this?
    2. Your NumPy build may already be making use of multiple cores. I am not sure off the top of my head how to check; see questions like this, or if absolutely necessary take a look at the Numexpr library: https://github.com/pydata/numexpr
    3. About the "Y" in your likely XY problem: you are re-calculating data that you can instead re-use:


    import numpy as np

    size = 8
    Y = np.arange(2000)
    X = np.arange(2000)
    xx, yy = np.meshgrid(X, Y)

    array = np.zeros((Y.shape[0], X.shape[0], size))

    array[..., 0] = 0
    for i in range(1, size):
        array[..., i] = X ** i + Y ** i + array[..., i - 1]
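    If the per-slice term really is the 1-D quantity `X ** i + Y ** i` (as in the snippet above, where it is broadcast across the rows of each slice), the whole loop can also be collapsed into a single cumulative sum over the exponents. A sketch of that idea, using a float dtype to avoid int64 overflow at the larger powers:

```python
import numpy as np

size = 8
Y = np.arange(2000.0)   # float dtype avoids int64 overflow at high powers
X = np.arange(2000.0)

# terms[k, m] = X[k]**(m+1) + Y[k]**(m+1): one column per exponent 1..size-1
exps = np.arange(1, size)
terms = X[:, None] ** exps + Y[:, None] ** exps   # shape (2000, size - 1)

# A cumulative sum over the exponent axis reproduces the recurrence
# array[..., i] = X**i + Y**i + array[..., i - 1], slice 0 staying zero.
array = np.zeros((Y.shape[0], X.shape[0], size))
array[..., 1:] = np.cumsum(terms, axis=-1)        # broadcast over rows
```

    This trades the Python-level loop for one vectorized pass, which is usually a bigger win than multiprocessing for work this cheap.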
    
  • 2021-01-26 11:37

    You can try multiprocessing.Pool:

    from multiprocessing import Pool
    import numpy as np

    size = 8
    Y = np.arange(2000)
    X = np.arange(2000)
    xx, yy = np.meshgrid(X, Y)

    array = np.zeros((Y.shape[0], X.shape[0], size))

    def func(i):  # Pool.map needs a function to call on each item
        array_ = np.zeros((Y.shape[0], X.shape[0]))
        for j in range(1, i):
            array_ += X ** j + Y ** j
        return array_

    if __name__ == '__main__':
        p = Pool(4)  # if you have 4 cores in your processor
        result = p.map(func, range(1, size))
        for i in range(1, size):
            array[:, :, i] = result[i - 1]
    

    Keep in mind that multiprocessing in Python does not share memory; that is why func has to build its own array_ and the parent has to copy the results back in the loop at the end. As your application (with these dimensions) doesn't need a lot of computing time, it is possible that this method will actually be slower. You will also create copies of all your variables in every worker process, which may cause a memory overflow. You should also double-check the func I wrote, as I didn't completely verify that it does what it is supposed to do :)

  • 2021-01-26 11:37

    If you want to apply a single function over an array of data, then using e.g. a multiprocessing.Pool is a good solution, provided that both the input and output of the calculation are relatively small.

    You, however, want to do many different calculations on two input arrays, each of which returns an array.

    Since separate processes do not share memory, the X and Y arrays have to be transported to each worker process when it is started. And the result of each calculation (which is also a numpy array of the same size as X and Y) has to be returned to the parent process.

    Depending on e.g. the size of the arrays and the number of cores, the overhead of transferring all those arrays between the worker processes and the parent process via interprocess communication ("IPC") will cost time, reducing the advantage of using multiple cores.

    Keep in mind that the parent process has to listen for and handle IPC requests from all the worker processes. So you've shifted the bottleneck from calculation to communication.

    So it is not a given that multiprocessing will actually improve performance in this case. It depends on the details of the actual problem (number of cores, array size, amount of physical memory et cetera).

    You will have to do some careful performance measurements using e.g. Pool or Process with realistic array sizes.
