Outer subtraction with Numpy

放肆的年华 提交于 2021-02-05 07:45:27

问题


I simply want to do: C_i=\Sum_k (A_i -B_k)^2 I saw that this calculation is faster with a simple for loop than with the numpy.subtract.outer! Anyway I feel that numpy.einsum would be the fastest. I could not understand numpy.einsum that well. Can anyone please help me out? Additionally, it would be great if someone explains how a general summation expression consisting of vector/matrices can be written with numpy.einsum?

I did not find solution for this particular problem on the web. Sorry if duplicate in some manner.

MWE with loop and numpy.subtract.outer--

A)With loop

import timeit
code1="""
import numpy as np

N=10000;

a=np.random.rand(N); b=10*(np.random.rand(N)-0.5);

def A1(x,y):
    Nx=len(x)
    z=np.zeros(Nx)
    for i in np.arange(Nx):
        z[i]=np.sum((x[i]-y)*(x[i]-y))

    return z

C1=A1(a,b)"""
elapsed_time = timeit.timeit(code1, number=10)/10
print "time=", elapsed_time

B) With numpy.subtract.outer

import timeit
code1="""
import numpy as np

N=10000;

a=np.random.rand(N); b=10*(np.random.rand(N)-0.5);

def A2(x,y):
    C=np.subtract.outer(x,y);
    return np.sum(C*C, axis=1)

C2=A2(a,b)"""
elapsed_time = timeit.timeit(code1, number=10)/10
print "time=", elapsed_time

For N=10000 the loop becomes faster. For N=100, the outer subtract becomes faster. For N=10^5, outer subtract faces memory issue on my desktop with 8GB ram!


回答1:


Use at least Numba, or a Fortran Implementation

Both of your functions are quite slow. Python loops are very inefficient (A1), and allocating large temporary arrays is also slow (A2 and partially also A1).

Naive Numba implementation for small arrays

import numba as nb
import numpy as np

@nb.njit(parallel=True, fastmath=True)
def A_nb_p(x,y):

    z=np.empty(x.shape[0])
    for i in nb.prange(x.shape[0]):
        TMP=0.
        for j in range(y.shape[0]):
            TMP+=(x[i]-y[j])**2
        z[i]=TMP

    return z

Timings

import time
N=int(1e5)
a=np.random.rand(N)
b=10*(np.random.rand(N)-0.5)

t1=time.time()
res_1=A1(a,b)
print(time.time()-t1)
#95.30195426940918 s

t1=time.time()
res_2=A_nb_p(a,b)
print(time.time()-t1)
#0.28573083877563477 s

#A2 is too slow to measure

As written above this is a naive implementation on larger arrays, since Numba isn't able to do the calculation blockwise, which leads to a lot of cache misses and therefore bad performance. Some Fortran or C- compiler should be able to do at least the following optimization (block-wise evaluation) automatically.

Implementation for large arrays

@nb.njit(parallel=True, fastmath=True)
def A_nb_p_2(x,y):
    blk_s=1024
    z=np.zeros(x.shape[0])
    num_blk_x=x.shape[0]//blk_s
    num_blk_y=y.shape[0]//blk_s

    for ii in nb.prange(num_blk_x):
        for jj in range(num_blk_y):
            for i in range(blk_s):
                TMP=z[ii*blk_s+i]
                for j in range(blk_s):
                    TMP+=(x[ii*blk_s+i]-y[jj*blk_s+j])**2
                z[ii*blk_s+i]=TMP

    for i in nb.prange(x.shape[0]):
        TMP=z[i]
        for j in range(num_blk_y*blk_s,y.shape[0]):
            TMP+=(x[i]-y[j])**2
        z[i]=TMP

    for i in nb.prange(num_blk_x*blk_s,x.shape[0]):
        TMP=z[i]
        for j in range(num_blk_y*blk_s):
            TMP+=(x[i]-y[j])**2
        z[i]=TMP

    return z

Timings

N=int(2*1e6)
a=np.random.rand(N)
b=10*(np.random.rand(N)-0.5)

t1=time.time()
res_1=A_nb_p(a,b)
print(time.time()-t1)
#298.9394392967224

t1=time.time()
res_2=A_nb_p_2(a,b)
print(time.time()-t1)
#70.12


来源:https://stackoverflow.com/questions/58177426/outer-subtraction-with-numpy

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!