Question
numpy.take can be applied in 2 dimensions with
np.take(np.take(T,ix,axis=0), iy,axis=1 )
I tested the stencil of the discrete 2-dimensional Laplacian
ΔT = T[ix-1,iy] + T[ix+1, iy] + T[ix,iy-1] + T[ix,iy+1] - 4 * T[ix,iy]
with two take-schemes and the usual NumPy slicing scheme. The functions p and q are introduced to keep the code lean; they address axes 0 and 1 in opposite order. This is the code:
import numpy as np
nx = 300; ny = 300
T = np.arange(nx*ny).reshape(nx, ny)
ix = np.linspace(1,nx-2,nx-2,dtype=int)
iy = np.linspace(1,ny-2,ny-2,dtype=int)
#------------------------------------------------------------
def p(Φ,kx,ky):
    return np.take(np.take(Φ,ky,axis=1), kx,axis=0 )
#------------------------------------------------------------
def q(Φ,kx,ky):
    return np.take(np.take(Φ,kx,axis=0), ky,axis=1 )
#------------------------------------------------------------
%timeit ΔT_n = T[0:nx-2,1:ny-1] + T[2:nx,1:ny-1] + T[1:nx-1,0:ny-2] + T[1:nx-1,2:ny] - 4.0 * T[1:nx-1,1:ny-1]
%timeit ΔT_t = p(T,ix-1,iy) + p(T,ix+1,iy) + p(T,ix,iy-1) + p(T,ix,iy+1) - 4.0 * p(T,ix,iy)
%timeit ΔT_t = q(T,ix-1,iy) + q(T,ix+1,iy) + q(T,ix,iy-1) + q(T,ix,iy+1) - 4.0 * q(T,ix,iy)
.
1000 loops, best of 3: 944 µs per loop
100 loops, best of 3: 3.11 ms per loop
100 loops, best of 3: 2.02 ms per loop
The results seem clear:
- the usual NumPy slice arithmetic is fastest (944 µs)
- take-scheme q takes roughly twice as long (2.02 ms; C ordering?)
- take-scheme p takes roughly three times as long (3.11 ms; Fortran ordering?)
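Before comparing speed it is worth confirming that the three schemes compute the same thing. The following is a minimal sketch, assuming a small quadratic test array (an assumption made here; with the linear `np.arange` array above the stencil is zero everywhere, so the check would be trivial). The helper `laplacian_take` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

nx, ny = 6, 5
# Assumption: a quadratic test field, so the Laplacian is non-zero.
T = np.arange(nx * ny).reshape(nx, ny) ** 2
ix = np.arange(1, nx - 1)   # interior row indices 1 .. nx-2
iy = np.arange(1, ny - 1)   # interior column indices 1 .. ny-2

def p(phi, kx, ky):
    # take along axis 1 first, then along axis 0
    return np.take(np.take(phi, ky, axis=1), kx, axis=0)

def q(phi, kx, ky):
    # take along axis 0 first, then along axis 1
    return np.take(np.take(phi, kx, axis=0), ky, axis=1)

def laplacian_take(phi, pick):
    # five-point stencil written with one of the take helpers
    return (pick(phi, ix - 1, iy) + pick(phi, ix + 1, iy)
            + pick(phi, ix, iy - 1) + pick(phi, ix, iy + 1)
            - 4.0 * pick(phi, ix, iy))

# the same stencil written with plain slices
lap_slices = (T[0:nx-2, 1:ny-1] + T[2:nx, 1:ny-1]
              + T[1:nx-1, 0:ny-2] + T[1:nx-1, 2:ny]
              - 4.0 * T[1:nx-1, 1:ny-1])
lap_p = laplacian_take(T, p)
lap_q = laplacian_take(T, q)

same = np.array_equal(lap_slices, lap_p) and np.array_equal(lap_slices, lap_q)
print("all three schemes agree:", same)
```

So the timing differences come from how the operands are gathered, not from the arithmetic itself.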
Even the 1-dimensional example from the SciPy manual does not show numpy.take to be faster than plain indexing:
a = np.array([4, 3, 5, 7, 6, 8])
indices = [0, 1, 4]
%timeit np.take(a, indices)
%timeit a[indices]
.
The slowest run took 6.58 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 4.32 µs per loop
The slowest run took 7.34 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.87 µs per loop
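The `%timeit` lines only work inside IPython or Jupyter. As a rough sketch for a plain Python script, the same comparison can be made with the standard-library `timeit` module. The helper `best_time` and the loop counts are assumptions chosen for illustration, and the absolute numbers will differ from machine to machine.

```python
import timeit
import numpy as np

a = np.array([4, 3, 5, 7, 6, 8])
indices = [0, 1, 4]

def best_time(func, number=2000, repeat=3):
    # best average time per call in seconds, similar in spirit to %timeit
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number

t_take = best_time(lambda: np.take(a, indices))
t_fancy = best_time(lambda: a[indices])

print(f"np.take(a, indices): {t_take * 1e6:.2f} µs per call")
print(f"a[indices]:          {t_fancy * 1e6:.2f} µs per call")
```

For such a tiny array, most of the measured time is probably call overhead rather than the actual gather, so the two figures are expected to be close.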
Does anybody have experience with making numpy.take fast? It would be a flexible and attractive way to write lean code that is quick to write and is said to be fast in execution as well. Thank you for any hints to improve my approach!
Answer 1:
The indexed version might be cleaned up with slice objects like this:
T[0:nx-2,1:ny-1] + T[2:nx,1:ny-1] + T[1:nx-1,0:ny-2] + T[1:nx-1,2:ny] - 4.0 * T[1:nx-1,1:ny-1]
sy1 = slice(1,ny-1)
sx1 = slice(1,nx-1)
sy2 = slice(2,ny)
sy_2 = slice(0,ny-2)
T[0:nx-2,sy1] + T[2:nx,sy1] + T[sx1,sy_2] + T[sx1,sy2] - 4.0 * T[sx1,sy1]
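The slice idea can be taken one step further so that the stencil reads almost like the index formula in the question. The following is a minimal sketch; the helper name `inner` and the small quadratic test array are assumptions made for illustration, not part of the answer above.

```python
import numpy as np

def inner(n, shift=0):
    """Slice for the interior points 1 .. n-2, displaced by shift (-1, 0 or +1)."""
    return slice(1 + shift, n - 1 + shift)

nx, ny = 7, 4
# Assumption: a quadratic test field, so the Laplacian is non-zero.
T = np.arange(nx * ny).reshape(nx, ny) ** 2

# five-point stencil written with shifted interior slices
lap_helper = (T[inner(nx, -1), inner(ny)] + T[inner(nx, +1), inner(ny)]
              + T[inner(nx), inner(ny, -1)] + T[inner(nx), inner(ny, +1)]
              - 4.0 * T[inner(nx), inner(ny)])

# the same stencil with the slice bounds written out
lap_explicit = (T[0:nx-2, 1:ny-1] + T[2:nx, 1:ny-1]
                + T[1:nx-1, 0:ny-2] + T[1:nx-1, 2:ny]
                - 4.0 * T[1:nx-1, 1:ny-1])

print("helper matches explicit slices:", np.array_equal(lap_helper, lap_explicit))
```

Because every operand is still a basic slice, each term is a view of T and nothing is gathered into a temporary array before the arithmetic.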
Answer 2:
Thanks @Divakar and @hpaulj! Yes, working with slice is viable too. Comparing all 4 approaches gives:
- fastest ex aequo: t(usual np) and t(slice)
- t(take) = 2 * t(slice)
- t(ix_) = 3 * t(slice)
Here are the code and the results:
import numpy as np
from numpy import ix_ as r
nx = 500; ny = 500
T = np.arange(nx*ny).reshape(nx, ny)
ix = np.arange(1,nx-1)
iy = np.arange(1,ny-1)
jx = slice(1,nx-1); jxm = slice(0,nx-2); jxp = slice(2,nx)
jy = slice(1,ny-1); jym = slice(0,ny-2); jyp = slice(2,ny)
#------------------------------------------------------------
def p(U,kx,ky):
    return np.take(np.take(U,kx, axis=0), ky,axis=1)
#------------------------------------------------------------
%timeit ΔT_slice= -T[jxm,jy] + T[jxp,jy] - T[jx,jym] + T[jx,jyp] - 0.0 * T[jx,jy]
%timeit ΔT_npy = -T[0:nx-2,1:ny-1] + T[2:nx,1:ny-1] - T[1:nx-1,0:ny-2] + T[1:nx-1,2:ny] - 0.0 * T[1:nx-1,1:ny-1]
%timeit ΔT_take = -p(T,ix-1,iy) + p(T,ix+1,iy) - p(T,ix,iy-1) + p(T,ix,iy+1) - 0.0 * p(T,ix,iy)
%timeit ΔT_ix_ = -T[r(ix-1,iy)] + T[r(ix+1,iy)] - T[r(ix,iy-1)] + T[r(ix,iy+1)] - 0.0 * T[r(ix,iy)]
.
100 loops, best of 3: 3.14 ms per loop
100 loops, best of 3: 3.13 ms per loop
100 loops, best of 3: 7.03 ms per loop
100 loops, best of 3: 9.58 ms per loop
Regarding the discussion about views and copies, the following might be instructive:
print("if False --> a view ; if True --> a copy" )
print("_slice_ :", T[jx,jy].base is None)
print("_npy_ :", T[1:nx-1,1:ny-1].base is None)
print("_take_ :", p(T,ix,iy).base is None)
print("_ix_ :", T[r(ix,iy)].base is None)
.
if False --> a view ; if True --> a copy
_slice_ : False
_npy_ : False
_take_ : True
_ix_ : True
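The practical consequence of view versus copy can also be shown by writing into the results. This is a minimal sketch on a small array (sizes and variable names are assumptions chosen for illustration); it uses `np.shares_memory` as an alternative to the `.base` check.

```python
import numpy as np

T = np.arange(20).reshape(4, 5)
ix = np.arange(1, 3)   # rows 1, 2
iy = np.arange(1, 4)   # columns 1, 2, 3

sliced = T[1:3, 1:4]                                   # basic slicing
taken = np.take(np.take(T, ix, axis=0), iy, axis=1)    # nested take
fancy = T[np.ix_(ix, iy)]                              # index arrays via ix_

# all three select the same block of values
same_values = np.array_equal(sliced, taken) and np.array_equal(sliced, fancy)

shares = {
    "slice": np.shares_memory(T, sliced),
    "take": np.shares_memory(T, taken),
    "ix_": np.shares_memory(T, fancy),
}
print(same_values, shares)

sliced[0, 0] = -1   # writes through the view into T[1, 1]
taken[0, 1] = -2    # only changes the copy; T[1, 2] keeps its value
print(T[1, 1], T[1, 2])
```

Only the slice shares memory with T, which fits the timings above: take and ix_ have to allocate and fill a new array for every operand of the stencil.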
Source: https://stackoverflow.com/questions/45290102/is-2-dimensional-numpy-take-fast