Fastest way to populate a matrix with a function on pairs of elements in two numpy vectors?

无人及你 2021-01-14 14:52

I have two one-dimensional numpy vectors, va and vb, which are used to populate a matrix by passing every pair of elements to a function.

3 answers
  • 2021-01-14 15:09

    As @shx2 said, it all depends on what foo is. If you can express it in terms of numpy ufuncs, use the ufunc's outer method:

    In [10]: import numpy as np
    
    In [11]: N = 400
    
    In [12]: B = np.empty((N, N))
    
    In [13]: x = np.random.random(N)
    
    In [14]: y = np.random.random(N)
    
    In [15]: %%timeit
    for i in range(N):
       for j in range(N):
         B[i, j] = x[i] - y[j]
       ....: 
    10 loops, best of 3: 87.2 ms per loop
    
    In [16]: %timeit A = np.subtract.outer(x, y)   # <--- np.subtract is a ufunc
    1000 loops, best of 3: 294 µs per loop
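
    The same matrix can also be written with plain broadcasting, which may help when foo is a combination of several ufuncs rather than a single one. A minimal sketch; the squared difference below is a made-up foo used purely for illustration, not something taken from the question:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(400)
y = rng.random(300)

# ufunc.outer applies the ufunc to every (x[i], y[j]) pair.
A = np.subtract.outer(x, y)

# Equivalent broadcasting form: a column vector against a row vector.
A_bcast = x[:, None] - y[None, :]

# Hypothetical foo built from several ufuncs: (x[i] - y[j]) ** 2.
C = (x[:, None] - y[None, :]) ** 2

print(A.shape, np.array_equal(A, A_bcast))
```

    Both forms allocate the full len(x) by len(y) result, so memory use grows with the product of the two lengths.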
    

    Otherwise, you can push the loops down to the Cython level. Continuing the trivial example above:

    In [45]: %%cython
    cimport cython
    @cython.boundscheck(False)
    @cython.wraparound(False)
    def foo(double[::1] x, double[::1] y, double[:, ::1] out):
        cdef Py_ssize_t i, j
        for i in range(x.shape[0]):
            for j in range(y.shape[0]):
                out[i, j] = x[i] - y[j]
       ....: 
    
    In [46]: foo(x, y, B)
    
    In [47]: np.allclose(B, np.subtract.outer(x, y))
    Out[47]: True
    
    In [48]: %timeit foo(x, y, B)
    10000 loops, best of 3: 149 µs per loop
    

    The Cython example is deliberately oversimplified: in real code you would probably want to add shape and stride checks, allocate the output array inside the function, and so on.
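
    One way to add those checks is a thin Python wrapper around the compiled kernel. This is only a sketch: pairwise and subtract_kernel are hypothetical names, and subtract_kernel is a pure-Python stand-in for the compiled foo(x, y, out) so that the snippet runs without Cython.

```python
import numpy as np

def subtract_kernel(x, y, out):
    # Hypothetical stand-in for the compiled foo(x, y, out) kernel.
    for i in range(x.shape[0]):
        for j in range(y.shape[0]):
            out[i, j] = x[i] - y[j]

def pairwise(x, y, kernel=subtract_kernel):
    """Validate the inputs, allocate the output, then call kernel(x, y, out)."""
    # double[::1] memoryviews need C-contiguous float64 data, so convert up front.
    x = np.ascontiguousarray(x, dtype=np.float64)
    y = np.ascontiguousarray(y, dtype=np.float64)
    if x.ndim != 1 or y.ndim != 1:
        raise ValueError("x and y must be one-dimensional")
    # Allocate the result here so callers do not have to pre-size a buffer.
    out = np.empty((x.shape[0], y.shape[0]), dtype=np.float64)
    kernel(x, y, out)
    return out

B = pairwise([1.0, 2.0, 3.0], [0.5, 1.5])
print(B)
```

    Passing the compiled function as kernel keeps the validation in Python, where it runs once per call, and leaves only the double loop in compiled code.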

  • 2021-01-14 15:12

    cdist is fast because it is written in highly optimized C code (as you already pointed out) and because it supports only a small, predefined set of metrics.

    Since you want to apply the operation generically, to any given foo function, you have no choice but to call that function na * nb times. That part is unlikely to be optimizable any further.

    What's left to optimize are the loops and the indexing. Some suggestions to try out:

    1. Use xrange instead of range (Python 2.x only; in Python 3, range is already lazy and behaves like a generator).
    2. Use enumerate instead of range plus explicit indexing.
    3. Use a compiler such as Cython or Numba to speed up the loops.

    If you can make further assumptions about foo, it might be possible to speed it up further.
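
    Suggestion 2 can be sketched as follows. The names populate and foo are illustrative, and the body of foo is an arbitrary example; any Python callable taking two scalars would fit.

```python
import numpy as np

def foo(a, b):
    # Hypothetical example function; stands in for the asker's foo.
    return abs(a - b)

def populate(va, vb, func):
    out = np.empty((len(va), len(vb)))
    # enumerate yields the index and the element together,
    # so there is no separate va[i] / vb[j] lookup.
    for i, a in enumerate(va):
        row = out[i]  # view of row i, fetched once per outer iteration
        for j, b in enumerate(vb):
            row[j] = func(a, b)
    return out

va = np.array([1.0, 2.0, 3.0])
vb = np.array([0.0, 2.5])
m = populate(va, vb, foo)
print(m)
```

    The function is still called len(va) * len(vb) times, so this only trims the indexing overhead around each call.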

  • 2021-01-14 15:17

    One of the least-known numpy functions among what the docs call functional programming routines is np.frompyfunc. It creates a numpy ufunc from a Python function: not some other object that closely simulates a ufunc, but a proper ufunc with all its bells and whistles. While its behavior is in many respects very similar to np.vectorize, it has some distinct advantages, which the following code should hopefully highlight:

    In [2]: def f(a, b):
       ...:     return a + b
       ...:
    
    In [3]: f_vec = np.vectorize(f)
    
    In [4]: f_ufunc = np.frompyfunc(f, 2, 1)  # 2 inputs, 1 output
    
    In [5]: a = np.random.rand(1000)
    
    In [6]: b = np.random.rand(2000)
    
    In [7]: %timeit np.add.outer(a, b)  # a baseline for comparison
    100 loops, best of 3: 9.89 ms per loop
    
    In [8]: %timeit f_vec(a[:, None], b)  # 50x slower than np.add
    1 loops, best of 3: 488 ms per loop
    
    In [9]: %timeit f_ufunc(a[:, None], b)  # ~13% faster than np.vectorize...
    1 loops, best of 3: 425 ms per loop
    
    In [10]: %timeit f_ufunc.outer(a, b)  # ...and you get to use ufunc methods
    1 loops, best of 3: 427 ms per loop
    

    So while it is still clearly inferior to a properly vectorized implementation, it is a little faster than np.vectorize: the looping happens in C, but you still pay the Python function call overhead for every pair.
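
    One caveat worth knowing: a ufunc created by np.frompyfunc always returns arrays of dtype object, so you will usually want to convert the result afterwards. A small sketch, reusing the f defined above on tiny inputs:

```python
import numpy as np

def f(a, b):
    return a + b

f_ufunc = np.frompyfunc(f, 2, 1)  # 2 inputs, 1 output

a = np.arange(3.0)
b = np.arange(4.0)

raw = f_ufunc.outer(a, b)        # shape (3, 4), dtype object
result = raw.astype(np.float64)  # convert once, after the call

print(raw.dtype, result.dtype, result.shape)
```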
