I have two 1-dimensional numpy vectors, va and vb, which are being used to populate a matrix by passing all pair combinations to a function.
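Presumably the setup looks something like the following sketch (foo, va, and vb are placeholders for the real names and sizes):

import numpy as np

def foo(a, b):
    # stand-in for the actual pairwise function
    return abs(a - b)

va = np.random.rand(100)
vb = np.random.rand(200)

M = np.empty((va.size, vb.size))
for i in range(va.size):
    for j in range(vb.size):
        M[i, j] = foo(va[i], vb[j])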
Like @shx2 said, it all depends on what foo is. If you can express it in terms of numpy ufuncs, then use the outer method:
In [11]: N = 400
In [12]: B = np.empty((N, N))
In [13]: x = np.random.random(N)
In [14]: y = np.random.random(N)
In [15]: %%timeit
   ....: for i in range(N):
   ....:     for j in range(N):
   ....:         B[i, j] = x[i] - y[j]
   ....:
10 loops, best of 3: 87.2 ms per loop
In [16]: %timeit A = np.subtract.outer(x, y) # <--- np.subtract is a ufunc
1000 loops, best of 3: 294 µs per loop
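Equivalently, the same outer operation can be spelled with broadcasting, which avoids the explicit ufunc method and often reads more naturally:

A = x[:, None] - y                          # shape (N, N)
assert np.allclose(A, np.subtract.outer(x, y))

Performance is essentially the same as np.subtract.outer; it is mostly a matter of taste.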
Otherwise you can push the looping down to the cython level. Continuing the trivial example above:
In [45]: %%cython
   ....: cimport cython
   ....: @cython.boundscheck(False)
   ....: @cython.wraparound(False)
   ....: def foo(double[::1] x, double[::1] y, double[:, ::1] out):
   ....:     cdef int i, j
   ....:     for i in range(x.shape[0]):    # range compiles to a plain C loop here
   ....:         for j in range(y.shape[0]):
   ....:             out[i, j] = x[i] - y[j]
   ....:
In [46]: foo(x, y, B)
In [47]: np.allclose(B, np.subtract.outer(x, y))
Out[47]: True
In [48]: %timeit foo(x, y, B)
10000 loops, best of 3: 149 µs per loop
The cython example is deliberately kept simplistic: in reality you would likely want to add shape/stride checks, allocate the output array inside the function, and so on.
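For completeness, here is a minimal sketch of such a wrapper around the cython kernel above (the checks and the name pairwise are illustrative, not part of the original answer):

import numpy as np

def pairwise(x, y):
    # normalize dtype/layout so the typed cython signature accepts the inputs
    x = np.ascontiguousarray(x, dtype=np.float64)
    y = np.ascontiguousarray(y, dtype=np.float64)
    if x.ndim != 1 or y.ndim != 1:
        raise ValueError("expected 1-d input vectors")
    # allocate the output inside the function instead of passing it in
    out = np.empty((x.shape[0], y.shape[0]))
    foo(x, y, out)   # the cython kernel defined above
    return out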
cdist is fast because it is written in highly-optimized C code (as you already pointed out), and it only supports a small predefined set of metrics.
Since you want to apply the operation generically, to any given foo function, you have no choice but to call that function na * nb times. That part is not likely to be further optimizable.
What's left to optimize are the loops and the indexing. Some suggestions to try out:

- xrange instead of range (only on python2.x; in python3, range is already generator-like)
- enumerate, instead of range + explicit indexing
- cython or numba, to speed up the looping process (see the sketch after this list)

If you can make further assumptions about foo, it might be possible to speed it up further.
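As an illustration, a minimal numba version of the generic pairwise loop might look like this. It assumes foo is simple enough to compile in nopython mode; subtraction stands in for the real function:

import numpy as np
from numba import njit

@njit
def foo(a, b):
    # stand-in for the real pairwise function; must be numba-compilable
    return a - b

@njit
def pairwise_nb(x, y):
    # the double loop runs as compiled machine code, not python bytecode
    out = np.empty((x.shape[0], y.shape[0]))
    for i in range(x.shape[0]):
        for j in range(y.shape[0]):
            out[i, j] = foo(x[i], y[j])
    return out

The first call pays a one-time compilation cost; subsequent calls avoid the per-element Python overhead entirely.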
One of the lesser-known numpy functions, from what the docs call functional programming routines, is np.frompyfunc. This creates a numpy ufunc from a Python function: not some other object that closely mimics a ufunc, but a proper ufunc with all its bells and whistles. While its behavior is in many aspects very similar to np.vectorize, it has some distinct advantages that the following code hopefully highlights:
In [2]: def f(a, b):
   ...:     return a + b
   ...:
In [3]: f_vec = np.vectorize(f)
In [4]: f_ufunc = np.frompyfunc(f, 2, 1) # 2 inputs, 1 output
In [5]: a = np.random.rand(1000)
In [6]: b = np.random.rand(2000)
In [7]: %timeit np.add.outer(a, b) # a baseline for comparison
100 loops, best of 3: 9.89 ms per loop
In [8]: %timeit f_vec(a[:, None], b) # 50x slower than np.add
1 loops, best of 3: 488 ms per loop
In [9]: %timeit f_ufunc(a[:, None], b) # ~20% faster than np.vectorize...
1 loops, best of 3: 425 ms per loop
In [10]: %timeit f_ufunc.outer(a, b) # ...and you get to use ufunc methods
1 loops, best of 3: 427 ms per loop
So while it is still clearly inferior to a properly vectorized implementation, it is a little faster (the looping happens in C, but you still pay the Python function call overhead for every element).
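One caveat the timings above do not show: ufuncs created by np.frompyfunc always return object arrays, so you typically want to cast the result back to a numeric dtype:

res = f_ufunc.outer(a, b)
print(res.dtype)               # object -- frompyfunc output is always object dtype
res = res.astype(np.float64)   # cast back for further numeric work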