Why Numba doesn't improve this recursive function

有刺的猬 2021-01-26 06:28

I have an array of true/false values with a very simple structure:

# the real array has hundreds of thousands of items
pos         


        
3 Answers
  •  [愿得一人]
    2021-01-26 07:08

    The main issue is that you are not making an apples-to-apples comparison. What you provide is not an iterative and a recursive version of the same algorithm. These are two fundamentally different algorithms, one of which happens to be recursive and the other iterative.

    In particular, the recursive approach relies much more heavily on NumPy built-ins, so it is no wonder that the two approaches differ so starkly. It should also come as no surprise that Numba's JIT compilation is more effective when you avoid NumPy built-ins. Finally, the recursive algorithm seems to be less efficient because the np.all() and np.any() calls hide nested loops that the iterative approach avoids, so even if you wrote all your code in pure Python so that Numba could accelerate it more effectively, the recursive approach would still be slower.
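    To make the "hidden nested looping" point concrete, here is a minimal pure-Python sketch. It is not the asker's code: `changes_recursive`, `changes_iterative`, `counted`, and the test data are hypothetical names and inputs chosen for illustration, and the built-in all()/any() stand in for np.all()/np.any(). The tally records how many elements each approach inspects.

```python
def counted(values, lo, hi, tally):
    """Yield values[lo:hi], recording each element inspected in tally[0]."""
    for i in range(lo, hi):
        tally[0] += 1
        yield values[i]


def changes_recursive(values, lo, hi, tally):
    """Hypothetical divide-and-conquer count of value changes in values[lo:hi]."""
    if hi - lo < 2:
        return 0
    # all()/any() loop over the range internally, just like np.all()/np.any().
    if all(counted(values, lo, hi, tally)) or not any(counted(values, lo, hi, tally)):
        return 0  # uniform range, so there is no change inside it
    mid = (lo + hi) // 2
    boundary = int(values[mid - 1] != values[mid])  # change across the split point
    return (changes_recursive(values, lo, mid, tally)
            + changes_recursive(values, mid, hi, tally)
            + boundary)


def changes_iterative(values, tally):
    """Single pass comparing each element with its predecessor."""
    changes = 0
    for i in range(1, len(values)):
        tally[0] += 1
        changes += values[i] != values[i - 1]
    return changes


data = [False] * 700 + [True] * 324  # assumed input: one change, at index 700

rec_tally, it_tally = [0], [0]
rec = changes_recursive(data, 0, len(data), rec_tally)
it = changes_iterative(data, it_tally)
print(rec, it, rec_tally[0], it_tally[0])
```

    Both functions find the same single change, but the recursive one rescans overlapping ranges at every level of the recursion, so it inspects more elements than the single pass does. How large the gap is depends on the data.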

    In general, iterative approaches are faster than their recursive equivalents because they avoid function call overhead (which is minimal for JIT-accelerated functions compared to pure Python ones). So I would advise against rewriting the algorithm in recursive form, only to discover that it is slower.
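    As a rough illustration of function call overhead, the sketch below sums a list once recursively and once with a loop, in pure Python, where the overhead is most visible. `total_recursive` and `total_iterative` are hypothetical names, and the timings will differ between machines and interpreters.

```python
import timeit


def total_recursive(values, i=0):
    """Sum values[i:] using one function call per element."""
    if i == len(values):
        return 0
    return values[i] + total_recursive(values, i + 1)


def total_iterative(values):
    """Sum values with a plain loop and no extra function calls."""
    total = 0
    for v in values:
        total += v
    return total


# Kept small so the recursion stays under Python's default recursion limit.
data = list(range(500))

# Same algorithm, same result; only the control flow differs.
assert total_recursive(data) == total_iterative(data)

t_rec = timeit.timeit(lambda: total_recursive(data), number=200)
t_it = timeit.timeit(lambda: total_iterative(data), number=200)
print(f"recursive: {t_rec:.4f} s, iterative: {t_it:.4f} s")
```

    Unlike the two algorithms in the question, these two functions do exactly the same work, so any timing difference here comes from the calls themselves.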


    EDIT

    On the premise that a simple np.diff() would do the trick, Numba can still be quite beneficial:

    import numpy as np
    import numba as nb
    
    
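    @nb.njit  # nopython mode: compile the loop without falling back to Python objects
    def diff(arr):
        n = arr.size
        result = np.empty(n - 1, dtype=arr.dtype)
        for i in range(n - 1):
            # For booleans, XOR is True exactly where adjacent values differ.
            result[i] = arr[i + 1] ^ arr[i]
        return result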
    
    
    positions = np.random.randint(0, 2, size=300_000, dtype=bool)
    # Exact comparison of the boolean results; this first call also triggers JIT compilation.
    print(np.array_equal(np.diff(positions), diff(positions)))
    # True
    
    
    %timeit np.diff(positions)
    # 1000 loops, best of 3: 603 µs per loop
    %timeit diff(positions)
    # 10000 loops, best of 3: 43.3 µs per loop
    

    In this test the Numba version is roughly 14x faster (603 µs vs. 43.3 µs per loop). Your mileage may vary, of course.
