Numba eager compilation: what's the pattern?

Question

I looked into eager compilation on Numba's website and couldn't figure out how to specify the types.

The example they use is this:

from numba import jit, int32

@jit(int32(int32, int32))
def f(x, y):
    # A somewhat trivial example
    return x + y

# source: http://numba.pydata.org/numba-doc/latest/user/jit.html#eager-compilation

As you can see, it takes two variables as input and returns a single variable. All of them should be int32.

One way to read the decorator @jit(int32(int32, int32)) is:

@jit(type_of_returned_value(type_of_x, type_of_y))

If that is right (is it?), then how do you specify the types for multiple inputs and outputs?

For example, for functions like these:

import numba as nb

@nb.jit
def filter3(a, b):
    return a > b

# Note: a Python identifier cannot start with a digit, hence numpy_2d_array_of_objects
@nb.jit
def func3(list_of_arrays_A, list_of_arrays_B, list_of_arrays_C, list_of_arrays_D, numpy_2d_array_of_objects):

    for i in range(len(list_of_arrays_A)):

        for j in range(list_of_arrays_A[i].size):

            if filter3(list_of_arrays_A[i][j], list_of_arrays_B[i][j]):
                numpy_2d_array_of_objects[i][j] = 1

            elif filter3(list_of_arrays_B[i][j], list_of_arrays_A[i][j]):
                numpy_2d_array_of_objects[i][j] = 0

            elif filter3(list_of_arrays_C[i][j], list_of_arrays_D[i][j]):
                numpy_2d_array_of_objects[i][j] = 0
            else:
                numpy_2d_array_of_objects[i][j] = 1

My intention: the function is called only once, but it takes forever if **not** run through Numba, so I want to speed up its Numba compilation.

Answer 1:

One can always use numba.typeof to infer the Numba type of a variable, e.g.

import numpy as np
import numba as nb
N=10000
simple_list= [np.zeros(1) for x in range(N)]
nb.typeof(simple_list)
# reflected list(array(float64, 1d, C))

or:

from numba.typed import List
typed_list=List()
for _ in range(N):
    typed_list.append(np.zeros(1))
nb.typeof(typed_list)
# ListType[array(float64, 1d, C)]

So you can provide the signatures for eager compilation (compilation at decoration time) as follows:

@nb.jit([nb.void(nb.typeof(typed_list)),
         nb.void(nb.typeof(simple_list))])
def fun(lst):
    pass

Noteworthy details:

I'm compiling the function eagerly in two different versions: one for Numba's typed list (nb.void(nb.typeof(typed_list))) and one for Python's list (nb.void(nb.typeof(simple_list))).
I don't use signature strings but the signature objects themselves (for example described here), because, if I understand it correctly, there is no signature string for a typed list or a reflected list (more info follows below).
As the function fun doesn't return anything, its return type is void, hence nb.void(...) in the signatures.

However, the interesting thing is how much more overhead the simple_list version has:

%timeit fun(simple_list)  # 185 ms ± 4.23 ms 
%timeit fun(typed_list)   # 1.18 µs ± 69.3 ns

That is a factor of about 1e5! It is also pretty clear why: in order to check that the passed list really is of type reflected list(array(float64, 1d, C)), Numba has to look at every element in the list. For the typed list it is much simpler: the list cannot contain more than one type, so there is no need to iterate over the whole list.

Thus one should prefer to create and use typed lists rather than merely dismiss the deprecation warning for reflected lists.

It is probably not possible to give a string for reflected list or TypedList, because right now the following code is used to parse the signature:

def _parse_signature_string(signature_str):
    # Just eval signature_str using the types submodules as globals
    return eval(signature_str, {}, types.__dict__)

and because nb.types.__dict__ has no TypedList or reflected list we cannot pass them via string.

Once the functions are compiled (eagerly or just-in-time), it is possible to see the signatures in the corresponding Dispatcher object, for example via:

[x.signature for x in fun.overloads.values()]
# [(ListType[array(float64, 1d, C)],) -> none,
#  (reflected list(array(float64, 1d, C)),) -> none]

This can be used to figure out the right return-type of the function (here none means void).

Source: https://stackoverflow.com/questions/63910687/numba-eager-compilation-whats-the-pattern

Tags

python

numpy

compilation

vectorization

numba