Most efficient way to set attributes on objects in array

失恋的感觉 2021-01-27 14:34

I am working with large arrays representing a grid; each element is a Cell object with x and y attributes.

I am not sure of the most efficient way to initialize the arrays, my

1 Answer
  • 2021-01-27 15:15

    Define a simple class:

    import numpy as np

    class Cell:
        def __init__(self, x, y):
            self.x = x
            self.y = y

        def setX(self, x):
            # Mutates the cell in place and returns None
            self.x = x

        def __repr__(self):
            return f'Cell({self.x},{self.y})'
    

    One way to create an array of these objects is np.frompyfunc, which broadcasts a (3,1) column of x values against a (4,) row of y values:

    In [653]: f = np.frompyfunc(Cell, 2, 1)
    In [654]: arr = f(np.arange(3)[:,None], np.arange(4))
    In [655]: arr
    Out[655]: 
    array([[Cell(0,0), Cell(0,1), Cell(0,2), Cell(0,3)],
           [Cell(1,0), Cell(1,1), Cell(1,2), Cell(1,3)],
           [Cell(2,0), Cell(2,1), Cell(2,2), Cell(2,3)]], dtype=object)
    In [656]: arr.shape
    Out[656]: (3, 4)
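The same construction as a standalone script, as a sketch: the helper name `make_grid` is hypothetical, not from the question. Because frompyfunc broadcasts like any ufunc, the (3,1) column of row indices and the (4,) row of column indices produce a (3,4) grid where `grid[i, j]` is `Cell(i, j)`.

```python
import numpy as np

class Cell:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'Cell({self.x},{self.y})'

def make_grid(nrows, ncols):
    """Hypothetical helper: build an (nrows, ncols) object array of Cells."""
    make_cell = np.frompyfunc(Cell, 2, 1)   # 2 inputs, 1 output, object dtype
    rows = np.arange(nrows)[:, None]        # shape (nrows, 1)
    cols = np.arange(ncols)                 # shape (ncols,)
    return make_cell(rows, cols)            # broadcasts to (nrows, ncols)

grid = make_grid(3, 4)
print(grid.shape, grid.dtype)  # (3, 4) object
print(grid[2, 1])              # Cell(2,1)
```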
    

    A nested list comprehension creates the same cells, though with this loop order the result is a 4x3 list, transposed relative to arr:

    In [658]: [[Cell(i,j) for i in range(3)] for j in range(4)]
    Out[658]: 
    [[Cell(0,0), Cell(1,0), Cell(2,0)],
     [Cell(0,1), Cell(1,1), Cell(2,1)],
     [Cell(0,2), Cell(1,2), Cell(2,2)],
     [Cell(0,3), Cell(1,3), Cell(2,3)]]
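If the list should have the same row-major layout as arr, swap the loop order so the outer loop runs over rows. A sketch that also copies the nested list into an object array (np.array stores each Cell as a single element because Cell is not a sequence):

```python
import numpy as np

class Cell:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'Cell({self.x},{self.y})'

nrows, ncols = 3, 4
# Outer loop over rows (i), inner loop over columns (j), matching arr[i, j]
cells = [[Cell(i, j) for j in range(ncols)] for i in range(nrows)]

# Each Cell becomes one element of a (nrows, ncols) object array
grid = np.array(cells, dtype=object)
print(grid.shape)   # (3, 4)
print(grid[1, 3])   # Cell(1,3)
```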
    

    Some comparative timings:

    In [659]: timeit arr = f(np.arange(3)[:,None], np.arange(4))
    13.5 µs ± 73.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    In [660]: timeit [[Cell(i,j) for i in range(3)] for j in range(4)]
    8.3 µs ± 115 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    
    In [661]: timeit arr = f(np.arange(300)[:,None], np.arange(400))
    64.9 ms ± 293 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    In [662]: timeit [[Cell(i,j) for i in range(300)] for j in range(400)]
    78 ms ± 2.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
    

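    For small grids the list comprehension is faster (8.3 µs vs 13.5 µs for 3x4), but for large grids (300x400) the frompyfunc approach has a modest speed advantage (about 65 ms vs 78 ms).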
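A script version of the timing comparison, as a rough sketch: the function names are hypothetical, the list comprehension uses the row-major loop order, and absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

class Cell:
    def __init__(self, x, y):
        self.x = x
        self.y = y

make_cell = np.frompyfunc(Cell, 2, 1)

def with_frompyfunc(n, m):
    # Broadcast a column of row indices against a row of column indices
    return make_cell(np.arange(n)[:, None], np.arange(m))

def with_listcomp(n, m):
    # Plain nested list, rows outermost
    return [[Cell(i, j) for j in range(m)] for i in range(n)]

for n, m in [(3, 4), (300, 400)]:
    reps = 1000 if n * m < 100 else 3          # fewer repeats for the big grid
    t_f = timeit.timeit(lambda: with_frompyfunc(n, m), number=reps) / reps
    t_l = timeit.timeit(lambda: with_listcomp(n, m), number=reps) / reps
    print(f'{n}x{m}: frompyfunc {t_f * 1e6:.1f} µs, list comp {t_l * 1e6:.1f} µs')
```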

    Fetching the x values from all cells:

    In [664]: np.frompyfunc(lambda c: c.x, 1, 1)(arr)
    Out[664]: 
    array([[0, 0, 0, 0],
           [1, 1, 1, 1],
           [2, 2, 2, 2]], dtype=object)
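The result is still object dtype. A sketch that pulls out both attributes and converts them to numeric arrays with astype; operator.attrgetter stands in for the lambda and is an interchangeable choice:

```python
import operator
import numpy as np

class Cell:
    def __init__(self, x, y):
        self.x = x
        self.y = y

arr = np.frompyfunc(Cell, 2, 1)(np.arange(3)[:, None], np.arange(4))

# frompyfunc always returns object dtype; astype(int) makes a numeric copy
get_x = np.frompyfunc(operator.attrgetter('x'), 1, 1)
get_y = np.frompyfunc(operator.attrgetter('y'), 1, 1)
xs = get_x(arr).astype(int)
ys = get_y(arr).astype(int)

print(xs.tolist())  # [[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]]
print(ys.tolist())  # [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]
```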
    

    Using the setX method:

    In [665]: np.frompyfunc(Cell.setX, 2, 1)(arr, np.arange(12).reshape(3,4))
    Out[665]: 
    array([[None, None, None, None],
           [None, None, None, None],
           [None, None, None, None]], dtype=object)
    In [666]: arr
    Out[666]: 
    array([[Cell(0,0), Cell(1,1), Cell(2,2), Cell(3,3)],
           [Cell(4,0), Cell(5,1), Cell(6,2), Cell(7,3)],
           [Cell(8,0), Cell(9,1), Cell(10,2), Cell(11,3)]], dtype=object)
    

    setX doesn't return anything, so the array produced by the function call is all None. But it has modified every element of arr in place. As with list comprehensions, we don't normally use frompyfunc calls for side effects, but it is possible.
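For comparison, a sketch of the same mutation written both ways: the frompyfunc call (whose all-None result is only kept here to check it) and an explicit loop over np.ndenumerate, which some readers may find makes the side effect easier to spot.

```python
import numpy as np

class Cell:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def setX(self, x):
        self.x = x

arr = np.frompyfunc(Cell, 2, 1)(np.arange(3)[:, None], np.arange(4))
new_x = np.arange(12).reshape(3, 4)

# 1) frompyfunc: called only for its side effect; the returned array is all None
result = np.frompyfunc(Cell.setX, 2, 1)(arr, new_x)

# 2) explicit loop: the same mutation, with the side effect in plain sight
for (i, j), cell in np.ndenumerate(arr):
    cell.setX(new_x[i, j])
```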

    np.vectorize, in its default (and original) form, just uses frompyfunc and then converts the result to a suitable dtype; frompyfunc itself always returns object dtype. Newer versions of vectorize have a signature parameter that lets us pass arrays (as opposed to scalars) to the function and get arrays back, but this processing is even slower.
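A short sketch of the difference: frompyfunc gives object dtype, while np.vectorize with otypes=[int] returns a numeric array. The last lines show the signature parameter, where '(n)->()' tells vectorize to pass each length-n row as an array and expect a scalar back.

```python
import numpy as np

class Cell:
    def __init__(self, x, y):
        self.x = x
        self.y = y

arr = np.frompyfunc(Cell, 2, 1)(np.arange(3)[:, None], np.arange(4))

get_x_obj = np.frompyfunc(lambda c: c.x, 1, 1)            # always object dtype
get_x_num = np.vectorize(lambda c: c.x, otypes=[int])    # numeric output dtype

print(get_x_obj(arr).dtype)   # object
print(get_x_num(arr).dtype)   # the platform's default integer dtype

# signature='(n)->()': each length-n row is passed as an array, a scalar comes back
row_total = np.vectorize(lambda row: row.sum(), signature='(n)->()')
print(row_total(np.arange(6).reshape(3, 2)))   # [1 5 9]
```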

    Defining an array of objects like this may make your code look cleaner and better organized, but it can never match numeric numpy arrays in terms of speed.
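For comparison, a sketch of the purely numeric layout: keep one integer array per attribute instead of one Cell per element. This drops the Cell class entirely, so it is an alternative design rather than a way to speed up the object array; np.indices builds the x and y index grids, and the names xs and ys are illustrative.

```python
import numpy as np

nrows, ncols = 3, 4
# One numeric array per attribute instead of one Cell object per element
xs, ys = np.indices((nrows, ncols))    # xs[i, j] == i, ys[i, j] == j

# The setX step becomes a single vectorized assignment
xs[:] = np.arange(12).reshape(nrows, ncols)

# Reading a "cell" is just indexing both arrays
print(xs[1, 2], ys[1, 2])   # 6 2
```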


    Given the definition of Cell, I can also set the attributes to arrays, e.g.

    Cell(np.arange(3), np.zeros((3,4)))
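A quick check of that, using the same Cell class: one instance holds whole arrays, and setX replaces the x array in a single call.

```python
import numpy as np

class Cell:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def setX(self, x):
        self.x = x

c = Cell(np.arange(3), np.zeros((3, 4)))
print(c.x.shape, c.y.shape)   # (3,) (3, 4)

# One call replaces the whole x array; no per-element Python loop
c.setX(c.x * 10)
print(c.x)                    # [ 0 10 20]
```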
    

    But to set the x values of an array of Cells to arrays, I have to construct an object array first:

    In [676]: X = np.zeros(3, object)
    In [677]: for i,row in enumerate(np.arange(6).reshape(3,2)): X[i]=row
    In [678]: X
    Out[678]: array([array([0, 1]), array([2, 3]), array([4, 5])], dtype=object)
    In [679]: np.frompyfunc(Cell.setX, 2, 1)(arr, X[:,None])
    Out[679]: 
    array([[None, None, None, None],
           [None, None, None, None],
           [None, None, None, None]], dtype=object)
    In [680]: arr
    Out[680]: 
    array([[Cell([0 1],0), Cell([0 1],1), Cell([0 1],2), Cell([0 1],3)],
           [Cell([2 3],0), Cell([2 3],1), Cell([2 3],2), Cell([2 3],3)],
           [Cell([4 5],0), Cell([4 5],1), Cell([4 5],2), Cell([4 5],3)]],
          dtype=object)
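The element-by-element loop matters: np.array on a list of equal-length rows, even with dtype=object, builds a (3,2) array of scalars instead of a (3,) array of arrays. A sketch wrapping the loop in a hypothetical helper `rows_as_objects`, then broadcasting its (3,1) view against arr:

```python
import numpy as np

class Cell:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def setX(self, x):
        self.x = x

    def __repr__(self):
        return f'Cell({self.x},{self.y})'

def rows_as_objects(a):
    """Hypothetical helper: (n, k) array -> (n,) object array of row arrays."""
    out = np.empty(a.shape[0], dtype=object)
    for i, row in enumerate(a):
        out[i] = row     # assign one element at a time so numpy can't broadcast
    return out

a = np.arange(6).reshape(3, 2)
print(np.array(list(a), dtype=object).shape)   # (3, 2): not what we want
X = rows_as_objects(a)
print(X.shape)                                  # (3,)

arr = np.frompyfunc(Cell, 2, 1)(np.arange(3)[:, None], np.arange(4))
np.frompyfunc(Cell.setX, 2, 1)(arr, X[:, None])  # (3,1) broadcasts to (3,4)
print(arr[2, 0])                                # Cell([4 5],0)
```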
    

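    I could not pass the (3,2) numeric array directly, because frompyfunc tries to broadcast it element by element against arr's (3,4) shape: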

    In [681]: np.frompyfunc(Cell.setX, 2, 1)(arr, np.arange(6).reshape(3,2))
    ValueError: operands could not be broadcast together with shapes (3,4) (3,2) 
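The error comes from the ordinary broadcasting rules: paired trailing dimensions must be equal, or one of them must be 1. np.broadcast_shapes (available in NumPy 1.20 and later) checks a pair of shapes without calling any function, as a quick sketch:

```python
import numpy as np

print(np.broadcast_shapes((3, 4), (3, 4)))   # (3, 4)
print(np.broadcast_shapes((3, 4), (3, 1)))   # (3, 4): each row's value repeats across columns

try:
    np.broadcast_shapes((3, 4), (3, 2))
    incompatible = False
except ValueError:
    incompatible = True                      # trailing dims 4 and 2 differ and neither is 1
print(incompatible)                          # True
```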
    

    numpy is designed primarily for multidimensional numeric arrays. Creating and using object dtype arrays requires some special tricks.
