Most efficient way to set attributes on objects in array


失恋的感觉

I am working with large arrays representing a grid; each element is a Cell object with x and y attributes.

I am not sure the most efficient way to initialize the arrays, my


1 answer

被撕碎了的回忆

2021-01-27 15:15

Define a simple class:

class Cell:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def setX(self, x):
        # Setter used below to modify cells in place; returns None.
        self.x = x
    def __repr__(self):
        return f'Cell({self.x},{self.y})'

A way of creating an array of these objects:

In [653]: f = np.frompyfunc(Cell, 2, 1)
In [654]: arr = f(np.arange(3)[:,None], np.arange(4))
In [655]: arr
Out[655]: 
array([[Cell(0,0), Cell(0,1), Cell(0,2), Cell(0,3)],
       [Cell(1,0), Cell(1,1), Cell(1,2), Cell(1,3)],
       [Cell(2,0), Cell(2,1), Cell(2,2), Cell(2,3)]], dtype=object)
In [656]: arr.shape
Out[656]: (3, 4)

A list way of creating the same objects (note that this nesting yields 4 rows of 3, the transpose of the array layout above):

In [658]: [[Cell(i,j) for i in range(3)] for j in range(4)]
Out[658]: 
[[Cell(0,0), Cell(1,0), Cell(2,0)],
 [Cell(0,1), Cell(1,1), Cell(2,1)],
 [Cell(0,2), Cell(1,2), Cell(2,2)],
 [Cell(0,3), Cell(1,3), Cell(2,3)]]

Some comparative timings:

In [659]: timeit arr = f(np.arange(3)[:,None], np.arange(4))
13.5 µs ± 73.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [660]: timeit [[Cell(i,j) for i in range(3)] for j in range(4)]
8.3 µs ± 115 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [661]: timeit arr = f(np.arange(300)[:,None], np.arange(400))
64.9 ms ± 293 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [662]: timeit [[Cell(i,j) for i in range(300)] for j in range(400)]
78 ms ± 2.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

For large grids (300x400), the frompyfunc approach has a modest speed advantage (about 65 ms versus 78 ms). For the small 3x4 case the list comprehension is faster.
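
The timings above come from an IPython session. A rough way to repeat the comparison as a standalone script is sketched below; the helper names build_array and build_lists are made up for illustration, and the absolute numbers will differ by machine and numpy version.

```python
import timeit
import numpy as np

class Cell:
    def __init__(self, x, y):
        self.x = x
        self.y = y

make_cell = np.frompyfunc(Cell, 2, 1)

def build_array(rows, cols):
    # Broadcast a column of row indices against a row of column indices.
    return make_cell(np.arange(rows)[:, None], np.arange(cols))

def build_lists(rows, cols):
    # Outer loop over rows so the layout matches the (rows, cols) array.
    return [[Cell(i, j) for j in range(cols)] for i in range(rows)]

runs = 3
for rows, cols in [(3, 4), (300, 400)]:
    t_arr = timeit.timeit(lambda: build_array(rows, cols), number=runs)
    t_list = timeit.timeit(lambda: build_lists(rows, cols), number=runs)
    print(f"{rows}x{cols}: frompyfunc {t_arr / runs:.6f} s, lists {t_list / runs:.6f} s")
```

Unlike the list comprehension in the transcript, build_lists loops over rows in the outer loop, so both helpers produce the same layout.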

Fetching the x values from all cells:

In [664]: np.frompyfunc(lambda c: c.x, 1, 1)(arr)
Out[664]: 
array([[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2]], dtype=object)
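
The fetched values come back as an object dtype array. If a numeric array is wanted for further computation, one option is to convert it with astype, as in this minimal sketch (the name get_x is only illustrative):

```python
import numpy as np

class Cell:
    def __init__(self, x, y):
        self.x = x
        self.y = y

arr = np.frompyfunc(Cell, 2, 1)(np.arange(3)[:, None], np.arange(4))

get_x = np.frompyfunc(lambda c: c.x, 1, 1)
xs_obj = get_x(arr)          # object dtype, as in the transcript
xs = xs_obj.astype(int)      # converted to an integer dtype
print(xs_obj.dtype, xs.dtype)
```

The conversion only works when every cell holds a value that can be cast to the target dtype.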

Using the setX method:

In [665]: np.frompyfunc(Cell.setX, 2, 1)(arr, np.arange(12).reshape(3,4))
Out[665]: 
array([[None, None, None, None],
       [None, None, None, None],
       [None, None, None, None]], dtype=object)
In [666]: arr
Out[666]: 
array([[Cell(0,0), Cell(1,1), Cell(2,2), Cell(3,3)],
       [Cell(4,0), Cell(5,1), Cell(6,2), Cell(7,3)],
       [Cell(8,0), Cell(9,1), Cell(10,2), Cell(11,3)]], dtype=object)

setX doesn't return anything, so the array produced by the function call is all None. But it has modified all elements of arr. As with list comprehensions, we don't normally use frompyfunc calls for side effects, but it is possible.
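
If the array of None is unwanted, a plain loop over the flattened arrays performs the same in-place update without building a throwaway result. This is only a sketch of that alternative, not a claim about which is faster:

```python
import numpy as np

class Cell:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def setX(self, x):
        self.x = x

arr = np.frompyfunc(Cell, 2, 1)(np.arange(3)[:, None], np.arange(4))
values = np.arange(12).reshape(3, 4)

# Walk both arrays in the same (row-major) order and update each cell in place.
for cell, value in zip(arr.flat, values.flat):
    cell.setX(value)

print(arr[1, 2].x, arr[1, 2].y)
```

This assumes arr and values have the same shape; unlike frompyfunc, zip does no broadcasting.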

np.vectorize, in its default (and original) form, just uses frompyfunc and adjusts the dtype of the result, whereas frompyfunc always returns object dtype. Newer versions of vectorize have a signature parameter, allowing us to pass arrays (as opposed to scalars) to the function and get back arrays, but this processing is even slower.
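
As a small illustration of the dtype difference, the sketch below fetches the y attribute with np.vectorize and an explicit otypes argument, so the result is an integer array rather than an object array. The name get_y is hypothetical.

```python
import numpy as np

class Cell:
    def __init__(self, x, y):
        self.x = x
        self.y = y

arr = np.frompyfunc(Cell, 2, 1)(np.arange(3)[:, None], np.arange(4))

# otypes fixes the output dtype up front instead of inferring it from a trial call.
get_y = np.vectorize(lambda c: c.y, otypes=[int])
ys = get_y(arr)
print(ys.dtype, ys.shape)
```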

Defining an array of objects like this may make your code look cleaner and better organized, but it can never match numeric numpy arrays in terms of speed.
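
For comparison, here is a sketch of the numeric alternative: keep the x and y values in two plain integer arrays instead of in Cell objects. Whether this fits depends on what else a cell needs to carry, so treat it as one possible layout rather than a drop-in replacement.

```python
import numpy as np

rows, cols = 3, 4
# xs[i, j] == i and ys[i, j] == j, matching Cell(i, j) in the object array.
xs, ys = np.indices((rows, cols))

# Equivalent of calling setX on every cell, done as one vectorized assignment.
xs[...] = np.arange(rows * cols).reshape(rows, cols)

# Whole-grid arithmetic without a Python-level loop.
dist2 = xs**2 + ys**2
print(xs)
print(dist2)
```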

Given the definition of Cell, I can set the attributes to arrays, e.g.

Cell(np.arange(3), np.zeros((3,4)))

But to set the values of an array of Cell, I have to construct an object array first:

In [676]: X = np.zeros(3, object)
In [677]: for i,row in enumerate(np.arange(6).reshape(3,2)): X[i]=row
In [678]: X
Out[678]: array([array([0, 1]), array([2, 3]), array([4, 5])], dtype=object)
In [679]: np.frompyfunc(Cell.setX, 2, 1)(arr, X[:,None])
Out[679]: 
array([[None, None, None, None],
       [None, None, None, None],
       [None, None, None, None]], dtype=object)
In [680]: arr
Out[680]: 
array([[Cell([0 1],0), Cell([0 1],1), Cell([0 1],2), Cell([0 1],3)],
       [Cell([2 3],0), Cell([2 3],1), Cell([2 3],2), Cell([2 3],3)],
       [Cell([4 5],0), Cell([4 5],1), Cell([4 5],2), Cell([4 5],3)]],
      dtype=object)
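
The same steps can be collected into a standalone script. The helper rows_as_objects is a made-up name; it wraps the element-by-element fill from the transcript so that each row of a 2-D array becomes one element of a 1-D object array.

```python
import numpy as np

class Cell:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def setX(self, x):
        self.x = x

def rows_as_objects(a):
    # Fill element by element so numpy keeps each row as a single object.
    out = np.empty(len(a), dtype=object)
    for i, row in enumerate(a):
        out[i] = row
    return out

arr = np.frompyfunc(Cell, 2, 1)(np.arange(3)[:, None], np.arange(4))
X = rows_as_objects(np.arange(6).reshape(3, 2))

# X[:, None] has shape (3, 1), which broadcasts against the (3, 4) array of cells.
np.frompyfunc(Cell.setX, 2, 1)(arr, X[:, None])
print(arr[1, 2].x)
```

Every cell in a given row ends up referring to the same row array, so modifying that array in place affects all of them.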

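I could not pass a (3,2) array, because frompyfunc broadcasts its arguments against each other and the shapes (3,4) and (3,2) are not compatible: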

In [681]: np.frompyfunc(Cell.setX, 2, 1)(arr, np.arange(6).reshape(3,2))
ValueError: operands could not be broadcast together with shapes (3,4) (3,2)
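
The error is an ordinary broadcasting failure and can be checked without any Cell objects. The sketch below uses np.broadcast to test whether two shapes are compatible; can_broadcast is a hypothetical helper name.

```python
import numpy as np

def can_broadcast(shape_a, shape_b):
    # np.broadcast raises ValueError when the shapes cannot be combined.
    try:
        np.broadcast(np.empty(shape_a), np.empty(shape_b))
    except ValueError:
        return False
    return True

print(can_broadcast((3, 4), (3, 1)))  # the X[:, None] case
print(can_broadcast((3, 4), (3, 2)))  # the failing case
```

A trailing dimension of 2 cannot be stretched to 4, whereas a dimension of size 1 can be stretched to any size.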

numpy preferentially works with multidimensional (numeric) arrays. Creating and using object dtype arrays requires some special tricks.
