I am working with large arrays representing a grid; each element is a Cell object with x, y attributes.
I am not sure the most efficient way to initialize the arrays, my
Define a simple class:
class Cell():
    def __init__(self,x,y):
        self.x=x
        self.y=y
    def setX(self,x):
        self.x=x
    def __repr__(self):
        return f'Cell({self.x},{self.y})'
A way of creating an array of these objects:
In [653]: f = np.frompyfunc(Cell, 2, 1)
In [654]: arr = f(np.arange(3)[:,None], np.arange(4))
In [655]: arr
Out[655]:
array([[Cell(0,0), Cell(0,1), Cell(0,2), Cell(0,3)],
       [Cell(1,0), Cell(1,1), Cell(1,2), Cell(1,3)],
       [Cell(2,0), Cell(2,1), Cell(2,2), Cell(2,3)]], dtype=object)
In [656]: arr.shape
Out[656]: (3, 4)
A list way of creating the same objects (with this loop order the nesting is 4x3, the transpose of arr):
In [658]: [[Cell(i,j) for i in range(3)] for j in range(4)]
Out[658]:
[[Cell(0,0), Cell(1,0), Cell(2,0)],
 [Cell(0,1), Cell(1,1), Cell(2,1)],
 [Cell(0,2), Cell(1,2), Cell(2,2)],
 [Cell(0,3), Cell(1,3), Cell(2,3)]]
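The nested list above comes out 4x3 because the outer loop runs over j. If the (3, 4) layout of arr is wanted, one option is to swap the loop order and wrap the result in an object array. This is only a sketch: Cell is redefined so the snippet runs on its own, and the names nested and arr_from_list are mine, not part of the original session.

```python
import numpy as np

class Cell:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        return f'Cell({self.x},{self.y})'

# Outer loop over rows (x), inner loop over columns (y)
nested = [[Cell(i, j) for j in range(4)] for i in range(3)]

# Cell is not a sequence, so numpy stops at two dimensions
arr_from_list = np.array(nested, dtype=object)
print(arr_from_list.shape)
```

The conversion to an array adds its own cost on top of the list comprehension, so this is about layout, not speed.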
Some comparative timings:
In [659]: timeit arr = f(np.arange(3)[:,None], np.arange(4))
13.5 µs ± 73.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [660]: timeit [[Cell(i,j) for i in range(3)] for j in range(4)]
8.3 µs ± 115 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [661]: timeit arr = f(np.arange(300)[:,None], np.arange(400))
64.9 ms ± 293 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [662]: timeit [[Cell(i,j) for i in range(300)] for j in range(400)]
78 ms ± 2.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
For large sets, the frompyfunc approach has a modest speed advantage.
Fetching the values from all cells:
In [664]: np.frompyfunc(lambda c: c.x, 1, 1)(arr)
Out[664]:
array([[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2]], dtype=object)
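The fetched values still have object dtype. If a numeric array is needed for further computation, astype can convert it, provided every x is a plain integer. A small sketch, with Cell redefined so it runs standalone; make_cells, get_x and xs are names I made up for illustration:

```python
import numpy as np

class Cell:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        return f'Cell({self.x},{self.y})'

make_cells = np.frompyfunc(Cell, 2, 1)
arr = make_cells(np.arange(3)[:, None], np.arange(4))

# frompyfunc returns object dtype; astype converts to a numeric array
get_x = np.frompyfunc(lambda c: c.x, 1, 1)
xs = get_x(arr).astype(int)
print(xs.dtype.kind)
```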
Using the setX method:
In [665]: np.frompyfunc(Cell.setX, 2, 1)(arr, np.arange(12).reshape(3,4))
Out[665]:
array([[None, None, None, None],
       [None, None, None, None],
       [None, None, None, None]], dtype=object)
In [666]: arr
Out[666]:
array([[Cell(0,0), Cell(1,1), Cell(2,2), Cell(3,3)],
       [Cell(4,0), Cell(5,1), Cell(6,2), Cell(7,3)],
       [Cell(8,0), Cell(9,1), Cell(10,2), Cell(11,3)]], dtype=object)
setX doesn't return anything, so the array produced by the function call is all None. But it has modified all elements of arr. Like list comprehensions, we don't normally use frompyfunc calls for side effects, but it is possible.
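When only the side effect is wanted, a plain loop over the flattened arrays avoids building the throwaway array of None. This is a sketch, not a speed recommendation; Cell is redefined so it runs alone, and new_x is a name I introduced:

```python
import numpy as np

class Cell:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def setX(self, x):
        self.x = x
    def __repr__(self):
        return f'Cell({self.x},{self.y})'

arr = np.frompyfunc(Cell, 2, 1)(np.arange(3)[:, None], np.arange(4))
new_x = np.arange(12).reshape(3, 4)

# Walk both arrays in the same (row-major) order and call setX on each cell
for cell, value in zip(arr.flat, new_x.flat):
    cell.setX(int(value))

print(arr[1, 2])
```

Unlike the frompyfunc call, this does no broadcasting, so both arrays need the same number of elements.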
np.vectorize, in its default (and original) form, just uses frompyfunc, and adjusts the dtype of the return. frompyfunc always returns object dtype. Newer versions of vectorize have a signature parameter, allowing us to pass arrays (as opposed to scalars) to the function, and get back arrays. But this processing is even slower.
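To make the two vectorize modes concrete, here is a short sketch. The otypes argument fixes the return dtype, and signature makes the function receive whole 1d rows. Cell is redefined so the snippet is standalone; get_x and row_sum are illustrative names of mine:

```python
import numpy as np

class Cell:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        return f'Cell({self.x},{self.y})'

arr = np.frompyfunc(Cell, 2, 1)(np.arange(3)[:, None], np.arange(4))

# otypes sets the result dtype up front instead of leaving it as object
get_x = np.vectorize(lambda c: c.x, otypes=[int])
xs = get_x(arr)
print(xs.dtype.kind)

# With a signature, the function is passed a whole 1d row, not a scalar
row_sum = np.vectorize(lambda row: row.sum(), signature='(n)->()')
sums = row_sum(np.arange(12).reshape(3, 4))
print(sums)
```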
Defining an array of objects like this may make your code look cleaner and better organized, but it can never match numeric numpy arrays in terms of speed.
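For comparison, a numeric layout could keep x and y as two separate integer arrays instead of one array of Cell objects. This is a sketch of that alternative, assuming the cells carry nothing beyond x and y; it is not what the question's code does:

```python
import numpy as np

# x[i, j] == i and y[i, j] == j, matching Cell(i, j) at position (i, j)
x, y = np.indices((3, 4))

# Equivalent of calling setX over the whole grid: one array assignment
x[...] = np.arange(12).reshape(3, 4)

# "Fetching x from all cells" is just the array itself, with a numeric dtype
print(x.dtype.kind)
print(x[1, 2], y[1, 2])
```

Here the per-element Python calls disappear entirely, which is where the speed difference comes from.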
Given the definition of Cell, I can set the attributes to arrays, e.g.
Cell(np.arange(3), np.zeros((3,4)))
But to set the values of an array of Cell, I have to construct an object array first:
In [676]: X = np.zeros(3, object)
In [677]: for i,row in enumerate(np.arange(6).reshape(3,2)): X[i]=row
In [678]: X
Out[678]: array([array([0, 1]), array([2, 3]), array([4, 5])], dtype=object)
In [679]: np.frompyfunc(Cell.setX, 2, 1)(arr, X[:,None])
Out[679]:
array([[None, None, None, None],
       [None, None, None, None],
       [None, None, None, None]], dtype=object)
In [680]: arr
Out[680]:
array([[Cell([0 1],0), Cell([0 1],1), Cell([0 1],2), Cell([0 1],3)],
       [Cell([2 3],0), Cell([2 3],1), Cell([2 3],2), Cell([2 3],3)],
       [Cell([4 5],0), Cell([4 5],1), Cell([4 5],2), Cell([4 5],3)]],
      dtype=object)
I could not pass a (3,2) array:
In [681]: np.frompyfunc(Cell.setX, 2, 1)(arr, np.arange(6).reshape(3,2))
ValueError: operands could not be broadcast together with shapes (3,4) (3,2)
numpy preferentially works with multidimensional (numeric) arrays. Creating and using object dtype arrays requires some special tricks.