Are there good ways to \"expand\" a numpy ndarray? Say I have an ndarray like this:
[[1 2]
[3 4]]
And I want each row to contains more ele
Just to be clear: there's no "good" way to extend a NumPy array, as NumPy arrays are not expandable. Once the array is defined, the space it occupies in memory, a combination of the number of its elements and the size of each element, is fixed and cannot be changed. The only thing you can do is to create a new array and replace some of its elements by the elements of the original array.
A lot of functions are available for convenience (the np.concatenate function and its np.*stack
shortcuts, the np.column_stack, the indexes routines np.r_ and np.c_
...), but there are just that: convenience functions. Some of them are optimized at the C level (the np.concatenate
and others, I think), some are not.
Note that there's nothing at all with your initial suggestion of creating a large array 'by hand' (possibly filled with zeros) and filling it yourself with your initial array. It might be more readable that more complicated solutions.
there are also similar methods like np.vstack, np.hstack, np.dstack. I like these over np.concatente as it makes it clear what dimension is being "expanded".
temp = np.array([[1, 2], [3, 4]])
np.hstack((temp, np.zeros((2,3))))
it's easy to remember becase numpy's first axis is vertical so vstack expands the first axis and 2nd axis is horizontal so hstack.
# what you want to expand
x = np.ones((3, 3))
# expand to what shape
target = np.zeros((6, 6))
# do expand
target[:x.shape[0], :x.shape[1]] = x
# print target
array([[ 1., 1., 1., 0., 0., 0.],
[ 1., 1., 1., 0., 0., 0.],
[ 1., 1., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.]])
borrow from https://stackoverflow.com/a/35751427/1637673, with a little modification.
def pad(array, reference_shape, offsets=None):
"""
array: Array to be padded
reference_shape: tuple of size of narray to create
offsets: list of offsets (number of elements must be equal to the dimension of the array)
will throw a ValueError if offsets is too big and the reference_shape cannot handle the offsets
"""
if not offsets:
offsets = np.zeros(array.ndim, dtype=np.int32)
# Create an array of zeros with the reference shape
result = np.zeros(reference_shape, dtype=np.float32)
# Create a list of slices from offset to offset + shape in each dimension
insertHere = [slice(offsets[dim], offsets[dim] + array.shape[dim]) for dim in range(array.ndim)]
# Insert the array in the result at the specified offsets
result[insertHere] = array
return result
You should use np.column_stack
or append
import numpy as np
p = np.array([ [1,2] , [3,4] ])
p = np.column_stack( [ p , [ 0 , 0 ],[0,0] ] )
p
Out[277]:
array([[1, 2, 0, 0],
[3, 4, 0, 0]])
Append seems to be faster though:
timeit np.column_stack( [ p , [ 0 , 0 ],[0,0] ] )
10000 loops, best of 3: 61.8 us per loop
timeit np.append(p, [[0,0],[0,0]],1)
10000 loops, best of 3: 48 us per loop
And a comparison with np.c_
and np.hstack
[append still seems to be the fastest]:
In [295]: z=np.zeros((2, 2), dtype=a.dtype)
In [296]: timeit np.c_[a, z]
10000 loops, best of 3: 47.2 us per loop
In [297]: timeit np.append(p, z,1)
100000 loops, best of 3: 13.1 us per loop
In [305]: timeit np.hstack((p,z))
10000 loops, best of 3: 20.8 us per loop
and np.concatenate
[that is a even a bit faster than append
]:
In [307]: timeit np.concatenate((p, z), axis=1)
100000 loops, best of 3: 11.6 us per loop
You can use numpy.pad, as follows:
>>> import numpy as np
>>> a=[[1,2],[3,4]]
>>> np.pad(a, ((0,0),(0,3)), mode='constant', constant_values=0)
array([[1, 2, 0, 0, 0],
[3, 4, 0, 0, 0]])
Here np.pad
says, "Take the array a
and add 0 rows above it, 0 rows below it, 0 columns to the left of it, and 3 columns to the right of it. Fill these columns with a constant
specified by constant_values
".
There are the index tricks r_
and c_
.
>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> z = np.zeros((2, 3), dtype=a.dtype)
>>> np.c_[a, z]
array([[1, 2, 0, 0, 0],
[3, 4, 0, 0, 0]])
If this is performance critical code, you might prefer to use the equivalent np.concatenate
rather than the index tricks.
>>> np.concatenate((a,z), axis=1)
array([[1, 2, 0, 0, 0],
[3, 4, 0, 0, 0]])
There are also np.resize
and np.ndarray.resize
, but they have some limitations (due to the way numpy lays out data in memory) so read the docstring on those ones. You will probably find that simply concatenating is better.
By the way, when I've needed to do this I usually just do it the basic way you've already mentioned (create an array of zeros and assign the smaller array inside it), I don't see anything wrong with that!