I have two arrays a and b. a.shape is (5, 4, 3) (all zeros in my example) and b.shape is (3,). How can I insert b into a along axis 1, so that the result has shape (5, 5, 3)?
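A minimal, self-contained setup with these shapes (the zero and 1-2-3 values are just placeholders):

```python
import numpy as np

a = np.zeros((5, 4, 3))   # the 3D array, shape (5, 4, 3)
b = np.array([1, 2, 3])   # the 1D array to insert, shape (3,)

# Goal: add b as one extra slot along axis 1, producing shape (5, 5, 3)
print(a.shape, b.shape)
```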
You can also use np.insert.
b_broad = np.expand_dims(b, axis=0) # b_broad.shape = (1, 3)
ab = np.insert(a, 4, b_broad, axis=1)
"""
Because now we are inserting along axis 1
a'shape without axis 1 = (5, 3)
b_broad's shape (1, 3)
can be aligned and broadcast b_broad to (5, 3)
"""
In this example, we insert along axis 1, and b_broad is placed before the given index, 4 here. In other words, b_broad will occupy index 4 along that axis, making ab.shape equal to (5, 5, 3).
Note again that before doing the insertion, we turn b into b_broad to safely achieve the broadcasting we want. Since b has fewer dimensions, broadcasting happens at insertion time, and expand_dims lets us make it explicit.
If a is of shape (3, 4, 5), you will need b_broad to have shape (3, 1) to match up the dimensions when inserting along axis 1. This can be achieved by
b_broad = np.expand_dims(b, axis=1) # shape = (3, 1)
It is good practice to put b_broad into the right shape yourself, because you might have a.shape = (3, 4, 3), and then you really need to specify which way to broadcast!
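Putting the pieces above together, here is a runnable sketch of the np.insert approach (array contents chosen arbitrarily for illustration):

```python
import numpy as np

a = np.zeros((5, 4, 3))
b = np.array([1, 2, 3])

# Make the broadcasting direction explicit: (3,) -> (1, 3)
b_broad = np.expand_dims(b, axis=0)

# Insert before index 4 along axis 1; (1, 3) broadcasts to the (5, 3) slot
ab = np.insert(a, 4, b_broad, axis=1)

print(ab.shape)     # (5, 5, 3)
print(ab[0, 4])     # [1. 2. 3.] -- b occupies index 4 along axis 1
```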
Timing Results
From OP's dataset: COLDSPEED's answer is about 3 times faster than mine.
def Divakar():  # Divakar's answer
    b3D = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
    r = np.concatenate((a, b3D), axis=1)
# COLDSPEED's result
%timeit np.concatenate((a, b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)), axis=1)
2.95 µs ± 164 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# Divakar's result
%timeit Divakar()
3.03 µs ± 173 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# Mine (np.insert)
%timeit np.insert(a, 4, b, axis=1)
10.1 µs ± 220 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Dataset 2 (borrowing the timing experiment from COLDSPEED): nothing can be concluded in this case, because all three approaches share nearly the same mean and standard deviation.
a = np.random.randn(100, 99, 100)
b = np.random.randn(100)
# COLDSPEED's result
%timeit np.concatenate((a, b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)), axis=1)
2.37 ms ± 194 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Divakar's
%timeit Divakar()
2.31 ms ± 249 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# Mine (np.insert)
%timeit np.insert(a, 99, b, axis=1)
2.34 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Speed will depend on your data's size, shape, and volume. Please test on your own dataset if speed is a concern.
Here are some simple timings based on cᴏʟᴅsᴘᴇᴇᴅ's and Divakar's solutions:
%timeit np.concatenate((a, b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)), axis=1)
Output: The slowest run took 6.44 times longer than the fastest. This could mean that an intermediate result is being cached. 100000 loops, best of 3: 3.68 µs per loop
%timeit np.concatenate((a, np.broadcast_to(b[None,None], (a.shape[0], 1, len(b)))), axis=1)
Output: The slowest run took 4.12 times longer than the fastest. This could mean that an intermediate result is being cached. 100000 loops, best of 3: 10.7 µs per loop
Now here is the timing based on your original code:
%timeit original_func(a, b)
Output: The slowest run took 4.62 times longer than the fastest. This could mean that an intermediate result is being cached. 100000 loops, best of 3: 4.69 µs per loop
Since the question asked for faster ways to arrive at the same result, I would go with cᴏʟᴅsᴘᴇᴇᴅ's solution based on these timings.
Simply broadcast b to 3D and then concatenate along the second axis -
b3D = np.broadcast_to(b,(a.shape[0],1,len(b)))
out = np.concatenate((a,b3D),axis=1)
The broadcasting part with np.broadcast_to doesn't actually replicate or make copies; it simply returns a replicated view. Then, in the next step, the concatenation does the replication on the fly.
In this section we compare the np.repeat version from @cᴏʟᴅsᴘᴇᴇᴅ's solution against the np.broadcast_to one, with a focus on performance. The broadcasting-based one does the replication and concatenation in the second step as a merged command, so to speak, while the np.repeat version makes a copy first and then concatenates, in two separate steps.
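The copy-vs-view distinction is easy to check directly: the broadcast result is a zero-stride view sharing b's memory, while repeat allocates fresh storage. A small sketch using the shapes from this answer:

```python
import numpy as np

a = np.random.rand(5, 4, 3)
b = np.random.rand(3)

view = np.broadcast_to(b, (a.shape[0], 1, len(b)))     # view, no data copied
copy = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)  # 5 real copies of b

print(view.shape, copy.shape)      # (5, 1, 3) (5, 1, 3) -- same logical shape
print(view.strides[0])             # 0 -- the repeated axis just re-reads b
print(np.shares_memory(view, b))   # True
print(np.shares_memory(copy, b))   # False
```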
Timing the approaches as a whole:
Case #1 : a = (500,400,300)
and b = (300,)
In [321]: a = np.random.rand(500,400,300)
In [322]: b = np.random.rand(300)
In [323]: %%timeit
...: b3D = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
...: r = np.concatenate((a, b3D), axis=1)
10 loops, best of 3: 72.1 ms per loop
In [325]: %%timeit
...: b3D = np.broadcast_to(b,(a.shape[0],1,len(b)))
...: out = np.concatenate((a,b3D),axis=1)
10 loops, best of 3: 72.5 ms per loop
For smaller input shapes, the call to np.broadcast_to takes a bit longer than np.repeat, given that the work needed to set up the broadcasting is apparently more involved, as the timings below suggest:
In [360]: a = np.random.rand(5,4,3)
In [361]: b = np.random.rand(3)
In [366]: %timeit np.broadcast_to(b,(a.shape[0],1,len(b)))
100000 loops, best of 3: 3.12 µs per loop
In [367]: %timeit b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
1000000 loops, best of 3: 957 ns per loop
But the broadcasting part takes roughly constant time irrespective of the input shapes, i.e. the ~3 µs would stay around that mark, whereas the timing for the counterpart, b.reshape(1, 1, -1).repeat(a.shape[0], axis=0), depends on the input shapes. So, let's dig deeper and see how the concatenation steps of the two approaches fare.
Digging deeper
Trying to dig deeper to see how much time the concatenation part is consuming:
In [353]: a = np.random.rand(500,400,300)
In [354]: b = np.random.rand(300)
In [355]: b3D = np.broadcast_to(b,(a.shape[0],1,len(b)))
In [356]: %timeit np.concatenate((a,b3D),axis=1)
10 loops, best of 3: 72 ms per loop
In [357]: b3D = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
In [358]: %timeit np.concatenate((a,b3D),axis=1)
10 loops, best of 3: 72 ms per loop
Conclusion : Doesn't seem too different.
Now, let's try a case where the replication count needed for b is larger and b has a noticeably high number of elements as well.
In [344]: a = np.random.rand(10000, 10, 1000)
In [345]: b = np.random.rand(1000)
In [346]: b3D = np.broadcast_to(b,(a.shape[0],1,len(b)))
In [347]: %timeit np.concatenate((a,b3D),axis=1)
10 loops, best of 3: 130 ms per loop
In [348]: b3D = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
In [349]: %timeit np.concatenate((a,b3D),axis=1)
10 loops, best of 3: 141 ms per loop
Conclusion : Seems like the merged concatenate+replication with np.broadcast_to
is doing a bit better here.
Let's try the original case of (5,4,3)
shape :
In [360]: a = np.random.rand(5,4,3)
In [361]: b = np.random.rand(3)
In [362]: b3D = np.broadcast_to(b,(a.shape[0],1,len(b)))
In [363]: %timeit np.concatenate((a,b3D),axis=1)
1000000 loops, best of 3: 948 ns per loop
In [364]: b3D = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
In [365]: %timeit np.concatenate((a,b3D),axis=1)
1000000 loops, best of 3: 950 ns per loop
Conclusion : Again, not too different.
So, the final conclusion is that if b has a lot of elements and the first axis of a is also a big number (since that is the replication count), np.broadcast_to would be a good option; otherwise, the np.repeat-based version takes care of the other cases pretty well.
You can use np.repeat
:
r = np.concatenate((a, b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)), axis=1)
What this does is first reshape your b array to match the dimensions of a, and then repeat its values as many times as needed according to a's first axis:
b3D = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
array([[[1, 2, 3]],
[[1, 2, 3]],
[[1, 2, 3]],
[[1, 2, 3]],
[[1, 2, 3]]])
b3D.shape
(5, 1, 3)
This intermediate result is then concatenated with a
-
r = np.concatenate((a, b3D), axis=1)
r.shape
(5, 5, 3)
This differs from your current answer mainly in that the repetition of values is not hard-coded (it is taken care of by repeat). If you need to handle this for a different number of dimensions (not 3D arrays), then some changes are needed (mainly in how to remove the hard-coded reshape of b).
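One possible way to drop the hard-coded reshape, sketched here as an illustration (the helper name is my own invention), is to build the broadcast target shape from a.shape directly; this handles any number of dimensions as long as b matches a's last axis:

```python
import numpy as np

def append_along_axis1(a, b):
    """Append 1D b as one extra slot along axis 1 of a (any a.ndim >= 2)."""
    shape = list(a.shape)
    shape[1] = 1                       # the inserted block is 1 wide on axis 1
    b_nd = np.broadcast_to(b, shape)   # b must match a's trailing axis
    return np.concatenate((a, b_nd), axis=1)

a3 = np.zeros((5, 4, 3))
b3 = np.array([1, 2, 3])
print(append_along_axis1(a3, b3).shape)   # (5, 5, 3)

a4 = np.zeros((2, 4, 6, 3))
print(append_along_axis1(a4, b3).shape)   # (2, 5, 6, 3)
```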
Timings
a = np.random.randn(100, 99, 100)
b = np.random.randn(100)
# Tai's answer
%timeit np.insert(a, 4, b, axis=1)
100 loops, best of 3: 3.7 ms per loop
# Divakar's answer
%%timeit
b3D = np.broadcast_to(b,(a.shape[0],1,len(b)))
np.concatenate((a,b3D),axis=1)
100 loops, best of 3: 3.67 ms per loop
# solution in this post
%timeit np.concatenate((a, b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)), axis=1)
100 loops, best of 3: 3.62 ms per loop
These are all pretty competitive solutions. However, note that performance depends on your actual data, so make sure you test things first!