np.concatenate a ND tensor/array with a 1D array

南笙 2021-01-14 09:33

I have two arrays a & b

a.shape
(5, 4, 3)
array([[[ 0.        ,  0.        ,  0.        ],
        [ 0.        ,  0.        ,  0.        ],
        [ 0.          


        
4 Answers
  • 2021-01-14 09:54

    You can also use np.insert.

    b_broad = np.expand_dims(b, axis=0) # b_broad.shape = (1, 3)
    ab = np.insert(a, 4, b_broad, axis=1)
    """ 
    Because we are inserting along axis 1,
         a's shape without axis 1 = (5, 3)
         b_broad's shape          = (1, 3)
    so b_broad can be aligned with and broadcast to (5, 3).
    """
    

    In this example, we insert along axis 1 and put b_broad before the given index, 4 here. In other words, b_broad will occupy index 4 along that axis, making ab.shape equal to (5, 5, 3).

    Note again that before the insertion, we turn b into b_broad to safely get the broadcasting you want. b has fewer dimensions than a, so it will be broadcast at insertion, and expand_dims lets us control how.

    If a has shape (3, 4, 5) and b has shape (3,), b_broad needs shape (3, 1) to match up the dimensions when inserting along axis 1. This can be achieved with:

    b_broad = np.expand_dims(b, axis=1)  # shape = (3, 1)
    

    It is good practice to give b_broad the right shape explicitly, because you might have a.shape = (3, 4, 3), and in that case you really need to specify which way to broadcast!
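
    The two broadcasting directions can be told apart in a small sketch. The array values below are made up for illustration, and it assumes a.shape = (3, 4, 3) so that both shapes of b_broad are accepted:

```python
import numpy as np

# Illustrative data (not from the question): a is all zeros, b is [1, 2, 3].
a = np.zeros((3, 4, 3))
b = np.array([1.0, 2.0, 3.0])

# b_broad of shape (1, 3): b runs along the last axis of a.
row_last = np.insert(a, 4, np.expand_dims(b, axis=0), axis=1)

# b_broad of shape (3, 1): b runs along the first axis of a.
row_first = np.insert(a, 4, np.expand_dims(b, axis=1), axis=1)

print(row_last.shape, row_first.shape)  # both (3, 5, 3)
print(row_last[0, 4])    # the inserted slice holds b itself
print(row_first[:, 4, 0])  # here b is spread over the first axis instead
```

    Both calls succeed and give the same shape, so only the contents reveal which direction was used.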

    Timing Results

    On OP's dataset, COLDSPEED's answer is about 3 times faster than np.insert.

    def Divakar():  # labeled as Divakar's answer, but as written it uses reshape + repeat
        b3D = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
        r = np.concatenate((a, b3D), axis=1)
    # COLDSPEED's result
    %timeit np.concatenate((a, b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)), axis=1)
    2.95 µs ± 164 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    # Divakar's result
    %timeit Divakar()
    3.03 µs ± 173 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    # Mine (np.insert)
    %timeit np.insert(a, 4, b, axis=1)
    10.1 µs ± 220 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    

    Dataset 2 (timing experiment borrowed from COLDSPEED): nothing can be concluded in this case because all three share nearly the same mean and standard deviation.

    a = np.random.randn(100, 99, 100)
    b = np.random.randn(100)
    
    # COLDSPEED's result
    %timeit np.concatenate((a, b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)), axis=1) 
    2.37 ms ± 194 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    # Divakar's
    %timeit Divakar()
    2.31 ms ± 249 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    # Mine (np.insert)
    %timeit np.insert(a, 99, b, axis=1) 
    2.34 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    

    Speed will depend on the size and shape of your data. Please test on your own dataset if speed is a concern.

  • Here are some simple timings based on cᴏʟᴅsᴘᴇᴇᴅ's and Divakar's solutions:

    %timeit np.concatenate((a, b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)), axis=1)
    

    Output: The slowest run took 6.44 times longer than the fastest. This could mean that an intermediate result is being cached. 100000 loops, best of 3: 3.68 µs per loop

    %timeit np.concatenate((a, np.broadcast_to(b[None,None], (a.shape[0], 1, len(b)))), axis=1)
    

    Output: The slowest run took 4.12 times longer than the fastest. This could mean that an intermediate result is being cached. 100000 loops, best of 3: 10.7 µs per loop

    Now here is the timing based on your original code:

    %timeit original_func(a, b)
    

    Output: The slowest run took 4.62 times longer than the fastest. This could mean that an intermediate result is being cached. 100000 loops, best of 3: 4.69 µs per loop

    Since the question asked for faster ways to get the same result, I would go for cᴏʟᴅsᴘᴇᴇᴅ's solution based on these timings.

  • 2021-01-14 10:03

    Simply broadcast b to 3D and then concatenate along the second axis:

    b3D = np.broadcast_to(b,(a.shape[0],1,len(b)))
    out = np.concatenate((a,b3D),axis=1)
    

    The broadcasting step with np.broadcast_to doesn't actually replicate or copy anything; it simply returns a replicated view. In the next step, the concatenation does the replication on the fly.
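
    A quick way to check the "view, not copy" claim is to inspect the broadcast array directly. This is only a sketch with made-up values for a and b:

```python
import numpy as np

# Illustrative data (not from the question).
a = np.zeros((5, 4, 3))
b = np.array([1.0, 2.0, 3.0])

b3D = np.broadcast_to(b, (a.shape[0], 1, len(b)))

print(b3D.shape)                 # (5, 1, 3)
print(b3D.strides[0])            # 0: every "copy" points at the same memory
print(b3D.flags.writeable)       # False: the broadcast view is read-only
print(np.shares_memory(b, b3D))  # True: no data was duplicated

# The real replication only happens when concatenate fills its output.
out = np.concatenate((a, b3D), axis=1)
print(out.shape)                 # (5, 5, 3)
print(np.shares_memory(b, out))  # False: out is a freshly allocated array
```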

    Benchmarking

    In this section we compare the np.repeat version from @cᴏʟᴅsᴘᴇᴇᴅ's solution against the np.broadcast_to one, with a focus on performance. The broadcasting-based version does the replication and concatenation in the second step, as a merged command so to speak, while the np.repeat version makes a copy and then concatenates, in two separate steps.

    Timing the approaches as a whole:

    Case #1: a.shape = (500,400,300) and b.shape = (300,)

    In [321]: a = np.random.rand(500,400,300)
    
    In [322]: b = np.random.rand(300)
    
    In [323]: %%timeit
         ...: b3D = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
         ...: r = np.concatenate((a, b3D), axis=1)
    10 loops, best of 3: 72.1 ms per loop
    
    In [325]: %%timeit
         ...: b3D = np.broadcast_to(b,(a.shape[0],1,len(b)))
         ...: out = np.concatenate((a,b3D),axis=1)
    10 loops, best of 3: 72.5 ms per loop
    

    For smaller input shapes, the call to np.broadcast_to takes a bit longer than np.repeat, apparently because setting up the broadcasting involves more overhead, as the timings below suggest:

    In [360]: a = np.random.rand(5,4,3)
    
    In [361]: b = np.random.rand(3)
    
    In [366]: %timeit np.broadcast_to(b,(a.shape[0],1,len(b)))
    100000 loops, best of 3: 3.12 µs per loop
    
    In [367]: %timeit b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
    1000000 loops, best of 3: 957 ns per loop
    

    But the broadcasting part takes constant time irrespective of the shapes of the inputs, i.e. the roughly 3 µs cost would stay around that mark. The timing for its counterpart, b.reshape(1, 1, -1).repeat(a.shape[0], axis=0), depends on the input shapes. So let's dig deeper and see how the concatenation steps of the two approaches fare.

    Digging deeper

    Let's see how much time the concatenation part takes:

    In [353]: a = np.random.rand(500,400,300)
    
    In [354]: b = np.random.rand(300)
    
    In [355]: b3D = np.broadcast_to(b,(a.shape[0],1,len(b)))
    
    In [356]: %timeit np.concatenate((a,b3D),axis=1)
    10 loops, best of 3: 72 ms per loop
    
    In [357]: b3D = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
    
    In [358]: %timeit np.concatenate((a,b3D),axis=1)
    10 loops, best of 3: 72 ms per loop
    

    Conclusion : Doesn't seem too different.

    Now let's try a case where b has to be replicated many more times and also has noticeably more elements.

    In [344]: a = np.random.rand(10000, 10, 1000)
    
    In [345]: b = np.random.rand(1000)
    
    In [346]: b3D = np.broadcast_to(b,(a.shape[0],1,len(b)))
    
    In [347]: %timeit np.concatenate((a,b3D),axis=1)
    10 loops, best of 3: 130 ms per loop
    
    In [348]: b3D = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
    
    In [349]: %timeit np.concatenate((a,b3D),axis=1)
    10 loops, best of 3: 141 ms per loop
    

    Conclusion : Seems like the merged concatenate+replication with np.broadcast_to is doing a bit better here.

    Let's try the original case of (5,4,3) shape :

    In [360]: a = np.random.rand(5,4,3)
    
    In [361]: b = np.random.rand(3)
    
    In [362]: b3D = np.broadcast_to(b,(a.shape[0],1,len(b)))
    
    In [363]: %timeit np.concatenate((a,b3D),axis=1)
    1000000 loops, best of 3: 948 ns per loop
    
    In [364]: b3D = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
    
    In [365]: %timeit np.concatenate((a,b3D),axis=1)
    1000000 loops, best of 3: 950 ns per loop
    

    Conclusion : Again, not too different.

    So the final conclusion is that if b has a lot of elements and the first axis of a is also large (since that is the replication count), np.broadcast_to would be a good option; otherwise the np.repeat-based version handles the other cases pretty well.

  • 2021-01-14 10:11

    You can use np.repeat:

    r = np.concatenate((a, b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)), axis=1)
    

    What this does is first reshape your b array to match the number of dimensions of a, and then repeat its values as many times as needed according to a's first axis:

    b3D = b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)
    
    array([[[1, 2, 3]],
    
           [[1, 2, 3]],
    
           [[1, 2, 3]],
    
           [[1, 2, 3]],
    
           [[1, 2, 3]]])
    
    b3D.shape
    (5, 1, 3)
    

    This intermediate result is then concatenated with a along axis 1:

    r = np.concatenate((a, b3D), axis=1)
    
    r.shape
    (5, 5, 3)
    

    This differs from your current approach mainly in that the repetition of values is not hard-coded (i.e., it is taken care of by repeat).

    If you need to handle a different number of dimensions (not 3D arrays), then some changes are needed (mainly to remove the hard-coded reshape of b).
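
    One possible way to drop the hard-coded reshape is sketched below. The helper name append_along_second_last is hypothetical, and it swaps in np.broadcast_to rather than repeat; it assumes a has at least 2 dimensions and that len(b) equals a.shape[-1]:

```python
import numpy as np

def append_along_second_last(a, b):
    """Hypothetical helper: append 1D b as one extra slice along axis -2 of a."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.ndim < 2 or b.ndim != 1 or a.shape[-1] != b.shape[0]:
        raise ValueError("expected a.ndim >= 2 and len(b) == a.shape[-1]")
    # Same leading dimensions as a, a single slice on axis -2, b on the last axis.
    target_shape = a.shape[:-2] + (1, b.shape[0])
    b_nd = np.broadcast_to(b, target_shape)
    return np.concatenate((a, b_nd), axis=-2)

# Illustrative inputs (not from the question) with 2, 3 and 4 dimensions.
b = np.arange(3)
print(append_along_second_last(np.zeros((4, 3)), b).shape)        # (5, 3)
print(append_along_second_last(np.zeros((5, 4, 3)), b).shape)     # (5, 5, 3)
print(append_along_second_last(np.zeros((2, 5, 4, 3)), b).shape)  # (2, 5, 5, 3)
```

    For the 3D case this appends along axis 1, which is the same axis used in the answer above.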


    Timings

    a = np.random.randn(100, 99, 100)
    b = np.random.randn(100)
    

    # Tai's answer
    %timeit np.insert(a, 4, b, axis=1)
    100 loops, best of 3: 3.7 ms per loop
    
    # Divakar's answer
    %%timeit 
    b3D = np.broadcast_to(b,(a.shape[0],1,len(b)))
    np.concatenate((a,b3D),axis=1)
    
    100 loops, best of 3: 3.67 ms per loop
    
    # solution in this post
    %timeit np.concatenate((a, b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)), axis=1)
    100 loops, best of 3: 3.62 ms per loop
    

    These are all pretty competitive solutions. However, note that performance depends on your actual data, so make sure you test on it first!
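
    Outside IPython, the same comparison can be run with the standard timeit module. This is a minimal sketch: the function names are made up for illustration, the random data only mimics the shapes used above, and the numbers it prints will vary by machine:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((100, 99, 100))
b = rng.standard_normal(100)

def with_repeat():
    return np.concatenate((a, b.reshape(1, 1, -1).repeat(a.shape[0], axis=0)), axis=1)

def with_broadcast_to():
    return np.concatenate((a, np.broadcast_to(b, (a.shape[0], 1, len(b)))), axis=1)

def with_insert():
    # Inserting at index a.shape[1] appends after the last existing slice.
    return np.insert(a, a.shape[1], b, axis=1)

candidates = {"repeat": with_repeat, "broadcast_to": with_broadcast_to, "insert": with_insert}

reference = with_repeat()
for name, func in candidates.items():
    # Check that every approach produces the same array before timing it.
    assert np.array_equal(func(), reference), name
    best = min(timeit.repeat(func, number=20, repeat=5)) / 20
    print(f"{name:>12}: {best * 1e3:.3f} ms per call")
```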
