Element-wise broadcasting for comparing two NumPy arrays?

被撕碎了的回忆 2021-01-19 04:54

Let's say I have an array like this:

import numpy as np

base_array = np.array([-13, -9, -11, -3, -3, -4,   2,  2,
                         2,  5,   7,  7,          


        
3 Answers
  • 2021-01-19 05:25

    You will want to add a unit dimension to one of the arrays (turning it into a column) for broadcasting to work correctly. When you broadcast two arrays together, their dimensions are lined up and any unit dimension is effectively expanded to the non-unit size it is matched with. So two arrays of shape (16, 1) (the original array) and (1, 26) (the comparison array) broadcast to (16, 26).

    Don't forget to sum across the dimension of size 16:

    (base_array[:, None] > comparison_array).sum(axis=0)
    

    None in an index is equivalent to np.newaxis: it's one of many ways to insert a new unit dimension at the specified position. The reason that you don't need to write comparison_array[None, :] is that broadcasting aligns shapes starting from the last (trailing) dimension and automatically fills in any missing leading dimensions with ones, so a shape of (26,) is treated as (1, 26).
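    The shape rules above can be checked on a small case. This is only a sketch: the two short arrays below are made up for illustration and are not the arrays from the question.

```python
import numpy as np

# Hypothetical small arrays, chosen only to make the shapes easy to follow.
base = np.array([-3, 0, 2, 5])        # shape (4,)
comparison = np.array([-1, 1, 4])     # shape (3,)

# (4, 1) compared against (3,): the 1-D operand is treated as (1, 3),
# so the boolean result has shape (4, 3).
mask = base[:, None] > comparison
print(mask.shape)

# Writing the leading unit axis explicitly gives the identical result.
explicit = base[:, np.newaxis] > comparison[np.newaxis, :]
assert np.array_equal(mask, explicit)

# Summing over axis 0 (the base axis) leaves one count per comparison value.
counts = mask.sum(axis=0)
print(counts)
```

    Summing over axis 1 instead would give one count per element of base, which answers a different question.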

  • 2021-01-19 05:29

    You can simply add a dimension to the comparison array, so that the comparison is "stretched" across all values along the new dimension.

    >>> np.sum(comparison_array[:, None] < base_array)
    228
    

    This is the fundamental principle of broadcasting, and it works for all kinds of element-wise operations.

    If you need the sum done along an axis, you just specify the axis along which you want to sum after the comparison.

    >>> np.sum(comparison_array[:, None] < base_array, axis=1)
    array([15, 15, 14, 14, 13, 13, 13, 13, 13, 12, 10, 10, 10, 10, 10,  7,  7,
            7,  6,  6,  3,  2,  2,  2,  1,  0])
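    As a rough check of how the total and the per-axis sum relate, here is a minimal sketch on made-up arrays (not the ones from the question). It also shows that putting the new axis on the other array gives the same counts as long as the summed axis is swapped too.

```python
import numpy as np

# Hypothetical small arrays for illustration only.
base = np.array([-3, 0, 2, 5])
comparison = np.array([-1, 1, 4])

# New axis on the comparison array: result has shape (3, 4).
mask = comparison[:, None] < base

total = np.sum(mask)              # one number for the whole mask
per_value = np.sum(mask, axis=1)  # one count per comparison value
assert total == per_value.sum()

# Same counts with the new axis on the base array, summing over axis 0.
alt = (base[:, None] > comparison).sum(axis=0)
assert np.array_equal(per_value, alt)
```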
    
  • 2021-01-19 05:36

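    Here's one with np.searchsorted, with a focus on memory efficiency and hence performance. It sorts base_array once and then does a binary search per comparison value, so it never builds the full 2-D boolean array: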

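    def get_comparative_sum(base_array, comparison_array):
        # Sort once so each comparison value can be located by binary search.
        base_array_sorted = np.sort(base_array)
        # side='right' returns, for each value, how many base elements are <= it.
        idx = np.searchsorted(base_array_sorted, comparison_array, 'right')
        # The remaining elements are the ones strictly greater than the value.
        return len(base_array) - idx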
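    One way to gain confidence in a sorted-search approach is to compare it against the broadcasting result on random data. The sketch below assumes the goal is to count, for each comparison value, the base elements strictly greater than it. The helper name count_greater_sorted and the array sizes are made up for the check.

```python
import numpy as np

def count_greater_sorted(base_array, comparison_array):
    # Hypothetical helper: counts base elements strictly greater than each value.
    base_sorted = np.sort(base_array)
    # side='right' yields the number of base elements <= each comparison value.
    n_le = np.searchsorted(base_sorted, comparison_array, side='right')
    return len(base_array) - n_le

rng = np.random.default_rng(0)
base = rng.integers(-50, 50, size=200)
# Deliberately wider range, so some values fall outside the base array's range.
comparison = rng.integers(-60, 60, size=300)

# Reference result from the broadcasting approach.
expected = (comparison[:, None] < base).sum(axis=1)
result = count_greater_sorted(base, comparison)
assert np.array_equal(result, expected)

# Broadcasting materialises one boolean (1 byte) per pair of elements.
print(comparison.size * base.size, "bytes for the intermediate mask")
```

    At the sizes used in the timings (20000 comparison values against 10000 base values), that intermediate mask would hold 200 million booleans, roughly 200 MB, whereas the sorted search only needs arrays of the input sizes.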
    

    Timings -

    In [40]: np.random.seed(0)
        ...: base_array = np.random.randint(-1000,1000,(10000))
        ...: comparison_array = np.random.randint(-1000,1000,(20000))
    
    # @miradulo's soln
    In [41]: %timeit np.sum(comparison_array[:, None] < base_array, axis=1)
    1 loop, best of 3: 386 ms per loop
    
    In [42]: %timeit get_comparative_sum(base_array, comparison_array)
    100 loops, best of 3: 2.36 ms per loop
    