Count occurrences of arrays in multidimensional arrays in Python

醉酒成梦 2020-12-20 18:11

I have the following type of arrays:

a = array([[1,1,1],
           [1,1,1],
           [1,1,1],
           [2,2,2],
           [2,2,2],
           [2,2,2],
           [3,3,0],
           [3,3,0],
           [3,3,0]])

5 Answers
  • 2020-12-20 18:43

    Since NumPy 1.13.0, np.unique accepts an axis argument, so with axis=0 it treats each row as one item:

    >>> np.unique(a, axis=0, return_counts=True)
    
    (array([[1, 1, 1],
            [2, 2, 2],
            [3, 3, 0]]), array([3, 3, 3]))
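
A self-contained version of the call above may be easier to reuse. The sketch below rebuilds the sample array from the question and pairs each unique row with its count. The names `rows`, `counts` and `row_counts` are illustrative, and it assumes NumPy 1.13 or later.

```python
import numpy as np

# Sample data from the question: three distinct rows, each repeated three times.
a = np.array([[1, 1, 1]] * 3 + [[2, 2, 2]] * 3 + [[3, 3, 0]] * 3)

# axis=0 treats every row as one item; return_counts adds the occurrence counts.
rows, counts = np.unique(a, axis=0, return_counts=True)

# Pair each unique row (as a tuple of plain ints) with its count.
row_counts = {tuple(int(v) for v in row): int(n) for row, n in zip(rows, counts)}
print(row_counts)  # {(1, 1, 1): 3, (2, 2, 2): 3, (3, 3, 0): 3}
```

Note that np.unique returns the rows in sorted order, not in order of first appearance.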
    
  • 2020-12-20 18:47

    collections.Counter can do this conveniently, and almost like the example given.

    >>> from collections import Counter
    >>> c = Counter()
    >>> for x in a:
    ...   c[tuple(x)] += 1
    ...
    >>> c
    Counter({(2, 2, 2): 3, (1, 1, 1): 3, (3, 3, 0): 3})
    

    This converts each row to a tuple. Tuples are immutable and hashable, so they can be used as dictionary keys; lists and NumPy arrays are mutable and unhashable, so they can't.
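
The hashability point can be checked directly. This is a minimal sketch that uses a plain list in place of an array row (a NumPy array fails the same way when used as a key); `list_key_failed` is only an illustrative flag.

```python
# Tuples are hashable, so they work as dictionary keys.
counts = {}
row = [1, 1, 1]
counts[tuple(row)] = 1

# Lists are unhashable; using one as a key raises TypeError.
try:
    counts[row] = 1
    list_key_failed = False
except TypeError:
    list_key_failed = True

print(counts, list_key_failed)  # {(1, 1, 1): 1} True
```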

    Why do you want to avoid using for loops?

    And similar to @padraic-cunningham's much cooler answer:

    >>> Counter(tuple(x) for x in a)
    Counter({(2, 2, 2): 3, (1, 1, 1): 3, (3, 3, 0): 3})
    >>> Counter(map(tuple, a))
    Counter({(2, 2, 2): 3, (1, 1, 1): 3, (3, 3, 0): 3})
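
If the counts are needed back as arrays, the Counter can be unpacked again. This is a sketch under the assumption that `a` is the integer array from the question; `unique_rows` and `row_counts` are illustrative names. On Python 3.7+ the rows come out in first-seen order rather than sorted.

```python
from collections import Counter

import numpy as np

a = np.array([[1, 1, 1]] * 3 + [[2, 2, 2]] * 3 + [[3, 3, 0]] * 3)

# a.tolist() yields rows of plain Python ints, so the keys print cleanly.
c = Counter(map(tuple, a.tolist()))

# Split the Counter back into a 2-D array of rows and a 1-D array of counts.
unique_rows = np.array(list(c.keys()))
row_counts = np.array(list(c.values()))
```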
    
  • 2020-12-20 18:50

    If you don't mind mapping the rows to tuples just to get the counts, you can use a Counter. On my machine with Python 3 it runs in 28.5 µs, which is well below your threshold:

    In [5]: timeit Counter(map(tuple, a))
    10000 loops, best of 3: 28.5 µs per loop
    
    In [6]: c = Counter(map(tuple, a))
    
    In [7]: c
    Out[7]: Counter({(2, 2, 2): 3, (1, 1, 1): 3, (3, 3, 0): 3})
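
Outside IPython, a comparable measurement can be taken with the standard timeit module. This is only a sketch: the repeat counts are arbitrary choices, `per_call_us` is an illustrative name, and the figure will differ from machine to machine.

```python
import timeit
from collections import Counter

import numpy as np

a = np.array([[1, 1, 1]] * 3 + [[2, 2, 2]] * 3 + [[3, 3, 0]] * 3)

# Run the call many times and keep the best of several repeats,
# which is roughly what IPython's timeit magic reports.
number = 1000
best = min(timeit.repeat(lambda: Counter(map(tuple, a)), number=number, repeat=3))

# Convert total seconds for `number` calls into microseconds per call.
per_call_us = best / number * 1e6
print(f"{per_call_us:.1f} microseconds per loop")
```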
    
  • 2020-12-20 18:51

    You could convert each row to a single linear index by treating its elements as multi-dimensional indices with np.ravel_multi_index, which turns the rows into a 1D array (this requires non-negative integer entries). Then np.unique, with its optional arguments return_index and return_counts, gives the position of the first occurrence of each unique row along with its count. Thus, the implementation would look something like this -

    import numpy as np

    def unique_rows_counts(a):
    
        # Calculate linear indices using rows from a
        lidx = np.ravel_multi_index(a.T, a.max(0) + 1)
    
        # Get the first-occurrence positions of the unique indices and their counts
        _, unq_idx, counts = np.unique(lidx, return_index=True, return_counts=True)
    
        # return the unique groups from a and their respective counts
        return a[unq_idx], counts
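
To see what the linear indices look like, the first step of the function can be run on its own. This is a small worked sketch on the sample array from the question; `dims` is an illustrative name for the shape passed to np.ravel_multi_index.

```python
import numpy as np

a = np.array([[1, 1, 1]] * 3 + [[2, 2, 2]] * 3 + [[3, 3, 0]] * 3)

# One dimension per column, each just large enough to hold that column's maximum.
dims = a.max(0) + 1          # array([4, 4, 3])

# Each row (i, j, k) maps to i*(4*3) + j*3 + k, so equal rows share one integer.
lidx = np.ravel_multi_index(a.T, dims)
print(lidx)                  # [16 16 16 32 32 32 45 45 45]
```

Because equal rows collapse to the same integer, counting unique rows reduces to counting unique values in a 1D array.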
    

    Sample run -

    In [64]: a
    Out[64]: 
    array([[1, 1, 1],
           [1, 1, 1],
           [1, 1, 1],
           [2, 2, 2],
           [2, 2, 2],
           [2, 2, 2],
           [3, 3, 0],
           [3, 3, 0],
           [3, 3, 0]])
    
    In [65]: unqrows, counts = unique_rows_counts(a)
    
    In [66]: unqrows
    Out[66]: 
    array([[1, 1, 1],
           [2, 2, 2],
           [3, 3, 0]])
    In [67]: counts
    Out[67]: array([3, 3, 3])
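
np.ravel_multi_index rejects negative indices, so the function above only suits non-negative integer arrays. One possible workaround, shown here as a hypothetical variant (`unique_rows_counts_shifted` is not part of the answer above), is to shift each column so that its minimum becomes zero before computing the indices. The product of the per-column ranges still has to fit in an integer index.

```python
import numpy as np

def unique_rows_counts_shifted(a):
    # Hypothetical variant: shift each column so its minimum becomes 0,
    # because np.ravel_multi_index rejects negative indices.
    shifted = a - a.min(0)
    lidx = np.ravel_multi_index(shifted.T, shifted.max(0) + 1)

    # First-occurrence positions and counts of the unique linear indices.
    _, unq_idx, counts = np.unique(lidx, return_index=True, return_counts=True)

    # Index the original array so the returned rows keep their real values.
    return a[unq_idx], counts

b = np.array([[-1, 0], [2, 5], [-1, 0], [2, 5], [2, 5]])
rows, counts = unique_rows_counts_shifted(b)
print(rows.tolist(), counts.tolist())  # [[-1, 0], [2, 5]] [2, 3]
```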
    

    Benchmarking

    Assuming you are okay with either numpy arrays or collections as outputs, one can benchmark the solutions provided thus far, like so -

    Function definitions:

    import numpy as np
    from collections import Counter
    
    def unique_rows_counts(a):
        lidx = np.ravel_multi_index(a.T, a.max(0) + 1)
        _, unq_idx, counts = np.unique(lidx, return_index=True, return_counts=True)
        return a[unq_idx], counts
    
    def map_Counter(a):
        return Counter(map(tuple, a))    
    
    def forloop_Counter(a):      
        c = Counter()
        for x in a:
            c[tuple(x)] += 1
        return c
    

    Timings:

    In [53]: a = np.random.randint(0,4,(10000,5))
    
    In [54]: %timeit map_Counter(a)
    10 loops, best of 3: 31.7 ms per loop
    
    In [55]: %timeit forloop_Counter(a)
    10 loops, best of 3: 45.4 ms per loop
    
    In [56]: %timeit unique_rows_counts(a)
    1000 loops, best of 3: 1.72 ms per loop
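
The timings above can be reproduced outside IPython with a short script. The sketch below also adds np.unique with axis=0 to the comparison and first checks that the approaches agree. The helpers `unique_axis0` and `as_dict`, the fixed seed and the repeat counts are assumptions made for illustration, and no particular timing is implied.

```python
import timeit
from collections import Counter

import numpy as np

def unique_rows_counts(a):
    lidx = np.ravel_multi_index(a.T, a.max(0) + 1)
    _, unq_idx, counts = np.unique(lidx, return_index=True, return_counts=True)
    return a[unq_idx], counts

def map_Counter(a):
    return Counter(map(tuple, a))

def unique_axis0(a):
    # Illustrative wrapper around the np.unique(axis=0) approach.
    return np.unique(a, axis=0, return_counts=True)

def as_dict(rows, counts):
    # Normalise (rows, counts) output to {row tuple: count} for comparison.
    return {tuple(r): int(n) for r, n in zip(rows.tolist(), counts)}

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
a = rng.integers(0, 4, size=(10000, 5))

# Sanity check: all approaches should report the same count for every row.
d1 = as_dict(*unique_rows_counts(a))
d2 = as_dict(*unique_axis0(a))
d3 = dict(map_Counter(a.tolist()))
assert d1 == d2 == d3

# Time each approach: best of 3 repeats, 10 calls per repeat.
for fn in (map_Counter, unique_rows_counts, unique_axis0):
    t = min(timeit.repeat(lambda: fn(a), number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {t * 1e3:.2f} ms per loop")
```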
    
  • 2020-12-20 18:59

    The numpy_indexed package (disclaimer: I am its author) contains efficient vectorized functionality for this kind of operation:

    import numpy_indexed as npi
    unique_rows, row_count = npi.count(a, axis=0)
    

    Note that this works for arrays of any dimensionality or datatype.
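
For readers who prefer to avoid an extra dependency, counting higher-dimensional sub-arrays can also be sketched with plain NumPy. This is not numpy_indexed and makes no claim about how that package works internally; it only uses np.unique with axis=0 on a 3D array. `blocks` is made-up sample data.

```python
import numpy as np

# Four 2x2 blocks stacked along axis 0; one block appears three times.
blocks = np.array([
    [[1, 2], [3, 4]],
    [[0, 0], [0, 0]],
    [[1, 2], [3, 4]],
    [[1, 2], [3, 4]],
])

# axis=0 makes each 2x2 block one item to be compared and counted.
unique_blocks, block_counts = np.unique(blocks, axis=0, return_counts=True)
print(unique_blocks.shape, block_counts.tolist())  # (2, 2, 2) [1, 3]
```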
