Efficiently counting number of unique elements - NumPy / Python

遥遥无期 2021-01-03 02:07

When running np.unique(), it first flattens the array, then sorts it, and then finds the unique values. When I have arrays with shape (10, 3000, 3000), this takes about a second.
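The three steps described above can be sketched directly in NumPy. This is an illustration only, assuming a NumPy array input: the helper name `count_unique_by_sorting` is hypothetical and this is not NumPy's internal implementation. It does make clear that a full sort is performed even though only the count of distinct values is needed.

```python
import numpy as np


def count_unique_by_sorting(a):
    """Count distinct values by flattening, sorting, and comparing neighbours.

    Hypothetical helper mirroring the steps described in the question;
    not NumPy's own implementation of np.unique.
    """
    # Flatten to 1-D and sort; np.sort returns a sorted copy.
    flat = np.sort(np.ravel(a))
    if flat.size == 0:
        return 0
    # After sorting, equal values are adjacent, so every position where a
    # value differs from its predecessor starts a new distinct value.
    return 1 + int(np.count_nonzero(flat[1:] != flat[:-1]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    small = rng.integers(0, 256, size=(2, 100, 100), dtype=np.uint8)
    print(count_unique_by_sorting(small), len(np.unique(small)))
```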

  •  时光说笑
    2021-01-03 02:25

    We could leverage the fact that the elements are restricted to the uint8 range (0 to 255): count the occurrences of each value with np.bincount, then count how many of those bins are non-zero. Since np.bincount expects a 1D array, we first flatten the input with np.ravel() and then feed it to bincount.
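A tiny worked example of what np.bincount returns (the array values are made up for illustration): entry i of the result is the number of times i occurs, so the distinct values are exactly the indices of the non-zero bins.

```python
import numpy as np

# A small array standing in for the real uint8 data (illustrative values).
a = np.array([[3, 1, 3],
              [0, 1, 3]], dtype=np.uint8)

# bincount needs 1-D input, so flatten first.
counts = np.bincount(a.ravel())
print(counts)                  # [1 2 0 3]: one 0, two 1s, no 2s, three 3s

# Indices with a non-zero count are the values that actually occur.
print(np.flatnonzero(counts))  # [0 1 3]
print((counts != 0).sum())     # 3
```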

    Hence, the implementation would be -

    (np.bincount(a.ravel())!=0).sum()
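For reuse, the one-liner can be wrapped in a small function. This is a sketch: the name `count_unique_uint8` is hypothetical, and the dtype check simply encodes the answer's assumption that values lie in the uint8 range. np.count_nonzero is used in place of `(... != 0).sum()`; both count the non-zero bins.

```python
import numpy as np


def count_unique_uint8(a):
    """Return the number of distinct values in a uint8 array of any shape.

    Hypothetical helper wrapping the bincount one-liner.
    """
    a = np.asarray(a)
    if a.dtype != np.uint8:
        # The approach relies on values being small non-negative integers;
        # this sketch only accepts uint8 to keep that assumption explicit.
        raise TypeError(f"expected uint8 input, got {a.dtype}")
    # minlength=256 gives one bin per possible uint8 value, so the output
    # length does not depend on the largest value present.
    counts = np.bincount(a.ravel(), minlength=256)
    return int(np.count_nonzero(counts))
```

Passing minlength is optional for the count itself; it only fixes the length of the intermediate counts array.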
    

    Runtime test

    Helper function to create an input array with a given number of unique values -

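    import numpy as np

    def create_input(n_unique):
        # Pick n_unique distinct values from 0..255, then sample the array from them.
        unq_nums = np.random.choice(np.arange(256), n_unique, replace=False)
        return np.random.choice(unq_nums, (10, 3000, 3000)).astype(np.uint8)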
    

    Other approach(es):

    # @Warren Weckesser's soln
    def assign_method(a):
        q = np.zeros(256, dtype=int)  # one slot per possible uint8 value
        q[a.ravel()] = 1              # mark every value that occurs
        return len(np.nonzero(q)[0])  # number of marked slots
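The assignment method uses the array's values as indices into a 256-entry lookup table: every value that occurs marks its slot, and the marked slots are counted at the end. A minimal variant of the same idea, assuming uint8 input, uses a boolean table and np.count_nonzero; the name `assign_method_bool` is hypothetical.

```python
import numpy as np


def assign_method_bool(a):
    # One slot per possible uint8 value, all initially unmarked.
    seen = np.zeros(256, dtype=bool)
    # Fancy-index assignment marks every value that occurs; repeated
    # indices simply write True to the same slot again.
    seen[a.ravel()] = True
    # Each marked slot corresponds to one distinct value.
    return int(np.count_nonzero(seen))
```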
    

    Verification of proposed method -

    In [141]: a = create_input(n_unique=120)
    
    In [142]: len(np.unique(a))
    Out[142]: 120
    
    In [143]: (np.bincount(a.ravel())!=0).sum()
    Out[143]: 120
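The check above uses a single input. A broader check can be sketched as a loop over several numbers of unique values; the helpers `create_small_input` and `check_counts`, the smaller shape (5, 200, 200), and the fixed seed are assumptions chosen so it runs quickly and reproducibly.

```python
import numpy as np


def create_small_input(n_unique, shape=(5, 200, 200), seed=0):
    # Hypothetical scaled-down variant of create_input with a seeded generator.
    rng = np.random.default_rng(seed)
    unq_nums = rng.choice(256, n_unique, replace=False)
    return rng.choice(unq_nums, shape).astype(np.uint8)


def check_counts(n_values=(1, 2, 50, 120, 256)):
    results = {}
    for n in n_values:
        a = create_small_input(n)
        expected = len(np.unique(a))
        got = int((np.bincount(a.ravel()) != 0).sum())
        # The bincount count must agree with np.unique on every input.
        assert got == expected, (n, got, expected)
        results[n] = got
    return results


if __name__ == "__main__":
    print(check_counts())
```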
    

    Timings -

    In [124]: a = create_input(n_unique=128)
    
    In [125]: %timeit len(np.unique(a)) # Original soln
         ...: %timeit assign_method(a)  # @Warren Weckesser's soln
         ...: %timeit (np.bincount(a.ravel())!=0).sum()
         ...: 
    1 loop, best of 3: 3.09 s per loop
    1 loop, best of 3: 394 ms per loop
    1 loop, best of 3: 209 ms per loop
    
    In [126]: a = create_input(n_unique=256)
    
    In [127]: %timeit len(np.unique(a)) # Original soln
         ...: %timeit assign_method(a)  # @Warren Weckesser's soln
         ...: %timeit (np.bincount(a.ravel())!=0).sum()
         ...: 
    1 loop, best of 3: 3.46 s per loop
    1 loop, best of 3: 378 ms per loop
    1 loop, best of 3: 212 ms per loop
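The timings above use IPython's %timeit. Outside IPython, a comparable measurement can be sketched with the standard-library timeit module. The helper name `compare_timings` and the smaller default shape are assumptions chosen so the script runs quickly; absolute numbers will depend on hardware and NumPy version and are not expected to match the session above.

```python
import timeit

import numpy as np


def assign_method(a):
    # Warren Weckesser's approach, as given in the answer.
    q = np.zeros(256, dtype=int)
    q[a.ravel()] = 1
    return len(np.nonzero(q)[0])


def compare_timings(shape=(2, 500, 500), n_unique=128, repeat=3, number=1):
    """Return the best per-call time in seconds for each method (hypothetical helper)."""
    rng = np.random.default_rng(0)
    unq_nums = rng.choice(256, n_unique, replace=False)
    a = rng.choice(unq_nums, shape).astype(np.uint8)

    methods = {
        "np.unique": lambda: len(np.unique(a)),
        "assign_method": lambda: assign_method(a),
        "bincount": lambda: (np.bincount(a.ravel()) != 0).sum(),
    }
    best = {}
    for name, fn in methods.items():
        # Keep the fastest of `repeat` runs, similar to the "best of 3"
        # figure reported by %timeit.
        runs = timeit.repeat(fn, repeat=repeat, number=number)
        best[name] = min(runs) / number
    return best


if __name__ == "__main__":
    for name, seconds in compare_timings().items():
        print(f"{name:>14}: {seconds * 1e3:.1f} ms")
```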
    
