Ranking of numpy array with possible duplicates


I have a numpy array of floats/ints and want to map its elements into their ranks.

If an array doesn't have duplicates, the problem can be solved by the following co


3 Answers

借酒劲吻你

2021-01-17 22:27

Here is a function that returns the output you want (in the first case):

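import numpy as np

def argsortdup(a1):
  # Sort once so each value's rank can be found by binary search.
  sorted_a = np.sort(a1)  # renamed to avoid shadowing the built-in sorted()
  ranked = []
  for item in a1:
    # searchsorted defaults to side='left', so ties get the first matching index.
    ranked.append(sorted_a.searchsorted(item))
  return np.array(ranked)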

Basically, you sort the array and then search for the index at which each item sits in the sorted copy. With duplicates, searchsorted returns the index of the first occurrence, so tied values share the lowest rank. I tested it with your a2 example, and doing something like

a3 = argsortdup(a2)

yields:

array([0, 1, 4, 5, 6, 1, 7, 8, 8, 1])

Test with a2:

>>> import numpy as np
>>> a2
array([ 0.1,  1.1,  2.1,  3.1,  4.1,  1.1,  6.1,  7.1,  7.1,  1.1])
>>> def argsortdup(a1):
...   sorted_a = np.sort(a1)
...   ranked = []
...   for item in a1:
...     ranked.append(sorted_a.searchsorted(item))
...   return np.array(ranked)
...
>>> a3 = argsortdup(a2)
>>> a2
array([ 0.1,  1.1,  2.1,  3.1,  4.1,  1.1,  6.1,  7.1,  7.1,  1.1])
>>> a3
array([0, 1, 4, 5, 6, 1, 7, 8, 8, 1])
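
The loop in argsortdup calls searchsorted once per element. As a minimal sketch, the same lookup can probably be done in one vectorized call, since np.searchsorted accepts an array of values to look up. The helper names rank_min and rank_max are hypothetical and chosen only for illustration; the side argument controls whether ties get their first or last position in the sorted array.

```python
import numpy as np

def rank_min(a):
    """0-based ranks where tied values share the lowest rank (hypothetical helper)."""
    a = np.asarray(a)
    # side='left' returns the first position of each value in the sorted array.
    return np.searchsorted(np.sort(a), a, side='left')

def rank_max(a):
    """0-based ranks where tied values share the highest rank (hypothetical helper)."""
    a = np.asarray(a)
    # side='right' returns one past the last position of each value, hence the -1.
    return np.searchsorted(np.sort(a), a, side='right') - 1

a2 = np.array([0.1, 1.1, 2.1, 3.1, 4.1, 1.1, 6.1, 7.1, 7.1, 1.1])
print(rank_min(a2))  # [0 1 4 5 6 1 7 8 8 1]
print(rank_max(a2))  # [0 3 4 5 6 3 7 9 9 3]
```

The cost is still dominated by the sort, so this should mainly save the Python-level loop overhead.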


猫巷女王i

2021-01-17 22:41

You can do reasonably well using np.unique and np.bincount. This version gives tied values their maximum rank:

>>> u, v = np.unique(a2, return_inverse=True)
>>> (np.cumsum(np.bincount(v)) - 1)[v]
array([0, 3, 4, 5, 6, 3, 7, 9, 9, 3])

Or, for the minimum rank:

>>> (np.cumsum(np.concatenate(([0], np.bincount(v)))))[v]
array([0, 1, 4, 5, 6, 1, 7, 8, 8, 1])

There's a minor speedup from telling bincount how many bins to produce via minlength:

>>> (np.cumsum(np.bincount(v, minlength=u.size)) - 1)[v]
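
For reference, the two one-liners above could be wrapped in a single function along the following lines. This is only a sketch: rank_with_ties and its method parameter are hypothetical names, not part of NumPy, and the minimum-rank branch subtracts the counts from the cumulative sum rather than prepending a zero, which should give the same result.

```python
import numpy as np

def rank_with_ties(a, method="max"):
    """0-based ranks via unique + bincount (hypothetical helper, assumed API)."""
    a = np.asarray(a).ravel()
    # inv[i] is the index of a[i] among the sorted unique values.
    uniq, inv = np.unique(a, return_inverse=True)
    counts = np.bincount(inv, minlength=uniq.size)
    if method == "max":
        # Cumulative counts minus one: last sorted position of each unique value.
        per_value = np.cumsum(counts) - 1
    elif method == "min":
        # Cumulative counts minus own count: first sorted position of each value.
        per_value = np.cumsum(counts) - counts
    else:
        raise ValueError("method must be 'min' or 'max'")
    return per_value[inv]

a2 = np.array([0.1, 1.1, 2.1, 3.1, 4.1, 1.1, 6.1, 7.1, 7.1, 1.1])
print(rank_with_ties(a2, "max"))  # [0 3 4 5 6 3 7 9 9 3]
print(rank_with_ties(a2, "min"))  # [0 1 4 5 6 1 7 8 8 1]
```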


不思量自难忘°

2021-01-17 22:44

After upgrading to the latest version of scipy, as suggested by @WarrenWeckesser in the comments, scipy.stats.rankdata seems to be faster than both scipy.stats.mstats.rankdata and np.searchsorted, making it the fastest way to do this on larger arrays.

In [1]: import numpy as np

In [2]: from scipy.stats import rankdata as rd
   ...: from scipy.stats.mstats import rankdata as rd2
   ...: 

In [3]: array = np.arange(0.1, 1000000.1)

In [4]: %timeit np.searchsorted(np.sort(array), array)
1 loops, best of 3: 385 ms per loop

In [5]: %timeit rd(array)
10 loops, best of 3: 109 ms per loop

In [6]: %timeit rd2(array)
1 loops, best of 3: 205 ms per loop
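
One thing to keep in mind when swapping rankdata in for the searchsorted approach: rankdata returns 1-based ranks and averages tied ranks by default. A minimal sketch, assuming a scipy version whose rankdata accepts the method argument, of how the 0-based minimum ranks from the a2 example might be recovered:

```python
import numpy as np
from scipy.stats import rankdata

a2 = np.array([0.1, 1.1, 2.1, 3.1, 4.1, 1.1, 6.1, 7.1, 7.1, 1.1])

# method='min' gives tied values their lowest rank; ranks start at 1,
# so subtracting 1 yields 0-based ranks. The cast guards against float output.
ranks_min = rankdata(a2, method="min").astype(int) - 1

# The same 0-based ranks from the sort + searchsorted approach, for comparison.
ranks_ss = np.searchsorted(np.sort(a2), a2)

print(ranks_min)                            # [0 1 4 5 6 1 7 8 8 1]
print(np.array_equal(ranks_min, ranks_ss))  # True
```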
