I am trying to sort a large number of arrays in Python. I need to perform the sorting for over 11 million arrays at once.
Also, it would be nice if I could directly get
For cases like this, where you only need partially sorted indices, there's NumPy's argpartition.
You have the troublesome np.argsort in: w[np.argsort(z)[::-1]][:7], which is essentially w[idx], where idx = np.argsort(z)[::-1][:7].
So, idx could be calculated with np.argpartition, like so -
idx = np.argpartition(-z,np.arange(7))[:7]
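The -z is needed because np.argpartition, by default, orders indices by ascending value. To reverse that order, we negate the elements. Passing np.arange(7) as kth places positions 0 through 6 in their final sorted order, so the [:7] slice is fully sorted rather than merely partitioned.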
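To make the role of the negation and of the kth sequence concrete, here is a minimal sketch on a tiny array. The values and the name k are made up for illustration and are not from the original code.

```python
import numpy as np

# Small illustrative array (values chosen for this example only).
z = np.array([3, 9, 1, 7, 5, 8, 2])
k = 3

# A sequence passed as kth puts positions 0..k-1 into their final sorted
# place, so the first k entries are fully ordered, not just partitioned.
# Negating z turns "k smallest of -z" into "k largest of z".
idx = np.argpartition(-z, np.arange(k))[:k]

top_values = z[idx]  # the k largest values, in descending order
print(idx, top_values)
```

Here idx holds the positions of 9, 8 and 7 in that order. Note that negation is only safe for signed or floating-point dtypes; for unsigned integers a cast would be needed first.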
Thus, the proposed change in the original code would be:
func = w[np.argpartition(-z,np.arange(7))[:7]]
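Since the question mentions millions of arrays, the same idea might be applied row-wise in one call. The sketch below assumes the arrays can be stacked into 2-D arrays z and w of equal shape, which the original post does not confirm; top_k_per_row is a hypothetical helper name, and np.take_along_axis requires NumPy 1.15 or later.

```python
import numpy as np

def top_k_per_row(w, z, k):
    """Hypothetical helper: for each row, return the k entries of w
    whose corresponding z values are largest, in descending z order."""
    # Partition each row independently; the first k columns end up sorted.
    idx = np.argpartition(-z, np.arange(k), axis=1)[:, :k]
    # Gather the matching entries of w row by row.
    return np.take_along_axis(w, idx, axis=1)

# Made-up data: 5 rows of 20 values each (assumed shapes, for illustration).
rng = np.random.default_rng(0)
z = rng.random((5, 20))
w = rng.random((5, 20))

result = top_k_per_row(w, z, 7)

# Reference result using a full argsort per row.
expected = np.take_along_axis(w, np.argsort(z, axis=1)[:, ::-1][:, :7], axis=1)
print(np.array_equal(result, expected))
```

Whether this fits depends on all arrays having the same length; ragged inputs would need a different approach.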
Runtime test -
In [162]: z = np.random.randint(0,10000000,(1100000)) # Random int array
In [163]: idx1 = np.argsort(z)[::-1][:7]
...: idx2 = np.argpartition(-z,np.arange(7))[:7]
...:
In [164]: np.allclose(idx1,idx2) # Verify results
Out[164]: True
In [165]: %timeit np.argsort(z)[::-1][:7]
1 loops, best of 3: 264 ms per loop
In [166]: %timeit np.argpartition(-z,np.arange(7))[:7]
10 loops, best of 3: 36.5 ms per loop
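The comparison above can also be reproduced outside IPython. The following is a rough sketch using the standard timeit module; the function names and the seed are illustrative choices, and timings will vary by machine and NumPy version. It compares the selected values rather than the indices, because the two methods may order tied values differently.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
z = rng.integers(0, 10_000_000, size=1_100_000)  # random int array

def top7_argsort():
    # Full sort, then take the 7 largest.
    return np.argsort(z)[::-1][:7]

def top7_argpartition():
    # Partial sort of only the first 7 positions.
    return np.argpartition(-z, np.arange(7))[:7]

# Both approaches should select the same 7 values in descending order.
same_values = np.array_equal(z[top7_argsort()], z[top7_argpartition()])

# Best of 3 single runs for each approach.
t_sort = min(timeit.repeat(top7_argsort, number=1, repeat=3))
t_part = min(timeit.repeat(top7_argpartition, number=1, repeat=3))

print(f"results match: {same_values}")
print(f"argsort: {t_sort * 1e3:.1f} ms, argpartition: {t_part * 1e3:.1f} ms")
```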