Numpy argsort - what is it doing?

礼貌的吻别 2020-11-28 19:15

Why is numpy giving this result:

import numpy
x = numpy.array([1.48,1.41,0.0,0.1])
print(x.argsort())

>[2 3 1 0]

when I'd expect it to do this:

>[3 2 0 1]

  • 2020-11-28 19:29

    As the documentation says, argsort:

    Returns the indices that would sort an array.

    That means the first element of the argsort result is the index of the element that comes first in sorted order, the second element is the index of the element that comes second, and so on.
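
    A quick way to see this with the question's data (an illustrative snippet, not from the documentation): indexing the array with its own argsort result yields the sorted values, and position k of the result holds the original index of the k-th smallest element.

```python
import numpy as np

x = np.array([1.48, 1.41, 0.0, 0.1])
order = x.argsort()            # indices that would sort x
print(order)                   # [2 3 1 0]

# Fancy indexing with `order` reorders x into ascending order.
assert np.array_equal(x[order], np.sort(x))

# order[k] is the original position of the k-th smallest value.
for k, i in enumerate(order):
    print(f"position {k} in sorted order: x[{i}] = {x[i]}")
```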

    What you seem to want is the rank order of the values, which is what is provided by scipy.stats.rankdata. Note that you need to think about what should happen if there are ties in the ranks.
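
    To make the tie question concrete, here is a NumPy-only sketch (not SciPy's implementation) of the main tie conventions that scipy.stats.rankdata exposes through its method argument, shifted to 0-based ranks. The helper name rank_variants is hypothetical.

```python
import numpy as np

def rank_variants(x):
    """0-based ranks of x under several tie conventions (NumPy-only sketch).

    Each entry is meant to match scipy.stats.rankdata(x, method=...) - 1.
    """
    x = np.asarray(x)
    s = np.sort(x)
    # Number of elements strictly smaller than each value.
    low = np.searchsorted(s, x, side="left")
    # Last sorted position occupied by a value equal to each element.
    high = np.searchsorted(s, x, side="right") - 1
    return {
        "min": low,
        "max": high,
        "average": (low + high) / 2,
        # Index into the sorted distinct values: ties share a rank, no gaps.
        "dense": np.unique(x, return_inverse=True)[1],
        # A stable sort breaks ties by original position.
        "ordinal": np.argsort(np.argsort(x, kind="stable"), kind="stable"),
    }

for method, ranks in rank_variants([1, 2, 2, 3]).items():
    print(f"{method:>8}: {ranks}")
```

    On [1, 2, 2, 3] the two tied 2s get ranks 1 and 1 ("min"), 2 and 2 ("max"), 1.5 and 1.5 ("average"), 1 and 1 ("dense", where the 3 is then ranked 2 instead of 3), or 1 and 2 ("ordinal"). Without ties, as in the question, every convention gives [3, 2, 0, 1].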

  • 2020-11-28 19:32

    [2, 3, 1, 0] indicates that the smallest element is at index 2, the next smallest at index 3, then index 1, then index 0.

    There are a number of ways to get the result you are looking for:

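    import numpy as np
    import scipy.stats as stats
    
    def using_indexed_assignment(x):
        "https://stackoverflow.com/a/5284703/190597 (Sven Marnach)"
        # temp[k] is the index of the k-th smallest value; writing k at that
        # index inverts the permutation, giving each element its rank.
        result = np.empty(len(x), dtype=int)
        temp = x.argsort()
        result[temp] = np.arange(len(x))
        return result
    
    def using_rankdata(x):
        # rankdata returns 1-based (float) ranks, so shift them to 0-based.
        return stats.rankdata(x) - 1
    
    def using_argsort_twice(x):
        "https://stackoverflow.com/a/6266510/190597 (k.rooijers)"
        # The argsort of a permutation is its inverse permutation.
        return np.argsort(np.argsort(x))
    
    def using_digitize(x):
        # Sorted distinct values; each element falls into the bin of its own value.
        unique_vals = np.unique(x)
        return np.digitize(x, bins=unique_vals) - 1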
    

    For example,

    In [72]: x = np.array([1.48,1.41,0.0,0.1])
    
    In [73]: using_indexed_assignment(x)
    Out[73]: array([3, 2, 0, 1])
    

    This checks that they all produce the same result:

    x = np.random.random(10**5)
    expected = using_indexed_assignment(x)
    for func in (using_argsort_twice, using_digitize, using_rankdata):
        assert np.allclose(expected, func(x))
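
    The assertion holds because np.random.random essentially never produces duplicate values. With ties, the functions follow different conventions. A minimal sketch using the three NumPy-only functions from above (redefined so the snippet runs on its own; using_rankdata is left out to avoid the SciPy dependency):

```python
import numpy as np

def using_indexed_assignment(x):
    result = np.empty(len(x), dtype=int)
    temp = x.argsort()
    result[temp] = np.arange(len(x))
    return result

def using_argsort_twice(x):
    return np.argsort(np.argsort(x))

def using_digitize(x):
    unique_vals = np.unique(x)
    return np.digitize(x, bins=unique_vals) - 1

x = np.array([1.0, 2.0, 2.0, 3.0])   # contains a tie
a = using_indexed_assignment(x)
b = using_argsort_twice(x)
c = using_digitize(x)
# a and b: distinct ranks 0..3, ties broken by argsort's ordering.
# c: tied values share a rank and no rank is skipped, i.e. [0 1 1 2].
print(a, b, c)
```

    So the "same result" check is only meaningful for data without repeated values; with ties, pick the function whose convention you actually want.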
    

    These IPython %timeit benchmarks suggest that for large arrays using_indexed_assignment is the fastest:

    In [50]: x = np.random.random(10**5)
    In [66]: %timeit using_indexed_assignment(x)
    100 loops, best of 3: 9.32 ms per loop
    
    In [70]: %timeit using_rankdata(x)
    100 loops, best of 3: 10.6 ms per loop
    
    In [56]: %timeit using_argsort_twice(x)
    100 loops, best of 3: 16.2 ms per loop
    
    In [59]: %timeit using_digitize(x)
    10 loops, best of 3: 27 ms per loop
    

    For small arrays, using_argsort_twice may be faster:

    In [78]: x = np.random.random(10**2)
    
    In [81]: %timeit using_argsort_twice(x)
    100000 loops, best of 3: 3.45 µs per loop
    
    In [79]: %timeit using_indexed_assignment(x)
    100000 loops, best of 3: 4.78 µs per loop
    
    In [80]: %timeit using_rankdata(x)
    100000 loops, best of 3: 19 µs per loop
    
    In [82]: %timeit using_digitize(x)
    10000 loops, best of 3: 26.2 µs per loop
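
    To repeat this kind of comparison outside IPython, the standard timeit module can be used. This is a rough sketch (the helper time_per_call is hypothetical, and only the two NumPy-only functions are included); absolute numbers will differ by machine and NumPy version, so none are shown here.

```python
import timeit
import numpy as np

def using_indexed_assignment(x):
    result = np.empty(len(x), dtype=int)
    result[x.argsort()] = np.arange(len(x))
    return result

def using_argsort_twice(x):
    return np.argsort(np.argsort(x))

def time_per_call(func, x, number=10, repeat=3):
    """Best average seconds per call over `repeat` runs of `number` calls."""
    totals = timeit.repeat(lambda: func(x), number=number, repeat=repeat)
    return min(totals) / number

for size in (10**2, 10**5):
    x = np.random.random(size)
    for func in (using_indexed_assignment, using_argsort_twice):
        t = time_per_call(func, x)
        print(f"n={size:>6} {func.__name__:<26} {t * 1e6:10.1f} µs per call")
```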
    

    Note also that stats.rankdata gives you more control over how to handle elements of equal value, through its method parameter ('average', 'min', 'max', 'dense' or 'ordinal').

  • 2020-11-28 19:36

    First, it sorts the array. Then it returns an array holding the original index of each element, in sorted order.
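
    That two-step description can be mimicked in plain Python by sorting the indices 0..n-1 by the values they point to. This is only a conceptual sketch (argsort_sketch is a hypothetical name): NumPy's real argsort is implemented in C, and its default kind='quicksort' is not guaranteed to be stable, whereas Python's sorted is stable.

```python
def argsort_sketch(values):
    """Return the indices that would sort `values` (conceptual sketch)."""
    # Sort the positions 0..n-1, comparing them by the value stored there.
    return sorted(range(len(values)), key=values.__getitem__)

x = [1.48, 1.41, 0.0, 0.1]
order = argsort_sketch(x)
print(order)                    # [2, 3, 1, 0]
print([x[i] for i in order])    # [0.0, 0.1, 1.41, 1.48]
```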

  • 2020-11-28 19:36

    It returns indices into the given array [1.48, 1.41, 0.0, 0.1], in sorted order: 0.0 is the first (smallest) element, at index [2]; 0.1 is the second, at index [3]; 1.41 is the third, at index [1]; 1.48 is the fourth, at index [0]. Output:

    [2,3,1,0]
    
  • 2020-11-28 19:37

    According to the documentation, argsort:

    Returns the indices that would sort an array.

    • 2 is the index of 0.0.
    • 3 is the index of 0.1.
    • 1 is the index of 1.41.
    • 0 is the index of 1.48.
  • 2020-11-28 19:38

    input:
    import numpy as np
    x = np.array([1.48,1.41,0.0,0.1])
    x.argsort().argsort()

    output:
    array([3, 2, 0, 1])
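
    Why applying argsort twice gives the ranks: the first argsort is a permutation mapping sorted position to original index, and the argsort of a permutation is its inverse, which maps each original index back to its sorted position, i.e. its rank. A short check on the question's data:

```python
import numpy as np

x = np.array([1.48, 1.41, 0.0, 0.1])
order = x.argsort()        # sorted position -> original index: [2 3 1 0]
ranks = order.argsort()    # original index -> sorted position: [3 2 0 1]

# The two arrays are inverse permutations of each other.
n = len(x)
assert np.array_equal(order[ranks], np.arange(n))
assert np.array_equal(ranks[order], np.arange(n))
print(ranks)               # [3 2 0 1]
```

    With repeated values the identity still holds, but tied elements receive distinct ranks, in whatever order the first argsort placed them.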
