Fastest way to zero out low values in array?

忘掉有多难 2020-12-24 07:07

So, let's say I have 100,000 float arrays with 100 elements each. I need the highest X values, BUT only if they are greater than Y. Any element not matching this

9 answers
  • 2020-12-24 07:35

    Using numpy:

    # assign zero to all elements less than or equal to `lowValY`
    a[a <= lowValY] = 0
    # find the n-th largest element in the array (where n = highCountX)
    x = partial_sort(a, highCountX, reverse=True)[-1]
    # zero out everything below the n-th largest element
    a[a < x] = 0  # NOTE: this may leave more than highCountX non-zero
                  # elements if there are duplicates
    

    Where partial_sort could be:

    def partial_sort(a, n, reverse=False):
        #NOTE: in general it should return full list but in your case this will do
        return sorted(a, reverse=reverse)[:n] 
    

    The expression a[a<value] = 0 can be written without numpy as follows:

    for i, x in enumerate(a):
        if x < value:
            a[i] = 0
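
As a possible variant of the numpy approach above, np.partition can locate the highCountX-th largest value without fully sorting the array. This is only a sketch under stated assumptions: the function name keep_top_values and the sample data are hypothetical, the input is a 1-D float array, and high_count_x is at least 1.

```python
import numpy as np

def keep_top_values(a, high_count_x, low_val_y):
    """Hypothetical helper: return a copy of 1-D array `a` with low values zeroed."""
    # Keep only values strictly greater than the threshold Y.
    out = np.where(a > low_val_y, a, 0.0)
    if high_count_x < out.size:
        # np.partition puts the k-th smallest element at index k, so index
        # size - high_count_x holds the high_count_x-th largest value.
        k = out.size - high_count_x
        cutoff = np.partition(out, k)[k]
        # Ties at the cutoff are kept, as in the snippet above.
        out[out < cutoff] = 0.0
    return out

a = np.array([0.1, 0.9, 0.5, 0.7, 0.3, 0.8])
result = keep_top_values(a, 3, 0.4)
```

For the 100,000 arrays in the question, the same function could be applied to each array in turn.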
    
  • 2020-12-24 07:42

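    You can use map and a lambda; it should be fast enough. Note that this applies only the threshold y and does not limit the result to the highest X values.

    # in Python 3, map returns an iterator, so wrap it in list()
    new_array = list(map(lambda x: x if x > y else 0, array))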
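
The one-liner above handles the threshold Y only. One way to also enforce the top-X limit without sorting the whole list is heapq.nlargest from the standard library. This is a rough sketch, not a definitive implementation; the function name zero_low_values and its parameter names are made up for illustration.

```python
import heapq

def zero_low_values(array, high_count_x, low_val_y):
    """Hypothetical helper: keep the top X values that exceed Y, zero the rest."""
    # Values that pass the threshold Y.
    candidates = [x for x in array if x > low_val_y]
    if not candidates or high_count_x < 1:
        return [0 for _ in array]
    # heapq.nlargest returns the largest items in descending order,
    # so the last one is the cutoff for the top X.
    cutoff = heapq.nlargest(high_count_x, candidates)[-1]
    # Values equal to the cutoff are all kept, so ties can exceed X.
    return [x if x >= cutoff else 0 for x in array]

filtered = zero_low_values([1, 5, 3, 9, 7], 2, 2)
```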
    
  • 2020-12-24 07:43

    The simplest way would be:

    topX = sorted([x for x in array if x > lowValY], reverse=True)[highCountX-1]
    print([x if x >= topX else 0 for x in array])
    

    In pieces, this selects all the elements greater than lowValY:

    [x for x in array if x > lowValY]
    

    This list contains only the elements greater than the threshold. Next, sort it so the largest values are at the start:

    sorted(..., reverse=True)
    

    Then indexing the sorted list gives the cutoff value for the top highCountX elements:

    sorted(...)[highCountX-1]
    

    Finally, the original array is filled out using another list comprehension:

    [x if x >= topX else 0 for x in array]
    

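    One boundary condition arises when two or more equal elements tie for the cutoff position (the 3rd highest in your example). The resulting array will then contain that value more than once, so it may hold more than highCountX non-zero elements.

    There are other boundary conditions as well, such as len(array) < highCountX or, more generally, fewer than highCountX elements exceeding lowValY, in which case the index lookup raises IndexError. Handling such conditions is left to the implementor.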
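
One possible way to handle both boundary conditions is to rank indices rather than values, so that exactly highCountX elements survive even when there are ties, and short inputs do not raise. This is a sketch under assumptions: the function name keep_exactly_top is hypothetical, and breaking ties in favour of the earlier position is an arbitrary choice.

```python
def keep_exactly_top(array, high_count_x, low_val_y):
    """Hypothetical helper: keep at most X elements above Y, zero the rest."""
    # Indices of elements above the threshold, largest values first.
    # sorted() is stable even with reverse=True, so among equal values
    # the earlier index comes first.
    ranked = sorted(
        (i for i, x in enumerate(array) if x > low_val_y),
        key=lambda i: array[i],
        reverse=True,
    )
    # Slicing never raises, so fewer than X candidates is handled.
    keep = set(ranked[:max(high_count_x, 0)])
    return [x if i in keep else 0 for i, x in enumerate(array)]

tied = keep_exactly_top([5, 7, 7, 7, 1], 2, 2)
short = keep_exactly_top([5, 1], 3, 2)
```

Here tied keeps only the first two 7s, and short keeps the single value above the threshold.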
