Fastest way to zero out low values in array?

So, let's say I have 100,000 float arrays with 100 elements each. I need the highest X number of values, BUT only if they are greater than Y. Any element not matching this

9 answers

小蘑菇

2020-12-24 07:16

Use a heap.

This works in time O(n*lg(HighCountX)).

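import heapq

array = [.06, .25, 0, .15, .5, 0, 0, 0.04, 0, 0]
highCountX = 3
lowValY = .1

# Seed the heap with highCountX copies of lowValY so that heap[0]
# (the smallest kept value) can never drop below the threshold.
# A list of equal values already satisfies the heap property.
heap = [lowValY] * highCountX

for x in array:
    # If x beats the smallest kept value, swap it in (pop + push in one step).
    if x > heap[0]:
        heapq.heapreplace(heap, x)

# heap[0] is now the cutoff: the highCountX-th largest value, or lowValY.
cutoff = heap[0]

array = [x if x >= cutoff else 0 for x in array]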

Removing the minimum from a heap of size k takes O(lg(k)); insertion takes O(lg(k)) or O(1), depending on which heap type you use.


你的背包

2020-12-24 07:22
```
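# Note: scipy.stats.threshold was deprecated in SciPy 0.17 and removed in 1.0,
# so this only works with older SciPy versions.
from scipy.stats import threshold
thresholded = threshold(array, 0.5)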
```
:)
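
For readers on a SciPy version that no longer ships `threshold`, here is a minimal sketch of the same idea in plain NumPy. It assumes the intent is "set everything below 0.5 to 0"; the name `arr` is just for illustration.

```python
import numpy as np

array = [.06, .25, 0, .15, .5, 0, 0, 0.04, 0, 0]
arr = np.asarray(array, dtype=float)

# Keep values that are at least 0.5 and replace everything else with 0.
thresholded = np.where(arr < 0.5, 0.0, arr)
```

`np.where` returns a new array, so the original `array` is left untouched.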

长发绾君心

2020-12-24 07:32

Using a heap is a good idea, as egon says. But you can use the heapq.nlargest function to cut down on some effort:

import heapq 

array =  [.06, .25, 0, .15, .5, 0, 0, 0.04, 0, 0]
highCountX = 3
lowValY = .1

threshold = max(heapq.nlargest(highCountX, array)[-1], lowValY)
array = [x if x >= threshold else 0 for x in array]


醉酒成梦

2020-12-24 07:33
This is a typical job for NumPy, which is very fast for these kinds of operations:
```
import numpy

array_np = numpy.asarray(array)
low_values_flags = array_np < lowValY  # Where values are low
array_np[low_values_flags] = 0  # All low values set to 0
```
Now, if you only need the highCountX largest elements, you can even "forget" the small elements (instead of setting them to 0 and sorting them) and only sort the list of large elements:
```
array_np = numpy.asarray(array)
print(numpy.sort(array_np[array_np >= lowValY])[-highCountX:])
```
Of course, sorting the whole array if you only need a few elements might not be optimal. Depending on your needs, you might want to consider the standard heapq module.
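
If you want to stay in NumPy and still avoid a full sort, `numpy.partition` might be worth a look. The following is only a sketch, assuming all the arrays are stacked into one 2-D array (one row per array) and that `high_count_x` is at least 1; the function name `keep_top_values` is made up for this example.

```python
import numpy as np

def keep_top_values(arrays, high_count_x, low_val_y):
    """Zero every element that is not among the high_count_x largest of its
    row, or that is not greater than low_val_y. Ties at the cutoff are kept."""
    arrays = np.asarray(arrays, dtype=float)
    n_cols = arrays.shape[1]
    k = min(high_count_x, n_cols)  # assumes high_count_x >= 1
    # After partitioning, column n_cols - k holds each row's k-th largest value.
    kth_largest = np.partition(arrays, n_cols - k, axis=1)[:, n_cols - k]
    keep = (arrays >= kth_largest[:, None]) & (arrays > low_val_y)
    return np.where(keep, arrays, 0.0)

rows = [[.06, .25, 0, .15, .5, 0, 0, 0.04, 0, 0],
        [.2, .3, .05, .01, 0, 0, 0, 0, 0, 0]]
result = keep_top_values(rows, 3, 0.1)
```

This handles every row in one vectorised call, which should matter with 100,000 arrays. Note that if several values tie with the cutoff, more than `high_count_x` elements can survive in that row.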

心在旅途

2020-12-24 07:33
Setting elements below some threshold to zero is easy:
```
array = [ x if x > threshold else 0.0 for x in array ]
```
(plus the occasional abs() if needed.)

The requirement of the N highest numbers is a bit vague, however. What if there are e.g. N+1 equal numbers above the threshold? Which one to truncate?

You could sort the array first, then set the threshold to the value of the Nth element:
```
threshold = sorted(array, reverse=True)[N - 1]  # value of the Nth largest element
array = [ x if x >= threshold else 0.0 for x in array ]
```
Note: this solution is optimized for readability, not performance.
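
On the tie question: one possible answer is to keep exactly N elements and let the earliest ones win. A rough sketch under that assumption (the function name `keep_n_largest` and the "earliest wins" rule are my own choices, not something the question specifies):

```python
def keep_n_largest(values, n, threshold):
    """Keep at most n elements that are greater than threshold; zero the rest.
    Among equal values, the ones appearing first in the list are kept."""
    # sorted() is stable even with reverse=True, so equal values stay in
    # their original left-to-right order.
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    keep = {i for i in ranked[:n] if values[i] > threshold}
    return [v if i in keep else 0.0 for i, v in enumerate(values)]

result = keep_n_largest([0.5, 0.3, 0.3, 0.3, 0.05], 2, 0.1)
# result == [0.5, 0.3, 0.0, 0.0, 0.0]
```

Working with indices instead of a cutoff value is what makes it possible to drop some of the tied elements and keep others.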

爱一瞬间的悲伤

2020-12-24 07:34
There's a special MaskedArray class in NumPy that does exactly that. You can "mask" elements based on any condition. This represents your need better than assigning zeroes: numpy operations will ignore masked values when appropriate (for example, when computing the mean).
```
>>> from numpy import ma
>>> x = ma.array([.06, .25, 0, .15, .5, 0, 0, 0.04, 0, 0])
>>> x1 = ma.masked_inside(x, 0, 0.1) # mask everything in 0..0.1 range
>>> x1
masked_array(data = [-- 0.25 -- 0.15 0.5 -- -- -- -- --],
         mask = [ True False True False False True True True True True],
   fill_value = 1e+20)
>>> print(x1.filled(0)) # Fill with zeroes
[ 0 0.25 0 0.15 0.5 0 0 0 0 0 ]
```
As an added benefit, masked arrays are well supported in the matplotlib visualisation library if you need this.

Docs on masked arrays in numpy
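
To illustrate the "ignored when appropriate" point, here is a small sketch using the same data. It assumes the cutoff is "greater than 0.1", so values less than or equal to it are masked; `low_val_y` is a name introduced just for this example.

```python
import numpy.ma as ma

x = ma.array([.06, .25, 0, .15, .5, 0, 0, 0.04, 0, 0])
low_val_y = 0.1  # assumed cutoff, not from the original answer

# Mask everything that is not strictly greater than the cutoff.
masked = ma.masked_less_equal(x, low_val_y)

mean_of_kept = masked.mean()  # averages only 0.25, 0.15 and 0.5
kept_count = masked.count()   # number of unmasked elements
zeroed = masked.filled(0)     # plain ndarray with masked slots set to 0
```

The mean here is taken over the three surviving values only, whereas zeroing first and then averaging would divide by all ten elements.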