Deleting elements from numpy array with iteration

爷,独闯天下 submitted on 2020-02-25 13:26:31

Question


What is the fastest way to delete elements from a numpy array while retrieving their initial positions? The following code does not return all the elements that it should:

list = []
for pos,i in enumerate(ARRAY):
    if i < some_condition:
        list.append(pos)  #This is where the loop fails

for _ in list:
    ARRAY = np.delete(ARRAY, _)

Answer 1:


It really feels like you're going about this inefficiently. You should probably be using more of numpy's built-in capabilities, e.g. np.where or boolean indexing. Calling np.delete in a loop like that is going to kill any performance gains you get from using numpy, because every call returns a new copy of the array.

For example (with boolean indexing):

keep = np.ones(ARRAY.shape, dtype=bool)
for pos, val in enumerate(ARRAY):
    if val < some_condition:
        keep[pos] = False
ARRAY = ARRAY[keep]

Of course, this could possibly be simplified (and generalized) even further:

ARRAY = ARRAY[ARRAY >= some_condition]
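
Since the question also asks for the initial positions of the removed elements, here is a minimal sketch of how they could be recorded before filtering. It assumes a 1-D array and a scalar threshold; the names split_by_threshold, arr, and threshold are illustrative and do not come from the original post.

```python
import numpy as np

def split_by_threshold(arr, threshold):
    """Return (kept_values, removed_positions) for a 1-D array."""
    arr = np.asarray(arr)
    remove = arr < threshold                    # True where an element should be dropped
    removed_positions = np.flatnonzero(remove)  # original indices, taken before filtering
    return arr[~remove], removed_positions

kept, removed_positions = split_by_threshold([5, 1, 7, 2, 9], threshold=3)
print(kept)               # [5 7 9]
print(removed_positions)  # [1 3]
```

Because the positions are computed from the mask in a single pass, nothing shifts underneath them the way it does when elements are deleted one at a time.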

EDIT

You've stated in the comments that you need the same mask to operate on other arrays as well. That's not a problem. You can keep a handle on the mask and use it for other arrays:

mask = ARRAY >= some_condition
ARRAY = ARRAY[mask]
OTHER_ARRAY = OTHER_ARRAY[mask]
...
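
As a small illustration of reusing one mask, the sketch below filters two parallel arrays. The data, the threshold, and the names values and labels are made up for the example; the only real requirement is that every array indexed with the mask has the same length along its first axis.

```python
import numpy as np

values = np.array([0.5, 2.0, 0.1, 3.5])
labels = np.array(["a", "b", "c", "d"])
threshold = 1.0  # illustrative cutoff, not from the original post

# Build the mask once, from the array that carries the condition.
mask = values >= threshold

# A boolean mask must match the length of whatever it indexes.
assert mask.shape[0] == labels.shape[0]

filtered_values = values[mask]
filtered_labels = labels[mask]
print(filtered_values)  # [2.  3.5]
print(filtered_labels)  # ['b' 'd']
```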

Additionally (and perhaps this is the reason your original code isn't working), as soon as you delete the first index from the array in your loop, all of the other items shift one index to the left, so you're not actually deleting the same items that you "tagged" on the initial pass.

As an example, let's say that your original array was [a, b, c, d, e] and on the original pass, you tagged the elements at indexes [0, 2] for deletion (a, c). On the first pass through your delete loop, you'd remove the item at index 0, which would make your array:

[b, c, d, e]

Now, on the second iteration of your delete loop, you're going to delete the item at index 2 in the new array:

[b, c, e]

But look, instead of removing c like we wanted, we actually removed d! Oh snap!

To fix that, you could probably write your loop over reversed(list), but that still won't result in a fast operation.
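
The shifting problem described above can be reproduced with a short sketch. The array contents and the name tagged are illustrative. The reversed loop relies on the tagged positions being in ascending order, which is what the enumerate pass in the question produces.

```python
import numpy as np

original = np.array(["a", "b", "c", "d", "e"])
tagged = [0, 2]  # positions of "a" and "c", collected in ascending order

# Forward loop: after the first delete, every later index is off by one.
shifted = original.copy()
for idx in tagged:
    shifted = np.delete(shifted, idx)
print(shifted)  # ['b' 'c' 'e']  ("d" was removed instead of "c")

# Reversed loop: deleting from the back leaves the earlier indices valid.
fixed = original.copy()
for idx in reversed(tagged):
    fixed = np.delete(fixed, idx)
print(fixed)  # ['b' 'd' 'e']

# Passing all indices in one call avoids both the loop and the repeated copies.
one_call = np.delete(original, tagged)
print(one_call)  # ['b' 'd' 'e']
```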




Answer 2:


You don't need to iterate, especially with a simple condition like this. And you don't really need to use delete:

A sample array:

In [693]: x=np.arange(10)

A mask, a boolean array where a condition is true (or false):

In [694]: msk = x%2==0
In [695]: msk
Out[695]: array([ True, False,  True, False,  True, False,  True, False,  True, False], dtype=bool)

where (or nonzero) converts it to indexes:

In [696]: ind=np.where(msk)
In [697]: ind
Out[697]: (array([0, 2, 4, 6, 8], dtype=int32),)

You use the whole ind in one call to delete (no need to iterate):

In [698]: np.delete(x,ind)
Out[698]: array([1, 3, 5, 7, 9])

You can use ind to retain those values instead:

In [699]: x[ind]
Out[699]: array([0, 2, 4, 6, 8])

Or you can use the boolean msk directly:

In [700]: x[msk]
Out[700]: array([0, 2, 4, 6, 8])

or use its inverse:

In [701]: x[~msk]
Out[701]: array([1, 3, 5, 7, 9])

delete doesn't do much more than this kind of boolean masking. It's all Python code, so you can easily study it.
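
To make that comparison concrete, here is a rough sketch of the masking idea for the simple case of a 1-D array and integer indices. The helper delete_by_mask is hypothetical and is not numpy's actual implementation, which also handles axes, slices, and other input types.

```python
import numpy as np

def delete_by_mask(arr, indices):
    """Drop the given integer positions from a 1-D array via a boolean mask."""
    arr = np.asarray(arr)
    keep = np.ones(arr.shape[0], dtype=bool)  # start by keeping everything
    keep[indices] = False                     # switch off the positions to drop
    return arr[keep]

x = np.arange(10)
ind = np.where(x % 2 == 0)[0]   # 1-D array of the even positions
result = delete_by_mask(x, ind)
print(result)                   # [1 3 5 7 9]
print(np.array_equal(result, np.delete(x, ind)))  # True
```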



Source: https://stackoverflow.com/questions/34914905/deleting-elements-from-numpy-array-with-iteration
