Question
What is the fastest way to delete elements from a NumPy array while retrieving their initial positions? The following code does not return all the elements that it should:
list = []
for pos, i in enumerate(ARRAY):
    if i < some_condition:
        list.append(pos)  # This is where the loop fails
for _ in list:
    ARRAY = np.delete(ARRAY, _)
Answer 1:
It really feels like you're going about this inefficiently. You should probably be using more built-in NumPy capabilities, e.g. np.where or boolean indexing. Using np.delete in a loop like that is going to kill any performance gains you get from using NumPy.
For example (with boolean indexing):
keep = np.ones(ARRAY.shape, dtype=bool)
for pos, val in enumerate(ARRAY):
    if val < some_condition:
        keep[pos] = False
ARRAY = ARRAY[keep]
Of course, this could possibly be simplified (and generalized) even further:
ARRAY = ARRAY[ARRAY >= some_condition]
EDIT
You've stated in the comments that you need the same mask to operate on other arrays as well -- That's not a problem. You can keep a handle on the mask and use it for other arrays:
mask = ARRAY >= some_condition
ARRAY = ARRAY[mask]
OTHER_ARRAY = OTHER_ARRAY[mask]
...
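Since the question also asks for the initial positions, here is a minimal sketch of how the same mask can supply them. The sample arrays and the name `threshold` (standing in for `some_condition`) are made up for illustration; np.flatnonzero is used as a convenience that returns a flat index array for 1-D input.

```python
import numpy as np

# Hypothetical sample data; `threshold` stands in for `some_condition`.
arr = np.array([5, 1, 7, 2, 9, 0])
other = np.array([50, 10, 70, 20, 90, 0])
threshold = 3

mask = arr >= threshold                    # True where the element is kept
removed_positions = np.flatnonzero(~mask)  # original indices of dropped elements

kept = arr[mask]          # filtered array
other_kept = other[mask]  # same mask applied to a second array

print(removed_positions)  # [1 3 5]
print(kept)               # [5 7 9]
print(other_kept)         # [50 70 90]
```

Because the positions are computed from the mask before anything is removed, they always refer to the original array.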
Additionally (and perhaps this is the reason your original code isn't working), as soon as you delete the first index from the array in your loop, all of the other items shift one index to the left, so you're not actually deleting the same items that you "tagged" on the initial pass.
As an example, let's say that your original array was [a, b, c, d, e] and on the original pass, you tagged elements at indexes [0, 2] for deletion (a, c). On the first pass through your delete loop, you'd remove the item at index 0, which would make your array:
[b, c, d, e]
Now, on the second iteration of your delete loop, you're going to delete the item at index 2 in the new array:
[b, c, e]
But look, instead of removing c like we wanted, we actually removed d! Oh snap!
To fix that, you could probably write your loop over reversed(list), but that still won't result in a fast operation.
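The index shift described above can be reproduced with a short sketch. The letters are stored as strings purely for illustration, and `tagged` is a hypothetical name for the list of positions collected on the first pass. The reversed loop relies on the indices being in ascending order, which is what enumerate produces.

```python
import numpy as np

arr = np.array(["a", "b", "c", "d", "e"])
tagged = [0, 2]  # positions of "a" and "c" in the original array

# Forward loop: after the first deletion the remaining indices are stale.
wrong = arr.copy()
for idx in tagged:
    wrong = np.delete(wrong, idx)

# Reverse loop: deleting from the end keeps the earlier indices valid.
right = arr.copy()
for idx in reversed(tagged):
    right = np.delete(right, idx)

# Single call: np.delete also accepts a whole list of indices.
at_once = np.delete(arr, tagged)

print(wrong)    # ['b' 'c' 'e']
print(right)    # ['b' 'd' 'e']
print(at_once)  # ['b' 'd' 'e']
```

Both loops still copy the array on every iteration, so the single call (or a boolean mask) is the better choice when speed matters.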
Answer 2:
You don't need to iterate, especially with a simple condition like this. And you don't really need to use delete:
A sample array:
In [693]: x=np.arange(10)
A mask, a boolean array where a condition is true (or false):
In [694]: msk = x%2==0
In [695]: msk
Out[695]: array([ True, False, True, False, True, False, True, False, True, False], dtype=bool)
where (or nonzero) converts it to indexes:
In [696]: ind=np.where(msk)
In [697]: ind
Out[697]: (array([0, 2, 4, 6, 8], dtype=int32),)
You can use the whole ind in one call to delete (no need to iterate):
In [698]: np.delete(x,ind)
Out[698]: array([1, 3, 5, 7, 9])
You can use ind to retain those values instead:
In [699]: x[ind]
Out[699]: array([0, 2, 4, 6, 8])
Or you can use the boolean msk directly:
In [700]: x[msk]
Out[700]: array([0, 2, 4, 6, 8])
or use its inverse:
In [701]: x[~msk]
Out[701]: array([1, 3, 5, 7, 9])
delete doesn't do much more than this kind of boolean masking. It's all Python code, so you can easily study it.
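To make that claim concrete, here is a rough sketch of the masking idea for a 1-D array and an array of indices. It is an illustration of the equivalence, not a copy of NumPy's source; the names `keep`, `via_delete` and `via_mask` are chosen for this example.

```python
import numpy as np

x = np.arange(10)
msk = x % 2 == 0
ind = np.flatnonzero(msk)  # indices where the mask is True

# Deleting by index...
via_delete = np.delete(x, ind)

# ...gives the same result as building a "keep" mask by hand.
keep = np.ones(x.shape, dtype=bool)
keep[ind] = False
via_mask = x[keep]

print(via_delete)                            # [1 3 5 7 9]
print(np.array_equal(via_delete, via_mask))  # True
```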
Source: https://stackoverflow.com/questions/34914905/deleting-elements-from-numpy-array-with-iteration