I know there is a method for a Python list to return the first index of something:
>>> l = [1, 2, 3]
>>> l.index(2)
1
Is there something like that for NumPy arrays?
Just to add a very performant and handy numba alternative based on np.ndenumerate to find the first index:
from numba import njit
import numpy as np
@njit
def index(array, item):
    for idx, val in np.ndenumerate(array):
        if val == item:
            return idx
    # If no item was found, None is returned implicitly; other return types
    # might be a problem due to numba's type inference.
This is pretty fast and deals naturally with multidimensional arrays:
>>> arr1 = np.ones((100, 100, 100))
>>> arr1[2, 2, 2] = 2
>>> index(arr1, 2)
(2, 2, 2)
>>> arr2 = np.ones(20)
>>> arr2[5] = 2
>>> index(arr2, 2)
(5,)
This can be much faster (because it's short-circuiting the operation) than any approach using np.where or np.nonzero.
np.argwhere also deals gracefully with multidimensional arrays, although you need to cast the result to a tuple manually and it is not short-circuited. It also fails with an IndexError if no match is found:
>>> tuple(np.argwhere(arr1 == 2)[0])
(2, 2, 2)
>>> tuple(np.argwhere(arr2 == 2)[0])
(5,)
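To avoid the failure on a missing item, one option is to check whether np.argwhere returned any rows before indexing. The following is only a sketch; the helper name first_index_argwhere is made up for illustration and is not part of NumPy:

```python
import numpy as np

def first_index_argwhere(array, item):
    """Return the index tuple of the first match, or None if there is none.

    Hypothetical helper: still scans the whole array (no short-circuiting).
    """
    matches = np.argwhere(array == item)  # shape: (n_matches, array.ndim)
    if matches.shape[0] == 0:
        return None
    return tuple(int(i) for i in matches[0])

arr = np.ones((4, 4))
arr[2, 3] = 2
print(first_index_argwhere(arr, 2))  # (2, 3)
print(first_index_argwhere(arr, 7))  # None
```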
Yes, given an array, array, and a value, item, to search for, you can use np.where as:
itemindex = numpy.where(array==item)
The result is a tuple of index arrays, one per dimension: for a two-dimensional array, first all the row indices, then all the column indices.
For example, if the array is two-dimensional and contains your item at two locations, then
array[itemindex[0][0]][itemindex[1][0]]
would be equal to your item and so would be:
array[itemindex[0][1]][itemindex[1][1]]
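As a small worked example of the above (the array contents here are invented for illustration), unpacking the tuple returned by np.where gives the coordinates of each match:

```python
import numpy as np

# Illustrative data: the value 5 appears at (0, 1) and (1, 1).
array = np.array([[1, 5, 3],
                  [4, 5, 6]])
item = 5

itemindex = np.where(array == item)
rows, cols = itemindex  # one index array per dimension, in row-major order

first = (int(rows[0]), int(cols[0]))
second = (int(rows[1]), int(cols[1]))
print(first, second)  # (0, 1) (1, 1)
```

Note that indexing rows[0] raises an IndexError if the item does not occur at all, so a check on rows.size may be needed first.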
An alternative to selecting the first element from np.where() is to use a generator expression together with enumerate, such as:
>>> import numpy as np
>>> x = np.arange(100) # x = array([0, 1, 2, 3, ... 99])
>>> next(i for i, x_i in enumerate(x) if x_i == 2)
2
For a two dimensional array one would do:
>>> x = np.arange(100).reshape(10,10) # x = array([[0, 1, 2,... 9], [10,..19],])
>>> next((i,j) for i, x_i in enumerate(x)
... for j, x_ij in enumerate(x_i) if x_ij == 2)
(0, 2)
The advantage of this approach is that it stops checking the elements of the array after the first match is found, whereas np.where checks all elements for a match. A generator expression would be faster if there's a match early in the array.
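One caveat with next() is that it raises StopIteration when nothing matches. A possible way around that, sketched here with None as an arbitrarily chosen sentinel, is to pass a default as the second argument:

```python
import numpy as np

x = np.arange(100)

# The second argument to next() is returned when the generator is exhausted,
# so a missing value yields None instead of raising StopIteration.
found = next((i for i, x_i in enumerate(x) if x_i == 2), None)
missing = next((i for i, x_i in enumerate(x) if x_i == 1000), None)
print(found, missing)  # 2 None
```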
For one-dimensional sorted arrays, it is much simpler and more efficient, O(log(n)), to use numpy.searchsorted, which returns a NumPy integer (the position). For example:
arr = np.array([1, 1, 1, 2, 3, 3, 4])
i = np.searchsorted(arr, 3)
Just make sure the array is already sorted.
Also check that the returned index i actually contains the searched element, since searchsorted's main objective is to find the index where an element should be inserted to maintain order. The index can equal len(arr) when the value is larger than every element, so check the bounds first:
if i < len(arr) and arr[i] == 3:
    print("present")
else:
    print("not present")
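The lookup and the presence check can be wrapped together. This is a minimal sketch that assumes the input is already sorted in ascending order; the function name first_index_sorted is hypothetical:

```python
import numpy as np

def first_index_sorted(sorted_arr, value):
    """Return the first index of value in an ascending sorted 1D array, or None."""
    # side="left" (the default) gives the leftmost insertion point,
    # which is the first occurrence when the value is present.
    i = int(np.searchsorted(sorted_arr, value, side="left"))
    if i < len(sorted_arr) and sorted_arr[i] == value:
        return i
    return None

arr = np.array([1, 1, 1, 2, 3, 3, 4])
print(first_index_sorted(arr, 3))  # 4
print(first_index_sorted(arr, 5))  # None
```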
If you need the index of the first occurrence of only one value, you can use nonzero (or where, which amounts to the same thing in this case):
>>> t = array([1, 1, 1, 2, 2, 3, 8, 3, 8, 8])
>>> nonzero(t == 8)
(array([6, 8, 9]),)
>>> nonzero(t == 8)[0][0]
6
If you need the first index of each of many values, you could obviously do the same as above repeatedly, but there is a trick that may be faster. The following finds the indices of the first element of each subsequence:
>>> nonzero(r_[1, diff(t)])
(array([0, 3, 5, 6, 7, 8]),)
Notice that it finds the beginning of both subsequences of 3s and both subsequences of 8s:
[1, 1, 1, 2, 2, 3, 8, 3, 8, 8]
So it's slightly different from finding the first occurrence of each value. In your program, you may be able to work with a sorted version of t to get what you want:
>>> st = sorted(t)
>>> nonzero(r_[1, diff(st)])
(array([0, 3, 5, 7]),)
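If what you actually want is the first occurrence of each distinct value in the original, unsorted t, a different technique from the diff trick above is np.unique with return_index=True. A short sketch using the same data:

```python
import numpy as np

t = np.array([1, 1, 1, 2, 2, 3, 8, 3, 8, 8])

# return_index=True also returns, for each unique value,
# the index of its first occurrence in t.
values, first_idx = np.unique(t, return_index=True)
first_occurrence = {int(v): int(i) for v, i in zip(values, first_idx)}
print(first_occurrence)  # {1: 0, 2: 3, 3: 5, 8: 6}
```

Unlike the sorted variant, the indices here refer to positions in the original array.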
For 1D arrays, I'd recommend np.flatnonzero(array == value)[0], which is equivalent to both np.nonzero(array == value)[0][0] and np.where(array == value)[0][0] but avoids the ugliness of unboxing a 1-element tuple.