I've been looking for a way to efficiently check for duplicates in a numpy array and stumbled upon a question that contained an answer using this code.
What does th
The slices [1:] and [:-1] mean all but the first and all but the last elements of the array:
>>> import numpy as np
>>> s = np.array((1, 2, 2, 3)) # four element array
>>> s[1:]
array([2, 2, 3]) # last three elements
>>> s[:-1]
array([1, 2, 2]) # first three elements
Therefore the comparison generates an array of boolean comparisons between each element s[x] and its "neighbour" s[x+1], which will be one shorter than the original array (as the last element has no neighbour):
>>> s[1:] == s[:-1]
array([False, True, False], dtype=bool)
Using that array to index the original array gets you the elements where the comparison is True, i.e. the elements that are the same as their neighbour:
>>> s[s[1:] == s[:-1]]
array([2])
Note that this only identifies adjacent duplicate values.
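To illustrate the point about adjacency, here is a minimal sketch (the sample values and variable names are my own, not from the question). It assumes you are happy to sort a copy of the array first with np.sort, so that equal values end up next to each other before the neighbour comparison is applied:

```python
import numpy as np

# Hypothetical sample data: the duplicates 1 and 3 are not adjacent.
s = np.array([3, 1, 2, 3, 1])

# On the unsorted array no element equals its neighbour.
mask_unsorted = s[1:] == s[:-1]

# Sorting a copy brings equal values next to each other.
sorted_s = np.sort(s)
mask_sorted = sorted_s[1:] == sorted_s[:-1]

# Apply the mask to the slice it was built from, so the lengths match.
duplicates = sorted_s[:-1][mask_sorted]
print(mask_unsorted.any(), duplicates)
```

A value that occurs three or more times will show up more than once in the result; passing the result through np.unique would collapse those repeats if that matters.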
Check this out:
>>> import numpy
>>> s = numpy.array([1, 3, 5, 6, 7, 7, 8, 9])
>>> s[1:] == s[:-1]
array([False, False, False, False, True, False, False], dtype=bool)
>>> s[s[1:] == s[:-1]]
array([7])
So s[1:] gives all numbers but the first, and s[:-1] all but the last.
Now compare these two vectors, i.e. check whether two adjacent elements are the same. Finally, select these elements.
It will show duplicates in a sorted array.
Basically, the inner expression s[1:] == s[:-1] compares the array with its shifted version. Imagine this:
 1, [2, 3, ... n-1, n  ]
-   [1, 2, ... n-2, n-1]  n
=>  [F, F, ... F,   F  ]
In a sorted array, there will be no True in the resulting array unless there is a repetition. Then the expression s[array] keeps only those elements that have True at the matching position in the index array.
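As a rough cross-check of the "shifted version" picture, and assuming a numeric array, the same mask can be obtained by subtracting each element from its successor with np.diff and testing for zero. This is only an equivalent way to look at it, not what the original code does:

```python
import numpy as np

s = np.array([1, 3, 5, 6, 7, 7, 8, 9])

# Element-wise comparison of the array with its shifted version.
shifted_mask = s[1:] == s[:-1]

# np.diff computes s[1:] - s[:-1]; a zero difference means equal neighbours.
diff_mask = np.diff(s) == 0

# Both approaches should flag exactly the same positions.
same = np.array_equal(shifted_mask, diff_mask)
print(shifted_mask, same)
```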
s[1:] == s[:-1] compares s without the first element with s without the last element, i.e. 0th with 1st, 1st with 2nd, etc., giving you an array of len(s) - 1 boolean elements. s[boolarray] will select only those elements from s which have True at the corresponding place in boolarray. Thus, the code extracts all elements that are equal to the next element.
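One practical caveat, offered as a hedged sketch: because the mask has len(s) - 1 elements, it is one shorter than s. As far as I know, recent NumPy releases raise an IndexError for a boolean index whose length does not match the array, while the older versions shown in this thread accepted it. Indexing the slice s[:-1] instead keeps the lengths aligned and gives the same elements. The variable names below are my own:

```python
import numpy as np

s = np.array([1, 3, 5, 6, 7, 7, 8, 9])
mask = s[1:] == s[:-1]          # len(s) - 1 booleans

# Indexing s directly with the shorter mask may fail, depending on the
# installed NumPy version, so the attempt is guarded here.
try:
    s[mask]
    direct_indexing_works = True
except IndexError:
    direct_indexing_works = False

# Indexing the slice the mask was built from works in either case:
# each True marks an element equal to the one that follows it.
equal_to_next = s[:-1][mask]
print(len(s), len(mask), equal_to_next)
```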