How do I get a row-wise comparison between two arrays, with the result as a row-wise True/False array?
Given data:
a = np.array([[1,0],[2,0],[3,1],[
You can do it with a list comprehension:
c = np.array([(b == row).all(axis=1).any() for row in a])
though this will be slower than a pure NumPy approach (if one exists).
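A caveat worth knowing for this kind of row test: Python's in operator on NumPy arrays evaluates (b == row).any(), so it reports True as soon as any single element matches, not only when a whole row matches. A minimal sketch of the difference, where the array other is made up purely for illustration:

```python
import numpy as np

b = np.array([[1, 0], [2, 0], [4, 2]])
# Hypothetical test rows: [1, 5] is not a row of b, but shares the value 1 in column 0.
other = np.array([[1, 0], [1, 5], [3, 1]])

# Plain `in`: True whenever any single element matches position-wise.
loose = np.array([row in b for row in other])

# Explicit test: all elements of some row of b must match.
strict = np.array([(b == row).all(axis=1).any() for row in other])

print(loose)   # [ True  True False]
print(strict)  # [ True False False]
```

With the sample data from the question both tests happen to agree, which is why the difference is easy to miss.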
Here's a vectorised solution:
res = (a[:, None] == b).all(-1).any(-1)
print(res)
[ True  True False  True]
Note that a[:, None] == b compares each row of a with b element-wise. We then use all + any to deduce if there are any rows which are all True for each sub-array:
print(a[:, None] == b)
[[[ True  True]
  [False  True]
  [False False]]

 [[False  True]
  [ True  True]
  [False False]]

 [[False False]
  [False False]
  [False False]]

 [[False False]
  [False False]
  [ True  True]]]
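To make the broadcasting explicit, here is a small sketch that splits the one-liner into named steps and prints the shape at each stage. The intermediate names eq and row_match are only for illustration:

```python
import numpy as np

a = np.array([[1, 0], [2, 0], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])

eq = a[:, None] == b      # (4, 1, 2) against (3, 2) broadcasts to (4, 3, 2)
row_match = eq.all(-1)    # (4, 3): does row i of a equal row j of b?
res = row_match.any(-1)   # (4,): does row i of a equal any row of b?

print(eq.shape, row_match.shape, res.shape)  # (4, 3, 2) (4, 3) (4,)
print(res)                                   # [ True  True False  True]
```

Keep in mind that the intermediate array holds len(a) * len(b) * a.shape[1] booleans, so memory use grows quickly for large inputs.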
You can use scipy's cdist which has a few advantages:
import numpy as np
from scipy.spatial.distance import cdist
a = np.array([[1,0],[2,0],[3,1],[4,2]])
b = np.array([[1,0],[2,0],[4,2]])
c = cdist(a, b)==0
print(c.any(axis=1))
[ True True False True]
print(a[c.any(axis=1)])
[[1 0]
[2 0]
[4 2]]
Also, cdist accepts a callable as its metric, so you can specify your own distance function to do whatever comparison you need:
c = cdist(a, b, lambda u, v: (u==v).all())
print(c)
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 0.]
[0. 0. 1.]]
Now you can find which indices match, which also shows whether there are multiple matches:
# Array with multiple instances
a2 = np.array([[1,0],[2,0],[3,1],[4,2],[3,1],[4,2]])
c2 = cdist(a2, b, lambda u, v: (u==v).all())
print(c2)
idx = np.where(c2==1)
print(idx)
print(idx[0][idx[1]==2])
[[1. 0. 0.]
[0. 1. 0.]
[0. 0. 0.]
[0. 0. 1.]
[0. 0. 0.]
[0. 0. 1.]]
(array([0, 1, 3, 5], dtype=int64), array([0, 1, 2, 2], dtype=int64))
[3 5]
You can use NumPy's apply_along_axis, which applies a function to each 1-D slice along the given axis (for a 2-D array, axis=0 passes each column and axis=1 passes each row):
import numpy as np
a = np.array([[1,0],[2,0],[3,1],[4,2]])
b = np.array([[1,0],[2,0],[4,2]])
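# (x == y).all(axis=1).any() tests whole-row equality; a plain "x in y" would match on any single element
c = np.apply_along_axis(lambda x, y: (x == y).all(axis=1).any(), 1, a, b)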
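To make the axis argument concrete, here is a short sketch. It first uses np.sum to show which slices are handed to the function, then applies a whole-row membership test. The helper name row_in is hypothetical:

```python
import numpy as np

a = np.array([[1, 0], [2, 0], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])

# axis=0 hands each column to the function, axis=1 hands each row.
col_sums = np.apply_along_axis(np.sum, 0, a)
row_sums = np.apply_along_axis(np.sum, 1, a)
print(col_sums)  # [10  3]
print(row_sums)  # [1 2 4 6]

def row_in(row, table):
    # Hypothetical helper: True if `row` equals some complete row of `table`.
    return (table == row).all(axis=1).any()

# Extra positional arguments (here b) are forwarded to the function on every call.
c = np.apply_along_axis(row_in, 1, a, b)
print(c)  # [ True  True False  True]
```

As far as I know, apply_along_axis still calls the function once per slice from Python, so it is a convenience rather than a speed-up over the list comprehension.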
Approach #1
We could use a view-based vectorized solution -
# https://stackoverflow.com/a/45313353/ @Divakar
def view1D(a, b): # a, b are arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    # view each row as a single opaque element covering the whole row's bytes
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()
A,B = view1D(a,b)
out = np.isin(A,B)
Sample run -
In [8]: a
Out[8]:
array([[1, 0],
[2, 0],
[3, 1],
[4, 2]])
In [9]: b
Out[9]:
array([[1, 0],
[2, 0],
[4, 2]])
In [10]: A,B = view1D(a,b)
In [11]: np.isin(A,B)
Out[11]: array([ True, True, False, True])
Approach #2
Alternatively, for the case when all rows in b are in a and rows are lexicographically sorted, using the same views, but with searchsorted -
out = np.zeros(len(A), dtype=bool)
out[np.searchsorted(A,B)] = 1
If the rows are not necessarily lexicographically sorted -
sidx = A.argsort()
out[sidx[np.searchsorted(A,B,sorter=sidx)]] = 1
a = np.array([[1,0],[2,0],[3,1],[4,2]])
b = np.array([[1,0],[2,0],[4,2]])
i = 0
j = 0
result = []
We can take advantage of the fact that they are sorted and do this in O(n) time. Using two pointers, we just advance whichever pointer has fallen behind:
while i < len(a) and j < len(b):
    if tuple(a[i]) == tuple(b[j]):
        result.append(True)
        i += 1
        j += 1 # get rid of this depending on how you want to handle duplicates
    elif tuple(a[i]) > tuple(b[j]):
        j += 1
    else:
        result.append(False)
        i += 1
Pad with False if it ends early.
if len(result) < len(a):
    result.extend([False] * (len(a) - len(result)))
print(result) # [True, True, False, True]
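For reuse, the loop can be wrapped in a function. The sketch below is one possible variant, not the only way to do it: the name sorted_rows_in is hypothetical, and it deliberately does not advance j on a match, so duplicate rows in a are all reported as present. It assumes both inputs are sorted lexicographically by row.

```python
import numpy as np

def sorted_rows_in(a, b):
    """Hypothetical helper: for each row of sorted `a`, is it a row of sorted `b`?"""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        row_a, row_b = tuple(a[i]), tuple(b[j])
        if row_a == row_b:
            result.append(True)
            i += 1  # j stays put so repeated rows in `a` also match
        elif row_a > row_b:
            j += 1  # b has fallen behind
        else:
            result.append(False)
            i += 1  # a has fallen behind; this row cannot be in b
    # Rows of `a` left over once `b` is exhausted have no match.
    result.extend([False] * (len(a) - len(result)))
    return result

a = np.array([[1, 0], [2, 0], [3, 1], [4, 2]])
b = np.array([[1, 0], [2, 0], [4, 2]])
print(sorted_rows_in(a, b))  # [True, True, False, True]

# Made-up input with duplicate rows, already sorted.
a_dup = np.array([[1, 0], [2, 0], [3, 1], [3, 1], [4, 2], [4, 2]])
print(sorted_rows_in(a_dup, b))  # [True, True, False, False, True, True]
```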
This answer is adapted from "Better way to find matches in two sorted lists than using for loops?" (Java).