I have a problem that is similar to this question, but just different enough that it can't be solved with the same solution...
I've got two dataframes,
Here's a solution where you effectively do the nested "in" loop by expanding the dimensionality of ID from df2 to take advantage of NumPy broadcasting:
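>>> import numpy as np
>>> def count_names(df1, df2):
...     # df1 is assumed to have exactly two name columns
...     names1, names2 = df1.values.T
...     v2 = df2.ID.values[:, None]  # column vector, shape (len(df2), 1)
...     mask1 = v2 == names1         # True where an ID equals a first-column name
...     mask2 = v2 == names2         # True where an ID equals a second-column name
...     df2['count'] = np.logical_or(mask1, mask2).sum(axis=1)
...     return df2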
>>> %timeit -r 5 -n 1000 count_names(df1, df2)
144 µs ± 10.4 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
>>> %timeit -r 5 -n 1000 jp(df1, df2)
224 µs ± 15.5 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
>>> %timeit -r 5 -n 1000 cs(df1, df2)
238 µs ± 2.37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit -r 5 -n 1000 wen(df1, df2)
921 µs ± 15.3 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
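The shape of the masks will be (len(df2), len(df1)): one row per ID in df2 and one column per row of df1, which is why summing along axis=1 yields one count per ID.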
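To make the broadcasting step concrete, here is a minimal sketch on toy data. The frames and the column names name1 and name2 are made up for illustration (the question's actual data isn't shown here); only ID comes from the code above. It prints the mask shape so it can be checked directly.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names name1/name2 are assumptions.
df1 = pd.DataFrame({"name1": ["a", "b", "a"], "name2": ["b", "a", "c"]})
df2 = pd.DataFrame({"ID": ["a", "b", "z", "c"]})

names1, names2 = df1.to_numpy().T    # two 1-D arrays, each of length len(df1)
v2 = df2["ID"].to_numpy()[:, None]   # column vector, shape (len(df2), 1)

# Comparing a (len(df2), 1) array with a (len(df1),) array broadcasts
# to a 2-D boolean array: one row per ID, one column per row of df1.
mask1 = v2 == names1
mask2 = v2 == names2
print(mask1.shape)                   # (4, 3)

# A df1 row counts once for an ID if either of its names matches.
counts = np.logical_or(mask1, mask2).sum(axis=1)
print(counts.tolist())               # [3, 2, 0, 1]
```

Note that the masks hold len(df1) * len(df2) booleans, so memory grows with the product of the two lengths; for very large frames that may outweigh the speed advantage.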