Vectorized way to count occurrences of string in either of two columns

一整个雨季 2021-01-05 03:57

I have a problem that is similar to this question, but just different enough that it can't be solved with the same solution...

I've got two dataframes,

4 Answers
  •  有刺的猬
    2021-01-05 04:25

    Here's a solution that effectively performs the nested "in" loop by adding an axis to the ID values from df2, so that NumPy broadcasting compares every ID against every row of df1:

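    >>> import numpy as np
    >>> def count_names(df1, df2):
    ...     # df1 is assumed to hold exactly two name columns
    ...     names1, names2 = df1.values.T
    ...     # add a trailing axis so each ID is compared against every row of df1
    ...     v2 = df2.ID.values[:, None]
    ...     mask1 = v2 == names1
    ...     mask2 = v2 == names2
    ...     # a row that matches in either column is counted once
    ...     df2['count'] = np.logical_or(mask1, mask2).sum(axis=1)
    ...     return df2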
    
    
    >>> %timeit -r 5 -n 1000 count_names(df1, df2)
    144 µs ± 10.4 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
    
    >>> %timeit -r 5 -n 1000 jp(df1, df2)
    224 µs ± 15.5 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
    
    >>> %timeit -r 5 -n 1000 cs(df1, df2)
    238 µs ± 2.37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    
    >>> %timeit -r 5 -n 1000 wen(df1, df2)
    921 µs ± 15.3 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
    

    The shape of each mask will be (len(df2), len(df1)), so summing along axis=1 yields one count per row of df2.
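
To make the broadcasting step concrete, here is a minimal sketch on made-up data. The question's actual dataframes are not shown here, so the column names `name1` and `name2` and all of the values below are assumptions chosen purely for illustration; only the `ID` column name comes from the answer's code.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the frames from the question).
df1 = pd.DataFrame({
    "name1": ["a", "b", "c", "a"],
    "name2": ["b", "d", "a", "e"],
})
df2 = pd.DataFrame({"ID": ["a", "b", "f"]})

# Split df1 into its two columns; each array has shape (len(df1),).
names1, names2 = df1.to_numpy().T

# Add a trailing axis so the IDs have shape (len(df2), 1).
ids = df2["ID"].to_numpy()[:, None]

# Broadcasting (len(df2), 1) against (len(df1),) gives (len(df2), len(df1)).
mask = (ids == names1) | (ids == names2)

# Each row of the mask belongs to one ID; summing across it counts the
# df1 rows in which that ID appears in either column.
counts = mask.sum(axis=1)

# assign() returns a new frame instead of modifying df2 in place.
result = df2.assign(count=counts)

print(mask.shape)
print(result)
```

Because the two masks are combined with a logical OR before summing, a df1 row that contains the ID in both columns is counted once, not twice. Memory use grows with len(df1) * len(df2), so this approach suits frames small enough for the full mask to fit comfortably in memory.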
