Vectorized groupby with NumPy

前端未结

关注

 4  2135

自闭症患者 2020-12-31 08:11

Pandas has a widely-used groupby facility to split up a DataFrame based on a corresponding mapping, from which you can apply a calculation on each subgroup and recombine the

4条回答

时光说笑 (楼主)

2020-12-31 08:52

If you want a more flexible implementation of groupby that can group using any of numpy's ufuncs:

def groupby_np(X, groups, axis = 0, uf = np.add, out = None, minlength = 0, identity = None):
    if minlength < groups.max() + 1:
        minlength = groups.max() + 1
    if identity is None:
        identity = uf.identity
    i = list(range(X.ndim))
    del i[axis]
    i = tuple(i)
    n = out is None
    if n:
        if identity is None:  # fallback to loops over 0-index for identity
            assert np.all(np.in1d(np.arange(minlength), groups)), "No valid identity for unassinged groups"
            s = [slice(None)] * X.ndim
            for i_ in i:
                s[i_] = 0
            out = np.array([uf.reduce(X[tuple(s)][groups == i]) for i in range(minlength)])
        else:
            out = np.full((minlength,), identity, dtype = X.dtype)
    uf.at(out, groups, uf.reduce(X, i))
    if n:
        return out

groupby_np(X, groups)
array([15, 30])

groupby_np(X, groups, uf = np.multiply)
array([   0, 3024])

groupby_np(X, groups, uf = np.maximum)
array([5, 9])

groupby_np(X, groups, uf = np.minimum)
array([0, 6])

0 讨论(0)

查看其它4个回答