Vectorized groupby with NumPy

自闭症患者 2020-12-31 08:11

Pandas has a widely-used groupby facility to split up a DataFrame based on a corresponding mapping, from which you can apply a calculation on each subgroup and recombine the

4 Answers
  •  时光说笑
    2020-12-31 08:52

    If you want a more flexible groupby implementation that can reduce each group with any of NumPy's binary ufuncs:

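    import numpy as np

    def groupby_np(X, groups, axis=0, uf=np.add, out=None, minlength=0, identity=None):
        # Make sure there is one output slot per group label.
        minlength = max(minlength, groups.max() + 1)
        if identity is None:
            identity = uf.identity
        # Every axis except the grouping axis is reduced away before grouping.
        other_axes = list(range(X.ndim))
        del other_axes[axis]
        other_axes = tuple(other_axes)
        return_out = out is None
        if return_out:
            if identity is None:
                # No identity (e.g. np.maximum, np.minimum): seed each group from the
                # data itself, using index 0 along every non-grouping axis. The seed is
                # applied again by uf.at below, which is harmless for idempotent ufuncs.
                assert np.all(np.isin(np.arange(minlength), groups)), \
                    "No valid identity for unassigned groups"
                s = [slice(None)] * X.ndim
                for a in other_axes:
                    s[a] = 0
                out = np.array([uf.reduce(X[tuple(s)][groups == g]) for g in range(minlength)])
            else:
                out = np.full((minlength,), identity, dtype=X.dtype)
        # Unbuffered in-place accumulation, so repeated group labels are all applied.
        uf.at(out, groups, uf.reduce(X, other_axes))
        # When the caller supplies `out`, it is updated in place and nothing is returned.
        if return_out:
            return out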
    
    groupby_np(X, groups)
    array([15, 30])
    
    groupby_np(X, groups, uf = np.multiply)
    array([   0, 3024])
    
    groupby_np(X, groups, uf = np.maximum)
    array([5, 9])
    
    groupby_np(X, groups, uf = np.minimum)
    array([0, 6])
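
The core of the approach above is `ufunc.at`, which accumulates into an output array even when the same group label appears more than once. A minimal sketch of just that step follows. The inputs `X` and `groups` are assumptions (they are not defined in this answer and were chosen only because they reproduce the outputs quoted above), and `group_reduce` is a hypothetical helper name, not part of NumPy.

```python
import numpy as np

# Assumed inputs: picked only because they reproduce the outputs quoted above.
X = np.arange(10)
groups = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])

def group_reduce(values, labels, uf, init):
    """Reduce `values` per label with the binary ufunc `uf` (hypothetical helper)."""
    # One slot per label, pre-filled with a starting value that is neutral for `uf`.
    out = np.full(labels.max() + 1, init, dtype=values.dtype)
    # ufunc.at is unbuffered, so every occurrence of a repeated label is applied.
    uf.at(out, labels, values)
    return out

sums = group_reduce(X, groups, np.add, 0)
prods = group_reduce(X, groups, np.multiply, 1)
# np.maximum and np.minimum have no identity, so the dtype's extremes serve as starts.
maxs = group_reduce(X, groups, np.maximum, np.iinfo(X.dtype).min)
mins = group_reduce(X, groups, np.minimum, np.iinfo(X.dtype).max)

print(sums, prods, maxs, mins)
```

For plain sums, `np.bincount(groups, weights=X)` gives the same totals (as floats), which makes a convenient cross-check.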
    
