Pandas calculate length of consecutive equal values from a grouped dataframe

Submitted by 寵の児 on 2021-02-07 20:34:55

Question


I want to do what was done in the answer to "Calculating the number of specific consecutive equal values in a vectorized way in pandas", but using a grouped dataframe instead of a series.

So given a dataframe with several columns

A    B    C   
------------ 
x    x    0
x    x    5
x    x    2
x    x    0
x    x    0
x    x    3
x    x    0
y    x    1
y    x    10
y    x    0
y    x    5
y    x    0
y    x    0

I want to group by columns A and B, then find the length of each run of consecutive zeros in C. After that, I'd like to count how many times each run length occurs within each group. So I want output like this:

A    B    num_consecutive_zeros  count
---------------------------------------
x    x            1                2
x    x            2                1
y    x            1                1
y    x            2                1

I don't know how to adapt the answer from the linked question to deal with grouped dataframes.


Answer 1:


Here is the code. count_consecutive_zeros() uses NumPy functions and value_counts() to find the length of each run of zeros and count how often each length occurs. groupby().apply(count_consecutive_zeros) calls it once per group, and reset_index() converts the resulting MultiIndex into columns:

import pandas as pd
import numpy as np
from io import BytesIO
text = """A    B    C   
x    x    0
x    x    5
x    x    2
x    x    0
x    x    0
x    x    3
x    x    0
y    x    1
y    x    10
y    x    0
y    x    5
y    x    0
y    x    0"""

# sep=r"\s+" splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(BytesIO(text.encode()), sep=r"\s+")

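def count_consecutive_zeros(s):
    # Pad the zero mask with 0 at both ends; in the diff, +1 marks the
    # start of a run of zeros and -1 marks the position just past its end
    v = np.diff(np.r_[0, s.values == 0, 0])
    # Length of each run = end position - start position
    lengths = np.where(v == -1)[0] - np.where(v == 1)[0]
    # Count how many times each run length occurs
    counts = pd.Series(lengths).value_counts()
    counts.index.name = "num_consecutive_zeros"
    counts.name = "count"
    return counts

result = df.groupby(["A", "B"])["C"].apply(count_consecutive_zeros).reset_index(name="count")
print(result)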
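To make the np.diff step easier to follow, here is a small trace on the C values of the (x, x) group from the question. This is only an illustrative sketch: the variable names (c, padded, starts, ends, run_lengths) are not part of the answer, and the explicit astype(int) is added here just to make the integer arithmetic obvious.

```python
import numpy as np

# C values of the (x, x) group from the question
c = np.array([0, 5, 2, 0, 0, 3, 0])

# 1 where C is zero, padded with 0 on both sides so that runs touching
# either end of the array still get a start and an end marker
padded = np.r_[0, (c == 0).astype(int), 0]   # [0 1 0 0 1 1 0 1 0]

# +1 where a run of zeros starts, -1 just past where it ends
v = np.diff(padded)                          # [ 1 -1  0  1  0 -1  1 -1]

starts = np.where(v == 1)[0]                 # [0 3 6]
ends = np.where(v == -1)[0]                  # [1 5 7]
run_lengths = ends - starts                  # [1 2 1]

print(run_lengths)
```

The run lengths 1, 2, 1 are what value_counts() then tallies: length 1 occurs twice and length 2 once, matching the first two rows of the expected output for group (x, x).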


Source: https://stackoverflow.com/questions/29640588/pandas-calculate-length-of-consecutive-equal-values-from-a-grouped-dataframe
