How can I count the number of consecutive TRUEs in a DataFrame?

佛祖请我去吃肉 2021-01-11 14:58

I have a dataset made of True and False.

Sample Table:
       A      B      C
0  False   True  False
1  False  False  False
2   True   True  False
3   True           


        
2 answers
  •  不知归路
    2021-01-11 15:56

    The solution can be simplified if every column always contains at least one True:

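    import pandas as pd

    # running count of True values per column
    b = df.cumsum()
    # subtract the count reached at the last False to get the length of the current run
    c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)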
    
    print (c)
       A  B  C
    0  0  1  0
    1  0  0  0
    2  1  1  0
    3  2  2  1
    4  0  3  0
    5  1  4  1
    6  2  0  0
    7  3  0  1
    8  0  1  2
    9  1  0  0
    
    # get the maximal value of each column (the longest run of True)
    length = c.max().tolist()
    print (length)
    [3, 4, 2]
    
    # get the index of each maximal value, then subtract the length and add 1 to get the start of the run
    index = c.idxmax().sub(length).add(1).tolist()
    print (index)
    [5, 2, 7]
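
For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The input frame is an assumption: it is reconstructed from the cumsum values shown in the Detail output. The function name `longest_true_runs` is hypothetical. It assumes a default 0..n-1 integer index and at least one True per column.

```python
import pandas as pd

# Input reconstructed from the cumsum values in the Detail output (an assumption)
df = pd.DataFrame({
    "A": [False, False, True, True, False, True, True, True, False, True],
    "B": [True, False, True, True, True, True, False, False, True, False],
    "C": [False, False, False, True, False, True, False, True, True, False],
})


def longest_true_runs(frame):
    """Hypothetical helper: per column, length and start row of the longest True run."""
    b = frame.cumsum()                                       # running count of True
    c = b.sub(b.mask(frame).ffill().fillna(0)).astype(int)   # length of the current run
    length = c.max()                                         # longest run per column
    start = c.idxmax().sub(length).add(1)                    # row where that run starts
    return length.tolist(), start.tolist()


length, index = longest_true_runs(df)
print(length)
print(index)
```

Because `idxmax` returns the first occurrence of the maximum, ties between equally long runs resolve to the earliest one.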
    

    Detail:

    print (pd.concat([b,
                      b.mask(df), 
                      b.mask(df).ffill(), 
                      b.mask(df).ffill().fillna(0),
                      b.sub(b.mask(df).ffill().fillna(0)).astype(int)
                      ], axis=1, 
                      keys=('cumsum', 'mask', 'ffill', 'fillna','sub')))
    
      cumsum       mask           ffill           fillna           sub      
           A  B  C    A    B    C     A    B    C      A    B    C   A  B  C
    0      0  1  0  0.0  NaN  0.0   0.0  NaN  0.0    0.0  0.0  0.0   0  1  0
    1      0  1  0  0.0  1.0  0.0   0.0  1.0  0.0    0.0  1.0  0.0   0  0  0
    2      1  2  0  NaN  NaN  0.0   0.0  1.0  0.0    0.0  1.0  0.0   1  1  0
    3      2  3  1  NaN  NaN  NaN   0.0  1.0  0.0    0.0  1.0  0.0   2  2  1
    4      2  4  1  2.0  NaN  1.0   2.0  1.0  1.0    2.0  1.0  1.0   0  3  0
    5      3  5  2  NaN  NaN  NaN   2.0  1.0  1.0    2.0  1.0  1.0   1  4  1
    6      4  5  2  NaN  5.0  2.0   2.0  5.0  2.0    2.0  5.0  2.0   2  0  0
    7      5  5  3  NaN  5.0  NaN   2.0  5.0  2.0    2.0  5.0  2.0   3  0  1
    8      5  6  4  5.0  NaN  NaN   5.0  5.0  2.0    5.0  5.0  2.0   0  1  2
    9      6  6  4  NaN  6.0  4.0   5.0  6.0  4.0    5.0  6.0  4.0   1  0  0
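
Read row by row, the sub column is a counter that grows by 1 on every True and resets to 0 on every False. A plain-Python sketch of the same idea for a single column may make that easier to see. The helper name `run_lengths` is hypothetical, and the input is column A as reconstructed from the cumsum values above.

```python
def run_lengths(values):
    """Hypothetical helper: length of the current True run at each position."""
    out = []
    run = 0
    for v in values:
        run = run + 1 if v else 0  # extend the run on True, reset on False
        out.append(run)
    return out


# Column A, reconstructed from the cumsum values in the table above (an assumption)
col_a = [False, False, True, True, False, True, True, True, False, True]
a_runs = run_lengths(col_a)
print(a_runs)
```

The vectorized version gets the same numbers without a loop: `b.mask(df).ffill()` carries forward the cumulative count reached at the last False, and subtracting it from the current count leaves only the Trues seen since then.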
    

    EDIT:

    A general solution that also works when a column contains only False values: add numpy.where with a boolean mask created by DataFrame.any:

    import numpy as np

    print (df)
           A      B      C
    0  False   True  False
    1  False  False  False
    2   True   True  False
    3   True   True  False
    4  False   True  False
    5   True   True  False
    6   True  False  False
    7   True  False  False
    8  False   True  False
    9   True  False  False
    
    b = df.cumsum()
    c = b.sub(b.mask(df).ffill().fillna(0)).astype(int)
    
    mask = df.any()
    length = np.where(mask, c.max(), -1).tolist()
    print (length)
    [3, 4, -1]
    
    index =  np.where(mask, c.idxmax().sub(c.max()).add(1), 0).tolist()
    print (index)
    [5, 2, 0]
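
Wrapped up as a function, the general version might look like the sketch below. The name `longest_true_runs_safe` and its two sentinel parameters are hypothetical, chosen to mirror the -1 and 0 used above. Like the code above, it assumes a default integer index.

```python
import numpy as np
import pandas as pd

# Same frame as printed above: column C contains no True at all
df = pd.DataFrame({
    "A": [False, False, True, True, False, True, True, True, False, True],
    "B": [True, False, True, True, True, True, False, False, True, False],
    "C": [False] * 10,
})


def longest_true_runs_safe(frame, no_run_length=-1, no_run_index=0):
    """Hypothetical helper: like the simple version, but with sentinels for all-False columns."""
    b = frame.cumsum()
    c = b.sub(b.mask(frame).ffill().fillna(0)).astype(int)
    has_true = frame.any()  # False for columns without a single True
    length = np.where(has_true, c.max(), no_run_length)
    start = np.where(has_true, c.idxmax().sub(c.max()).add(1), no_run_index)
    return length.tolist(), start.tolist()


length, index = longest_true_runs_safe(df)
print(length)
print(index)
```

Without the mask, an all-False column would report a length of 0 and a start index of 1, which looks like a real position. The sentinels make the "no run" case explicit.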
    
