Delimiting contiguous regions with values above a certain threshold in Pandas DataFrame

迷失自我 2021-01-04 23:29

I have a pandas DataFrame of indices and values between 0 and 1, something like this:

 6  0.047033
 7  0.047650
 8  0.054067
 9  0.064767
10  0.073183
11  0.         


        
2 Answers
  •  太阳男子
    2021-01-05 00:04

    You can find the first and last row of each contiguous region by comparing the thresholded series with copies of itself shifted by one row in each direction, and then keep only the (first, last) pairs that are far enough apart:

    # tag rows based on the threshold
    df['tag'] = df['values'] > .5
    
    # first row of a region: a True preceded by a False
    # (fill_value=False keeps the shifted series boolean, so ~ stays a logical NOT)
    fst = df.index[df['tag'] & ~df['tag'].shift(1, fill_value=False)]
    
    # last row of a region: a True followed by a False
    lst = df.index[df['tag'] & ~df['tag'].shift(-1, fill_value=False)]
    
    # keep regions whose last label is more than 4 past the first
    # (at least 6 rows with a default integer index)
    pr = [(i, j) for i, j in zip(fst, lst) if j > i + 4]
    

    So, for example, the first region would be:

    >>> i, j = pr[0]
    >>> df.loc[i:j]
        indices    values   tag
    15       16  0.639992  True
    16       17  0.593427  True
    17       18  0.810888  True
    18       19  0.596243  True
    19       20  0.812684  True
    20       21  0.617945  True
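
The `j > i + 4` filter compares index labels, so it only measures region length when the index is a default integer range. As a complement, here is a minimal sketch of a different technique (run labelling with `cumsum` plus `groupby`, not the method used in the answer above) that counts rows instead. The function name `find_regions`, the `min_rows` parameter, and the sample data are all hypothetical and exist only for illustration.

```python
import pandas as pd


def find_regions(values: pd.Series, threshold: float, min_rows: int) -> list:
    """Return (first_label, last_label) for each run of at least `min_rows`
    consecutive rows whose value is strictly above `threshold`."""
    tag = values > threshold
    # A run starts wherever the tag differs from the previous row; the
    # cumulative sum of those change points gives every run its own id.
    run_id = (tag != tag.shift()).cumsum()
    # Summarise the index labels of the above-threshold rows, run by run.
    labels = values.index.to_series()[tag]
    runs = labels.groupby(run_id[tag]).agg(first="first", last="last", n_rows="count")
    long_runs = runs[runs["n_rows"] >= min_rows]
    return list(zip(long_runs["first"].tolist(), long_runs["last"].tolist()))


# Synthetic data for illustration only (not the asker's data).
df = pd.DataFrame(
    {"values": [0.1, 0.7, 0.8, 0.2, 0.6, 0.9, 0.7, 0.8, 0.6, 0.9, 0.3, 0.95]}
)

# min_rows=6 matches `j > i + 4` on a default integer index.
print(find_regions(df["values"], threshold=0.5, min_rows=6))  # [(4, 9)]
```

Each returned pair can be passed to `df.loc[i:j]` exactly like the entries of `pr`. Counting rows rather than subtracting labels should also behave sensibly when the index has gaps or is not numeric.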
    
