Removing the first rows before a specific value - pandas

心在旅途 2021-01-21 05:08

I am trying to remove all rows before an initial value for each group. For instance, if my max_value = 250, then all rows in a group before that value should be removed.

2 Answers
  • 2021-01-21 05:15

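    This should do the trick (`group_keys=False` keeps the original row index on the mask; without it, pandas 2.0 and later prepend the group key to the index and the boolean indexing fails):

    df[df.groupby('Asset', group_keys=False)['Monthly Value'].apply(lambda x: x.gt(max_value).cumsum().ne(0))]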
    

    Yields:

              date    Asset  Monthly Value
    2   2019-03-01  Asset A            300
    3   2019-04-01  Asset A            400
    4   2019-01-01  Asset A            500
    5   2019-02-01  Asset A            600
    8   2019-01-01  Asset B            300
    9   2019-02-01  Asset B            200
    10  2019-03-01  Asset B            300
    11  2019-04-01  Asset B            200
    

    Additionally, if you store your max values in a dictionary like max_value = {'Asset A': 250, 'Asset B': 250}, you can do the following to achieve the same result:

    df[df.groupby('Asset', group_keys=False)['Monthly Value'].apply(lambda x: x.gt(max_value[x.name]).cumsum().ne(0))]
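
To see why the mask works, here is a minimal sketch that walks through the three steps on one group and then applies them per group. The sample frame and the helper name `keep_from_first_exceedance` are made up for illustration; they are not the asker's data.

```python
import pandas as pd

# Hypothetical sample data, not the asker's original frame.
df = pd.DataFrame({
    "Asset": ["Asset A"] * 4 + ["Asset B"] * 4,
    "Monthly Value": [100, 300, 200, 400, 50, 100, 300, 200],
})
max_value = 250

# Step by step on a single group (Asset A):
values = df.loc[df["Asset"] == "Asset A", "Monthly Value"]
exceeded = values.gt(max_value)   # False, True, False, True
running = exceeded.cumsum()       # 0, 1, 1, 2
keep = running.ne(0)              # False, True, True, True


def keep_from_first_exceedance(x):
    """Hypothetical helper: True from the first value above max_value onward."""
    return x.gt(max_value).cumsum().ne(0)


# group_keys=False keeps the original row index so the mask aligns with df.
mask = df.groupby("Asset", group_keys=False)["Monthly Value"].apply(
    keep_from_first_exceedance
)
result = df[mask]
print(result)
```

Because the running count never returns to zero, later values that fall back below the threshold (200 in Asset A here) are still kept, which matches the output shown in this answer.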
    
  • 2021-01-21 05:37

    You don't need `apply`. Group a boolean Series to build the mask that slices out the desired rows. Since your new requirement is that each group is sliced on a different max_value, create a dictionary from the unique values of Asset and max_value_list, then map it onto the Asset column to get a Series s of per-row max values. Finally, compare Monthly Value against s and take a grouped cumsum to build the mask m for slicing. (Note: I changed your sample values to show slicing on different max_values.)

    Modified sample `df` to show slicing on different max_value
    
    Out[334]:
              date    Asset  Monthly Value
    0   2019-01-01  Asset A            100
    1   2019-02-01  Asset A            200
    2   2019-03-01  Asset A            300
    3   2019-04-01  Asset A            400
    4   2019-01-01  Asset A            500
    5   2019-02-01  Asset A            600
    6   2019-03-01  Asset B            100
    7   2019-04-01  Asset B            350
    8   2019-01-01  Asset B            450
    9   2019-02-01  Asset B            200
    10  2019-03-01  Asset B            300
    11  2019-04-01  Asset B            200
    
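    max_value_list = [250, 300]
    # Pair each asset (in order of first appearance) with its threshold
    max_dict = dict(zip(df.Asset.unique(), max_value_list))
    # Per-row threshold, aligned with df
    s = df.Asset.map(max_dict)
    # True from the first row in each group that exceeds its threshold
    m = (df['Monthly Value'] > s).groupby(df.Asset).cumsum().ne(0)
    df[m]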
    
    Out[333]:
              date    Asset  Monthly Value
    2   2019-03-01  Asset A            300
    3   2019-04-01  Asset A            400
    4   2019-01-01  Asset A            500
    5   2019-02-01  Asset A            600
    7   2019-04-01  Asset B            350
    8   2019-01-01  Asset B            450
    9   2019-02-01  Asset B            200
    10  2019-03-01  Asset B            300
    11  2019-04-01  Asset B            200
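
For reference, here is a self-contained sketch of the same approach that rebuilds the modified sample frame above and wraps the steps in a function. The function name `drop_rows_before_threshold` and the explicit `thresholds` dictionary are assumptions made for illustration; passing the dictionary directly avoids relying on the order returned by `unique()`.

```python
import pandas as pd

# Modified sample frame from this answer.
df = pd.DataFrame({
    "date": ["2019-01-01", "2019-02-01", "2019-03-01", "2019-04-01",
             "2019-01-01", "2019-02-01", "2019-03-01", "2019-04-01",
             "2019-01-01", "2019-02-01", "2019-03-01", "2019-04-01"],
    "Asset": ["Asset A"] * 6 + ["Asset B"] * 6,
    "Monthly Value": [100, 200, 300, 400, 500, 600,
                      100, 350, 450, 200, 300, 200],
})


def drop_rows_before_threshold(frame, group_col, value_col, thresholds):
    """Hypothetical helper: keep rows from the first value above the
    group's threshold onward, using a grouped cumulative sum as the mask."""
    per_row = frame[group_col].map(thresholds)   # threshold for each row
    exceeded = frame[value_col] > per_row        # boolean Series
    mask = exceeded.groupby(frame[group_col]).cumsum().ne(0)
    return frame[mask]


result = drop_rows_before_threshold(
    df, "Asset", "Monthly Value", {"Asset A": 250, "Asset B": 300}
)
print(result)
```

The grouping key is passed as a Series (`frame[group_col]`) rather than a column name because the boolean Series being grouped no longer carries the Asset column.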
    