Removing the first rows before a specific value - pandas


I am trying to remove all rows that come before an initial value within each group. For instance, if my max_value = 250, then all rows for a group before that value should be removed.


2 Answers

孤独总比滥情好

2021-01-21 05:15

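This should do the trick (`transform` is used rather than `apply` so the mask keeps the original index; in recent pandas versions `apply` may prepend the group key to the index, which breaks the boolean indexing):

df[df.groupby('Asset')['Monthly Value'].transform(lambda x: x.gt(max_value).cumsum().ne(0))]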

Yields:

          date    Asset  Monthly Value
2   2019-03-01  Asset A            300
3   2019-04-01  Asset A            400
4   2019-01-01  Asset A            500
5   2019-02-01  Asset A            600
8   2019-01-01  Asset B            300
9   2019-02-01  Asset B            200
10  2019-03-01  Asset B            300
11  2019-04-01  Asset B            200
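
To see why this mask works, it can help to split the one-liner into named steps. The following is only a sketch of the same idea on a small made-up frame (the values are illustrative, not the asker's data), and it uses a grouped cumsum directly instead of a lambda:

```python
import pandas as pd

# Hypothetical sample data, not the asker's original frame.
df = pd.DataFrame({
    "Asset": ["Asset A"] * 4 + ["Asset B"] * 4,
    "Monthly Value": [100, 200, 300, 150, 100, 300, 200, 400],
})
max_value = 250

# 1 where the value exceeds the threshold, 0 elsewhere.
above = df["Monthly Value"].gt(max_value).astype(int)

# Running count of exceedances within each asset.
seen = above.groupby(df["Asset"]).cumsum()

# True from the first exceedance onwards, False before it.
mask = seen.ne(0)

result = df[mask]
print(result)
```

Rows after the first exceedance are kept even when they fall back below the threshold (the 150 for Asset A here), because the running count never returns to zero.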

Additionally, if you store your max values in a dictionary like max_value = {'Asset A': 250, 'Asset B': 250}, you can do the following to achieve the same result:

df[df.groupby('Asset')['Monthly Value'].transform(lambda x: x.gt(max_value[x.name]).cumsum().ne(0))]


时光说笑

2021-01-21 05:37

You don't need apply. Call groupby on a boolean series to build the mask that slices out the desired rows. Your new requirement is that each group is sliced on a different max_value, so create a dictionary from the unique values of Asset and max_value_list, then map it onto the Asset column to get a series s of per-row max values. Finally, compare Monthly Value against s and take a grouped cumsum to build the mask m for slicing. (Note: I changed your sample values to show slicing on different max_values.)

Modified sample `df`, to show slicing on different max_value:

Out[334]:
          date    Asset  Monthly Value
0   2019-01-01  Asset A            100
1   2019-02-01  Asset A            200
2   2019-03-01  Asset A            300
3   2019-04-01  Asset A            400
4   2019-01-01  Asset A            500
5   2019-02-01  Asset A            600
6   2019-03-01  Asset B            100
7   2019-04-01  Asset B            350
8   2019-01-01  Asset B            450
9   2019-02-01  Asset B            200
10  2019-03-01  Asset B            300
11  2019-04-01  Asset B            200

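max_value_list = [250, 300]  # one threshold per asset, in order of first appearance
max_dict = dict(zip(df.Asset.unique(), max_value_list))
s = df.Asset.map(max_dict)  # per-row threshold
m = (df['Monthly Value'] > s).groupby(df.Asset).cumsum().ne(0)  # True from the first exceedance on
df[m]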

Out[333]:
          date    Asset  Monthly Value
2   2019-03-01  Asset A            300
3   2019-04-01  Asset A            400
4   2019-01-01  Asset A            500
5   2019-02-01  Asset A            600
7   2019-04-01  Asset B            350
8   2019-01-01  Asset B            450
9   2019-02-01  Asset B            200
10  2019-03-01  Asset B            300
11  2019-04-01  Asset B            200
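
One thing to watch with `dict(zip(df.Asset.unique(), max_value_list))` is that thresholds are paired with assets by order of first appearance, so the list has to be written in that order. Below is a hedged sketch of a variant keyed explicitly by asset name. The helper name `drop_before_threshold` and its parameters are made up for illustration; the data is the modified sample shown above.

```python
import pandas as pd

def drop_before_threshold(df, thresholds, group_col="Asset", value_col="Monthly Value"):
    """Keep each group's rows from the first value above its threshold onwards.

    `thresholds` maps group name -> threshold. A group missing from the
    mapping gets NaN as its threshold, never exceeds it, and is dropped.
    """
    limits = df[group_col].map(thresholds)            # per-row threshold
    exceeded = df[value_col].gt(limits).astype(int)   # 1 where value > threshold
    mask = exceeded.groupby(df[group_col]).cumsum().ne(0)
    return df[mask]

# The modified sample from this answer (dates omitted for brevity).
df = pd.DataFrame({
    "Asset": ["Asset A"] * 6 + ["Asset B"] * 6,
    "Monthly Value": [100, 200, 300, 400, 500, 600,
                      100, 350, 450, 200, 300, 200],
})

result = drop_before_threshold(df, {"Asset A": 250, "Asset B": 300})
print(result.index.tolist())
```

With these inputs the kept index labels match the output above: 2 to 5 for Asset A and 7 to 11 for Asset B.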
