How to group “remaining” results beyond Top N into “Others” with pandas

孤独总比滥情好 2021-02-11 01:05

When grouping a pandas DataFrame by one column, say "version", which has 10 distinct versions, how can one plot the Top 3 (which cover over 90%) and put the small remainders into "Others"?

2 Answers

無奈伤痛 (OP)

2021-02-11 01:42
I assume you also want the Other group to be summed, which for your example gives a total of 3?

If I were aiming to win the pandas one-liner competition, this would be my entry:
```
df.replace(df.groupby('Version').sum().sort_values('Value', ascending=False).index[2:], 'Other').groupby('Version').sum()

         Value
Version       
Other        3
Top1        19
Top2        13
```
But that's completely unreadable, so let's break it down:

You already showed how to sum each group. Sorting that result and selecting everything outside the top 2 can be done with:
```
not_top2 = df.groupby('Version').sum().sort_values('Value', ascending=False).index[2:]
```
In this example not_top2 contains Other1 and Other2.
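To make this step reproducible, here is a minimal sketch that rebuilds a frame matching the output printed in this answer. The question's original data is not shown, so assigning 1 to Other1 and 2 to Other2 is an assumption. It uses `sort_values`, since the older `DataFrame.sort` no longer exists in current pandas.

```python
import pandas as pd

# Hypothetical sample data reconstructed from the printed output in this answer;
# which "Other" row holds 1 and which holds 2 is an assumption.
df = pd.DataFrame({
    "Version": ["Top1", "Top1", "Top1", "Top2", "Top2", "Other1", "Other2"],
    "Value": [14, 3, 2, 6, 7, 1, 2],
})

# Sum per version, sort descending by the summed Value, then keep every label after the first two.
totals = df.groupby("Version").sum().sort_values("Value", ascending=False)
not_top2 = totals.index[2:]
print(list(not_top2))  # ['Other2', 'Other1']
```

With these assumed values the sorted totals are 19, 13, 2 and 1, so the two smallest versions end up in `not_top2`.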

We can replace those versions with a common name:
```
dfnew = df.replace(not_top2, 'Other')
print(dfnew)

  Version  Value
0    Top1     14
1    Top1      3
2    Top1      2
3    Top2      6
4    Top2      7
5   Other      1
6   Other      2
```
Note that this replaces the values in not_top2 wherever they appear, in any column. If those values can also occur in a column other than Version, an extra step is needed to restrict the replacement to Version.
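One way to handle that extra step, sketched here with a hypothetical `Note` column, is to rewrite only the `Version` column using `Series.where` together with `isin`, instead of calling `replace` on the whole frame. This swaps in a different technique from the answer's `replace`, and it picks the top two versions with `Series.nlargest`.

```python
import pandas as pd

# Hypothetical frame: the extra "Note" column also happens to contain "Other1".
df = pd.DataFrame({
    "Version": ["Top1", "Top1", "Top2", "Other1", "Other2"],
    "Note": ["a", "Other1", "b", "c", "d"],
    "Value": [14, 3, 6, 1, 2],
})

# Labels of the two versions with the largest summed Value.
top2 = df.groupby("Version")["Value"].sum().nlargest(2).index

# Keep Version labels that are in the top 2 and replace everything else with "Other".
# Only the Version column is rewritten, so "Other1" in Note is left alone.
dfnew = df.assign(Version=df["Version"].where(df["Version"].isin(top2), "Other"))
print(dfnew)
```

`assign` returns a new frame, so `df` itself keeps its original labels.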

What's left is to redo your original grouping:
```
dfnew.groupby('Version').sum()
```
Which gives:
```
         Value
Version       
Other        3
Top1        19
Top2        13
```
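For the general Top-N case from the question (for example Top 3, with the rest as Others), the same idea can be wrapped in a small helper. This is a sketch, not code from the answer: the function name `top_n_with_other` and its parameters are hypothetical. It sums per group first and uses `Series.nlargest`, so the second `groupby` is not needed. It also computes each group's share of the total, which is one way to check a claim like "the Top 3 cover over 90%".

```python
import pandas as pd


def top_n_with_other(df, group_col, value_col, n, other_label="Other"):
    """Sum value_col per group, keep the n largest groups, and lump the rest into other_label."""
    totals = df.groupby(group_col)[value_col].sum()
    top = totals.nlargest(n)
    result = top.copy()
    if len(totals) > n:
        # Append one extra row holding the sum of every group outside the top n.
        result[other_label] = totals.drop(top.index).sum()
    return result


# Same data as the earlier sketch (values for Other1/Other2 are assumed).
df = pd.DataFrame({
    "Version": ["Top1", "Top1", "Top1", "Top2", "Top2", "Other1", "Other2"],
    "Value": [14, 3, 2, 6, 7, 1, 2],
})

summary = top_n_with_other(df, "Version", "Value", n=2)
print(summary)

# Fraction of the overall total covered by each row.
share = summary / summary.sum()
print(share.round(3))
```

For the plot itself, calling `summary.plot.pie()` or `summary.plot.bar()` on the result should work, provided matplotlib is installed.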