Group by index + column in pandas

Backend, unresolved

北荒 2021-02-02 05:27

I have a dataframe that has the columns

user_id
item_bought

Here user_id is the index of the df. I want to group by both user_id and item_b

4 answers

独厮守ぢ (original poster)

2021-02-02 06:04
This should work:
```
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(0,5,(6, 2)), columns=['col1','col2'])
>>> df['ind1'] = list('AAABCC')
>>> df['ind2'] = range(6)
>>> df.set_index(['ind1','ind2'], inplace=True)
>>> df

           col1  col2
ind1 ind2            
A    0        3     2
     1        2     0
     2        2     3
B    3        2     4
C    4        3     1
     5        0     0


>>> df.groupby([df.index.get_level_values(0),'col1']).count()

           col2
ind1 col1      
A    2        2
     3        1
B    2        1
C    0        1
     3        1
```
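For the layout described in the question (user_id as a single-level index, item_bought as a column), the same idea can be sketched roughly as follows. The sample rows are made up for illustration, and `pd.Grouper(level=...)` is shown only as one alternative way to refer to the index level by name; it is not something the question or this answer originally used.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame(
    {"user_id": [1, 1, 1, 2, 2], "item_bought": ["a", "a", "b", "a", "b"]}
).set_index("user_id")

# Option 1: pass the index labels explicitly, together with the column name.
counts = df.groupby([df.index.get_level_values(0), "item_bought"]).size()

# Option 2: refer to the index level by name through pd.Grouper.
counts_by_name = df.groupby([pd.Grouper(level="user_id"), "item_bought"]).size()

print(counts)
```

Both variants should produce one row per (user_id, item_bought) pair, with the number of matching rows as the value.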
I had the same problem when grouping by one level of a MultiIndex. With a MultiIndex you cannot use df.index.levels[0], because it contains only the distinct values of that index level and will most likely have a different length than the whole dataframe.

See http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Index.get_level_values.html for details: get_level_values will "Return vector of label values for requested level, equal to the length of the index".
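To make the length difference concrete, here is a small sketch that rebuilds the frame from the printed output above with fixed values instead of random ones, then compares the two attributes. The variable names `unique_labels`, `row_labels`, and `grouped` are just illustrative.

```python
import pandas as pd

# Same shape as the example above, with the values copied from its printed output.
idx = pd.MultiIndex.from_arrays(
    [list("AAABCC"), list(range(6))], names=["ind1", "ind2"]
)
df = pd.DataFrame(
    {"col1": [3, 2, 2, 2, 3, 0], "col2": [2, 0, 3, 4, 1, 0]}, index=idx
)

# levels[0] holds only the distinct labels of the first level.
unique_labels = df.index.levels[0]

# get_level_values(0) holds one label per row, so it lines up with the frame.
row_labels = df.index.get_level_values(0)

print(len(df), len(unique_labels), len(row_labels))

# Because row_labels has one entry per row, it can be used as a grouping key.
grouped = df.groupby([row_labels, "col1"]).count()
print(grouped)
```

Here the frame has 6 rows, while levels[0] has only 3 entries (A, B, C), which is why it cannot serve as a per-row grouping key.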