I have a dataframe with the columns user_id and item_bought. Here user_id is the index of the df. I want to group by both user_id and item_bought.
This should work:
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame(np.random.randint(0, 5, (6, 2)), columns=['col1', 'col2'])
>>> df['ind1'] = list('AAABCC')
>>> df['ind2'] = range(6)
>>> df.set_index(['ind1','ind2'], inplace=True)
>>> df
           col1  col2
ind1 ind2
A    0        3     2
     1        2     0
     2        2     3
B    3        2     4
C    4        3     1
     5        0     0
>>> df.groupby([df.index.get_level_values(0), 'col1']).count()
           col2
ind1 col1
A    2        2
     3        1
B    2        1
C    0        1
     3        1
I had the same problem when trying to use one of the levels of a MultiIndex. With a MultiIndex you cannot use df.index.levels[0], since it contains only the distinct values of that index level and will most likely have a different length than the whole dataframe.
Check http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Index.get_level_values.html: get_level_values returns a "vector of label values for requested level, equal to the length of the index".
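A small sketch of that difference, rebuilding the same kind of frame as above (the random values don't matter here):

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0, 5, (6, 2)), columns=['col1', 'col2'])
df['ind1'] = list('AAABCC')
df['ind2'] = range(6)
df.set_index(['ind1', 'ind2'], inplace=True)

print(df.index.levels[0])            # Index(['A', 'B', 'C'], ...) -- only the 3 distinct labels
print(df.index.get_level_values(0))  # length-6 vector: ['A', 'A', 'A', 'B', 'C', 'C']
print(len(df))                       # 6 -- matches get_level_values, not levels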
I had the same problem: I imported a bunch of data and wanted to group by a field that was the index. I didn't have a multi-index or any of that jazz, and neither do you.
I figured out that the problem is that the field I want to group by is the index. At first I just reset the index, but that leaves me with a useless index column I don't want. So now I do the following (two levels of grouping):
grouped = df.reset_index().groupby(by=['Field1','Field2'])
Then I can use 'grouped' in a bunch of ways for different reports:
grouped[['Field3','Field4']].agg([np.mean, np.std])
(which was what I wanted, giving me Field3 and Field4 averages, grouped by Field1 (the index) and Field2).
For you, if you just want the count of items per user, in one simple line using groupby, the code could be:
df.reset_index().groupby(by=['user_id']).count()
If you want to do more things, you can (like me) create 'grouped' and then use that. As a beginner, I find it easier to follow that way.
Please note that reset_index is not in place, so it will not mess up your original dataframe.
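A small sketch of that, using made-up data shaped like the question's (user_id as the index, item_bought as a column):

import pandas as pd

# Hypothetical data shaped like the question's: user_id is the index
df = pd.DataFrame({'item_bought': ['x', 'x', 'y', 'y']},
                  index=pd.Index(['b', 'b', 'b', 'c'], name='user_id'))

# reset_index() returns a new frame with user_id as an ordinary column,
# so groupby can see it like any other column
print(df.reset_index().groupby(by=['user_id']).count())

# The original frame is untouched: user_id is still its index
print(df.index.name)  # 'user_id'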
Since version 0.20.1 it is simpler:
Strings passed to DataFrame.groupby() as the by parameter may now reference either column names or index level names
import pandas as pd
import numpy as np

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 3],
                   'B': np.arange(8)}, index=index)
print(df)
              A  B
first second
bar   one     1  0
      two     1  1
baz   one     1  2
      two     1  3
foo   one     2  4
      two     2  5
qux   one     3  6
      two     3  7
print(df.groupby(['second', 'A']).sum())
          B
second A
one    1  2
       2  4
       3  6
two    1  4
       2  5
       3  7
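Applied to the question, this means you can pass the index level name and the column name straight to groupby, without resetting the index. A sketch assuming the question's user_id index and item_bought column:

import pandas as pd

df = pd.DataFrame({'item_bought': ['x', 'x', 'y', 'y']},
                  index=pd.Index(['b', 'b', 'b', 'c'], name='user_id'))

# 'user_id' is an index level name, 'item_bought' is a column name;
# since pandas 0.20.1 both can be passed to groupby as strings
print(df.groupby(['user_id', 'item_bought']).size())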
import pandas as pd
import numpy as np
In [11]: df = pd.DataFrame()

In [12]: df['user_id'] = ['b', 'b', 'b', 'c']

In [13]: df['item_bought'] = ['x', 'x', 'y', 'y']

In [14]: df['ct'] = 1

In [15]: df
Out[15]:
  user_id item_bought  ct
0       b           x   1
1       b           x   1
2       b           y   1
3       c           y   1

In [16]: pd.pivot_table(df, values='ct', index=['user_id', 'item_bought'], aggfunc=np.sum)
Out[16]:
user_id  item_bought
b        x              2
         y              1
c        y              1
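For comparison, a plain groupby on the same data gives the same counts without the helper ct column (a sketch, not part of the original answer):

import pandas as pd

df = pd.DataFrame({'user_id': ['b', 'b', 'b', 'c'],
                   'item_bought': ['x', 'x', 'y', 'y']})

# size() counts the rows in each (user_id, item_bought) group,
# so no dummy 'ct' column is needed
print(df.groupby(['user_id', 'item_bought']).size())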