Group by index + column in pandas

北荒 2021-02-02 05:27

I have a dataframe that has the columns

  1. user_id
  2. item_bought

Here user_id is the index of the df. I want to group by both user_id and item_b

4 Answers
  • 2021-02-02 06:04

    This should work:

    >>> import numpy as np
    >>> import pandas as pd
    >>> df = pd.DataFrame(np.random.randint(0,5,(6, 2)), columns=['col1','col2'])
    >>> df['ind1'] = list('AAABCC')
    >>> df['ind2'] = range(6)
    >>> df.set_index(['ind1','ind2'], inplace=True)
    >>> df
    
               col1  col2
    ind1 ind2            
    A    0        3     2
         1        2     0
         2        2     3
    B    3        2     4
    C    4        3     1
         5        0     0
    
    
    >>> df.groupby([df.index.get_level_values(0),'col1']).count()
    
               col2
    ind1 col1      
    A    2        2
         3        1
    B    2        1
    C    0        1
         3        1
    

    I had the same problem when grouping by one level of a MultiIndex. With a MultiIndex you cannot use df.index.levels[0], because it holds only the distinct values of that index level and will most likely differ in length from the dataframe.

    See http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Index.get_level_values.html: get_level_values will "Return vector of label values for requested level, equal to the length of the index".
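
    To make the length difference concrete, here is a minimal sketch. The data is hypothetical (fixed values in the shape of the frame above, not the random ones from the session), and it uses size() rather than count() purely for brevity:

```python
import pandas as pd

# Hypothetical data: a two-level index whose first level repeats.
index = pd.MultiIndex.from_arrays(
    [list("AAABCC"), list(range(6))], names=["ind1", "ind2"]
)
df = pd.DataFrame({"col1": [3, 2, 2, 2, 3, 0]}, index=index)

# levels[0] holds only the distinct labels of the first level.
distinct_labels = df.index.levels[0]

# get_level_values(0) holds one label per row, so it lines up with the frame.
row_labels = df.index.get_level_values(0)

print(len(df), len(distinct_labels), len(row_labels))

# Because row_labels has one entry per row, it can be used as a grouping key
# alongside an ordinary column name.
counts = df.groupby([row_labels, "col1"]).size()
print(counts)
```

    Only the per-row labels from get_level_values can be combined with a column name in the groupby list; the distinct labels from levels[0] are too short to align with the rows.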

  • 2021-02-02 06:11

    I had the same problem: I imported a bunch of data and wanted to group by a field that was the index. I didn't have a multi-index or any of that jazz, and neither do you.

    I figured the problem was that the field I wanted is the index, so at first I just reset the index, but that left me with a useless index field that I didn't want. So now I do the following (two levels of grouping):

    grouped = df.reset_index().groupby(by=['Field1','Field2'])
    

    then I can use 'grouped' in a bunch of ways for different reports

    grouped[['Field3','Field4']].agg(['mean', 'std'])
    

    (which was what I wanted: the mean and standard deviation of Field3 and Field4, grouped by Field1 (the index) and Field2)

    For you, if you just want the count of items per user in one line using groupby, the code could be:

    df.reset_index().groupby(by=['user_id']).count()
    

    If you want to do more things then you can (like me) create 'grouped' and then use that. As a beginner, I find it easier to follow that way.

    Please note that reset_index does not operate in place by default, so it will not alter your original dataframe.
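
    Applied to the columns named in the question, the same idea might look like the sketch below. The sample rows are made up, and it assumes the goal is a row count per (user_id, item_bought) pair:

```python
import pandas as pd

# Hypothetical purchases; user_id is the index, as in the question.
df = pd.DataFrame(
    {"item_bought": ["x", "x", "y", "y"]},
    index=pd.Index(["b", "b", "b", "c"], name="user_id"),
)

# reset_index() returns a new frame in which user_id is an ordinary column.
flat = df.reset_index()

# Group by both fields and count the rows in each (user, item) pair.
pair_counts = flat.groupby(["user_id", "item_bought"]).size()
print(pair_counts)

# The original frame is untouched: user_id is still its index.
print(df.index.name)
```

    size() counts rows per group and returns a Series; count() would instead report the non-null values in each remaining column.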

  • 2021-02-02 06:13

    From version 0.20.1 it is simpler:

    Strings passed to DataFrame.groupby() as the by parameter may now reference either column names or index level names

    arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
              ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
    
    index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
    
    df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 3],
                       'B': np.arange(8)}, index=index)
    
    print(df)
    
                  A  B
    first second      
    bar   one     1  0
          two     1  1
    baz   one     1  2
          two     1  3
    foo   one     2  4
          two     2  5
    qux   one     3  6
          two     3  7
    
    print(df.groupby(['second', 'A']).sum())
              B
    second A   
    one    1  2
           2  4
           3  6
    two    1  4
           2  5
           3  7
    
  • 2021-02-02 06:24

    import pandas as pd
    import numpy as np

    In [11]: df = pd.DataFrame()

    In [12]: df['user_id'] = ['b','b','b','c']

    In [13]: df['item_bought'] = ['x','x','y','y']

    In [14]: df['ct'] = 1

    In [15]: df
    Out[15]:
      user_id item_bought  ct
    0       b           x   1
    1       b           x   1
    2       b           y   1
    3       c           y   1

    In [16]: pd.pivot_table(df,values='ct',index=['user_id','item_bought'],aggfunc=np.sum)
    Out[16]:
    user_id  item_bought
    b        x              2
             y              1
    c        y              1
    