Group by index + column in pandas

北荒 2021-02-02 05:27

I have a dataframe that has the columns

  1. user_id
  2. item_bought

Here user_id is the index of the df. I want to group by both user_id and item_b

4 answers
  •  独厮守ぢ
    2021-02-02 06:04

    This should work:

    >>> import numpy as np
    >>> import pandas as pd
    >>> df = pd.DataFrame(np.random.randint(0,5,(6, 2)), columns=['col1','col2'])
    >>> df['ind1'] = list('AAABCC')
    >>> df['ind2'] = range(6)
    >>> df.set_index(['ind1','ind2'], inplace=True)
    >>> df
    
               col1  col2
    ind1 ind2            
    A    0        3     2
         1        2     0
         2        2     3
    B    3        2     4
    C    4        3     1
         5        0     0
    
    
    >>> df.groupby([df.index.get_level_values(0),'col1']).count()
    
               col2
    ind1 col1      
    A    2        2
         3        1
    B    2        1
    C    0        1
         3        1
    

    I had the same problem when grouping by one level of a MultiIndex. With a MultiIndex you cannot use df.index.levels[0], because it holds only the distinct values of that index level and will most likely have a different length than the DataFrame. Use df.index.get_level_values(0) instead, which returns one label per row.
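To make the length mismatch concrete, here is a minimal sketch. It assumes a fixed, made-up frame that mirrors the random one in the answer (the values are illustrative, not the author's data) and compares `levels[0]` with `get_level_values(0)` before grouping.

```python
import pandas as pd

# Deterministic stand-in for the random frame in the answer (values are made up).
df = pd.DataFrame(
    {"col1": [3, 2, 2, 2, 3, 0], "col2": [2, 0, 3, 4, 1, 0]},
    index=pd.MultiIndex.from_arrays(
        [list("AAABCC"), list(range(6))], names=["ind1", "ind2"]
    ),
)

unique_labels = df.index.levels[0]              # distinct labels only: A, B, C
per_row_labels = df.index.get_level_values(0)   # one label per row

print(len(unique_labels), len(per_row_labels), len(df))  # 3 6 6

# Only the per-row labels line up with the rows, so only they work as a key.
counts = df.groupby([per_row_labels, "col1"]).count()
print(counts)
```

Because `per_row_labels` has the same length as the frame, it can be mixed with a column name in the list of grouping keys.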

    See http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Index.get_level_values.html, where get_level_values is described as: "Return vector of label values for requested level, equal to the length of the index".
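For the single-index case in the question, the same idea might look like the sketch below. The purchase data is hypothetical; only the names `user_id` and `item_bought` come from the question. It shows three ways that should give the same grouping, assuming a reasonably recent pandas (passing an index level name as a string key needs pandas 0.20 or later).

```python
import pandas as pd

# Hypothetical purchases; user_id is the index, as in the question.
df = pd.DataFrame(
    {"item_bought": ["apple", "apple", "pear", "apple", "pear", "pear"]},
    index=pd.Index([1, 1, 1, 2, 2, 3], name="user_id"),
)

# Option 1: pass the index object itself alongside the column name.
by_index_object = df.groupby([df.index, "item_bought"]).size()

# Option 2: refer to the index level by its name, like a column.
by_level_name = df.groupby(["user_id", "item_bought"]).size()

# Option 3: be explicit that the first key is an index level.
by_grouper = df.groupby([pd.Grouper(level="user_id"), "item_bought"]).size()

print(by_level_name)
```

Option 2 is the shortest, but it relies on the index having a name that does not clash with a column; option 1 works even when the index is unnamed.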
