I have a dataframe that has the columns
Here user_id is the index of the df. I want to group by both user_id and item_b
this should work:
>>> df = pd.DataFrame(np.random.randint(0,5,(6, 2)), columns=['col1','col2'])
>>> df['ind1'] = list('AAABCC')
>>> df['ind2'] = range(6)
>>> df.set_index(['ind1','ind2'], inplace=True)
>>> df
col1 col2
ind1 ind2
A 0 3 2
1 2 0
2 2 3
B 3 2 4
C 4 3 1
5 0 0
>>> df.groupby([df.index.get_level_values(0),'col1']).count()
col2
ind1 col1
A 2 2
3 1
B 2 1
C 0 1
3 1
I had the same problem using one of the columns from multiindex. with multiindex, you cannot use df.index.levels[0] since it has only distinct values from that particular index level and will be most likely of different size than whole dataframe...
check http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Index.get_level_values.html - get_level_values "Return vector of label values for requested level, equal to the length of the index"