I have a dataframe of taxi data with two columns that looks like this:
Neighborhood Borough Time
Midtown Manhattan X
Melrose B
I think you can use nlargest. To get the top five instead of the top one, change 1 to 5:
s = df['Neighborhood'].groupby(df['Borough']).value_counts()
print(s)
Borough
Bronx          Melrose            7
Manhattan      Midtown           12
               Lincoln Square     2
Staten Island  Grant City        11
dtype: int64
print(s.groupby(level=0).nlargest(1))
Bronx          Bronx          Melrose        7
Manhattan      Manhattan      Midtown       12
Staten Island  Staten Island  Grant City    11
dtype: int64
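The borough shows up twice in the result because groupby adds the group key as an extra index level on top of the existing ones. Passing group_keys=False to groupby should avoid the duplicate.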
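For anyone who wants to run this end to end, here is a minimal self-contained sketch of the nlargest approach. The sample rows are made up for illustration (they are not the asker's data), and top_neighborhoods is a hypothetical helper name.

```python
import pandas as pd

# Made-up sample data, only for illustration.
df = pd.DataFrame({
    "Neighborhood": ["Midtown", "Midtown", "Midtown", "Lincoln Square",
                     "Melrose", "Melrose", "Grant City"],
    "Borough": ["Manhattan", "Manhattan", "Manhattan", "Manhattan",
                "Bronx", "Bronx", "Staten Island"],
})


def top_neighborhoods(frame, n=1):
    """Return the n most frequent neighborhoods within each borough."""
    # Count how often each neighborhood occurs inside each borough.
    counts = frame.groupby("Borough")["Neighborhood"].value_counts()
    # Group the counts by borough (index level 0) and keep the n largest.
    # group_keys=False is meant to stop the borough being added a second time.
    return counts.groupby(level=0, group_keys=False).nlargest(n)


top = top_neighborhoods(df, n=1)
print(top)
```

With n=1 this keeps one neighborhood per borough; pass n=5 to keep up to five.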
You can also try the code below to get only the top 10 values. Here 'country_code' and 'raised_amount_usd' are column names.
groupby_country_code = master_frame.groupby('country_code')
arr = groupby_country_code['raised_amount_usd'].sum().sort_values(ascending=False)[0:10]
print(arr)
The slice [0:10] keeps positions 0 through 9, i.e. the first 10 entries of the sorted result. Sorting by value in descending order (rather than by index) is what makes these the 10 largest sums. You can adjust the slice as needed.
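As a rough sketch of the same idea, Series.nlargest can pick the largest group totals directly, without a separate sort and slice. The funding figures below are invented for illustration; only the column names come from the answer above.

```python
import pandas as pd

# Invented funding data; only the column names come from the answer.
master_frame = pd.DataFrame({
    "country_code": ["USA", "USA", "IND", "GBR", "IND", "CAN"],
    "raised_amount_usd": [100.0, 50.0, 30.0, 80.0, 20.0, 10.0],
})

# Total amount raised per country.
totals = master_frame.groupby("country_code")["raised_amount_usd"].sum()

# Keep the 3 largest totals (use 10 for a top-10 list).
top = totals.nlargest(3)
print(top)
```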
df['Neighborhood'].groupby(df['Borough']).value_counts().head(5)
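head(5) returns the first 5 rows of the result. Note that this is the first 5 rows overall, not 5 per borough. Because value_counts() already sorts the counts in descending order within each borough, chaining .groupby(level=0).head(5) instead gives the top 5 per borough.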
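To make the difference between a plain head() and a per-group head() concrete, here is a small sketch on made-up data (not the asker's). It assumes the default behaviour of value_counts() on a groupby, which sorts counts in descending order within each group.

```python
import pandas as pd

# Made-up sample data, only for illustration.
df = pd.DataFrame({
    "Neighborhood": ["Midtown", "Midtown", "Midtown", "Lincoln Square",
                     "Melrose", "Melrose", "Grant City"],
    "Borough": ["Manhattan", "Manhattan", "Manhattan", "Manhattan",
                "Bronx", "Bronx", "Staten Island"],
})

counts = df["Neighborhood"].groupby(df["Borough"]).value_counts()

# First 3 rows of the whole result: boroughs later in the index can be cut off.
overall_first = counts.head(3)

# First row within each borough: the most common neighborhood per borough.
per_borough = counts.groupby(level=0).head(1)

print(overall_first)
print(per_borough)
```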
You can do this in a single line by slightly extending your original groupby with 'nlargest':
>>> df.groupby(['Borough', 'Neighborhood']).Neighborhood.value_counts().nlargest(5)
Borough        Neighborhood    Neighborhood
Bronx          Melrose         Melrose           1
Manhattan      Midtown         Midtown           1
Manhatten      Lincoln Square  Lincoln Square    1
               Midtown         Midtown           1
Staten Island  Grant City      Grant City        1
dtype: int64