Group by and find top n value_counts pandas

终归单人心 2020-12-25 12:04

I have a dataframe of taxi data with two columns that looks like this:

Neighborhood    Borough        Time
Midtown         Manhattan      X
Melrose         B         


        
4 Answers
  • 2020-12-25 12:43

    I think you can use nlargest - you can change 1 to 5:

    s = df['Neighborhood'].groupby(df['Borough']).value_counts()
    print(s)
    Borough                      
    Bronx          Melrose            7
    Manhattan      Midtown           12
                   Lincoln Square     2
    Staten Island  Grant City        11
    dtype: int64
    
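    print(s.groupby(level=0).nlargest(1))
    Bronx          Bronx          Melrose        7
    Manhattan      Manhattan      Midtown       12
    Staten Island  Staten Island  Grant City    11
    dtype: int64
    

    Grouping on index level 0 (Borough) prepends the group key to the result, which is why Borough appears twice in the index above. Pass group_keys=False to groupby to keep the original two-level index.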
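
For readers who want to run this end to end, here is a minimal sketch of the same idea. The sample rows are hypothetical (they are not the asker's data), and `group_keys=False` is an optional choice that keeps Borough from being repeated in the index.

```python
import pandas as pd

# Hypothetical sample data: one row per taxi pickup.
df = pd.DataFrame({
    "Borough": ["Manhattan"] * 6 + ["Bronx"] * 3,
    "Neighborhood": [
        "Midtown", "Midtown", "Midtown",
        "Lincoln Square", "Lincoln Square", "Chelsea",
        "Melrose", "Melrose", "Fordham",
    ],
})

# Count pickups per (Borough, Neighborhood) pair.
counts = df.groupby("Borough")["Neighborhood"].value_counts()

# Keep the n largest counts within each borough (n = 2 here).
# group_keys=False avoids adding Borough to the index a second time.
top_n = counts.groupby(level=0, group_keys=False).nlargest(2)
print(top_n)
```

Changing the argument of `nlargest` to 5 would give the top five neighborhoods per borough, as the question asks.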

  • 2020-12-25 12:47

    You can also try the code below to get only the top 10 values (here, per-group sums rather than value counts).

    'country_code' and 'raised_amount_usd' are column names.

    groupby_country_code = master_frame.groupby('country_code')
    arr = groupby_country_code['raised_amount_usd'].sum().sort_values(ascending=False)[0:10]
    print(arr)

    [0:10] slices positions 0 through 9, i.e. the first ten entries after sorting in descending order. You can choose a different slice as needed.
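
As a rough illustration of this approach, the sketch below sums a column per group and keeps the largest totals. The data is made up for the example, only the column names come from the answer, and `nlargest` is used here as an alternative to sorting and slicing.

```python
import pandas as pd

# Hypothetical funding data; only the column names come from the answer.
master_frame = pd.DataFrame({
    "country_code": ["USA", "USA", "GBR", "IND", "IND", "CAN"],
    "raised_amount_usd": [100.0, 50.0, 70.0, 20.0, 30.0, 10.0],
})

# Total amount raised per country.
totals = master_frame.groupby("country_code")["raised_amount_usd"].sum()

# Keep the three largest totals (use 10 for a top-10 list).
top = totals.nlargest(3)
print(top)
```

`nlargest(n)` sorts by value and truncates in one step, so no separate slice is needed.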

  • 2020-12-25 12:54

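    df['Neighborhood'].groupby(df['Borough']).value_counts().groupby(level=0).head(5)

    value_counts() sorts the counts in descending order within each borough, so head(5) applied per borough (index level 0) keeps the five most frequent neighborhoods in each. A bare .head(5) would return only the first five rows of the whole result.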
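
The difference between calling `head()` on the whole result and calling it per borough can be seen in a small sketch. The rows below are hypothetical, and `head(1)` is used instead of `head(5)` only to keep the example short.

```python
import pandas as pd

# Hypothetical sample data: one row per taxi pickup.
df = pd.DataFrame({
    "Borough": ["Bronx", "Bronx", "Bronx", "Manhattan", "Manhattan", "Manhattan"],
    "Neighborhood": ["Melrose", "Melrose", "Fordham", "Midtown", "Midtown", "Chelsea"],
})

# Counts are sorted in descending order within each borough.
counts = df["Neighborhood"].groupby(df["Borough"]).value_counts()

# First row of the whole Series: only one borough is represented.
overall_head = counts.head(1)

# First row of each borough: the most frequent neighborhood per borough.
per_borough = counts.groupby(level=0).head(1)
print(per_borough)
```

Unlike `nlargest`, `head` on a grouped Series keeps the original index, so no extra Borough level is added.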

  • 2020-12-25 13:00

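    You can do this in a single line by extending your original groupby with nlargest. Note that nlargest(5) is applied to the whole result here, so it returns the five largest counts overall rather than five per borough: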

    >>> df.groupby(['Borough', 'Neighborhood']).Neighborhood.value_counts().nlargest(5)
    Borough        Neighborhood    Neighborhood  
    Bronx          Melrose         Melrose           1
    Manhattan      Midtown         Midtown           1
    Manhatten      Lincoln Square  Lincoln Square    1
                   Midtown         Midtown           1
    Staten Island  Grant City      Grant City        1
    dtype: int64
    