Count items greater than a value in pandas groupby

佛祖请我去吃肉 2021-01-17 08:49

I have the Yelp dataset and I want to count all reviews which have greater than 3 stars. I get the count of reviews by doing this:

reviews.groupby('business


        
4 Answers
  • 2021-01-17 09:21

    I like method chaining with pandas because I find it easier to read. I haven't tested this, but I think it should also work:

    reviews.query("stars > 3").groupby("business_id").size()
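
A quick check on a small made-up frame suggests the chained version behaves as expected. The business IDs and star values below are invented for illustration; only the column names `business_id` and `stars` come from the question.

```python
import pandas as pd

# Toy stand-in for the Yelp reviews table; the values are made up.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b", "c"],
    "stars":       [5,   4,   2,   4,   1,   3],
})

# Keep only rows with more than 3 stars, then count the rows per business.
high_counts = reviews.query("stars > 3").groupby("business_id").size()
print(high_counts)
```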
    
  • 2021-01-17 09:29

    You can try:

    reviews[reviews['stars'] > 3].groupby('business_id')['stars'].count()
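
One thing to be aware of when filtering first: a business with no reviews above 3 stars is dropped from the result rather than reported as 0. If the zeros matter, a possible alternative is to group a boolean mask and sum it. This is a sketch on a made-up frame, assuming the same `business_id` and `stars` columns as in the question.

```python
import pandas as pd

# Made-up data; business "c" has no review above 3 stars.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b", "c"],
    "stars":       [5,   4,   2,   4,   1,   3],
})

# Filter, then count: "c" has no rows left after filtering, so it disappears.
filtered = reviews[reviews["stars"] > 3].groupby("business_id")["stars"].count()

# Sum a boolean mask per group: True counts as 1, so "c" is kept with a count of 0.
mask_counts = (reviews["stars"] > 3).groupby(reviews["business_id"]).sum()

print(filtered)
print(mask_counts)
```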
    
  • 2021-01-17 09:40

    As I also wanted to rename the column and to run multiple functions on the same column, I came up with the following solution:

    import pandas

    # Count the reviews above and below 3 stars for each business
    reviews.groupby('business_id')\
           .agg(over=pandas.NamedAgg(column='stars', aggfunc=lambda x: (x > 3).sum()),
                under=pandas.NamedAgg(column='stars', aggfunc=lambda x: (x < 3).sum()))\
           .reset_index()
    

    pandas.NamedAgg lets you create multiple named output columns, now that the older dict-based renaming in agg has been removed in newer versions of pandas.
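
For reference, here is a self-contained sketch of the same aggregation on a made-up frame. The data is invented, and `over` and `under` are simply the output column names chosen in this answer. pandas also accepts plain `(column, aggfunc)` tuples in place of `pandas.NamedAgg`, which is the form used here.

```python
import pandas as pd

# Made-up data for illustration.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b", "c"],
    "stars":       [5,   4,   2,   4,   1,   3],
})

summary = (
    reviews.groupby("business_id")
    .agg(
        # Each keyword becomes an output column: name=(source column, function).
        over=("stars", lambda x: (x > 3).sum()),
        under=("stars", lambda x: (x < 3).sum()),
    )
    .reset_index()
)
print(summary)
```

Because the aggregation runs on every group, a business with no matching reviews still gets a row with 0 in both columns.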

  • 2021-01-17 09:44

    A bit late, but my solution is:

    reviews.groupby('business_id').stars.apply(lambda x: len(x[x>3]) )
    

    I came across this thread while looking for "what is the fraction of values above X in a given GroupBy". Here is the solution, in case anyone is interested:

    reviews.groupby('business_id').stars.apply(lambda x: len(x[x>3])/len(x) )
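
An equivalent way to get the fraction, offered as a sketch rather than a benchmarked recommendation: the mean of a boolean mask is the share of True values, so the per-group mean of `stars > 3` should give the same numbers without a Python-level lambda. The data below is made up.

```python
import pandas as pd

# Made-up data for illustration.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b", "c"],
    "stars":       [5,   4,   2,   4,   1,   3],
})

# The apply-based version from this answer.
via_apply = reviews.groupby("business_id").stars.apply(lambda x: len(x[x > 3]) / len(x))

# Mean of the boolean mask per group: True is 1 and False is 0.
via_mean = (reviews["stars"] > 3).groupby(reviews["business_id"]).mean()

print(via_apply)
print(via_mean)
```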
    