I have the Yelp dataset and I want to count, for each business, all reviews that have more than 3 stars. I get the count of reviews by doing this:
reviews.groupby('business
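For experimenting with the answers below, here is a minimal sketch of the setup. It assumes a DataFrame named reviews with business_id and stars columns; the toy values are made up for illustration and are not from the Yelp data. It only counts all reviews per business, with no star filter yet.

```python
import pandas as pd

# Hypothetical toy stand-in for the Yelp reviews table;
# only the 'business_id' and 'stars' columns are assumed.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b", "c"],
    "stars": [5, 3, 4, 2, 5, 1],
})

# Total number of reviews per business (no star filter yet).
total = reviews.groupby("business_id").size()
print(total)
```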
I like using method chaining with pandas, as I find it easier to read. I haven't tested it, but I think this should also work:
reviews.query("stars > 3").groupby("business_id").size()
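A sketch of the query chain on the same hypothetical toy data. One thing worth knowing: filtering before grouping drops businesses that have no review above 3 stars, so they are missing from the result rather than shown as 0. Reindexing against every business id is one way to get them back.

```python
import pandas as pd

# Hypothetical toy data; only 'business_id' and 'stars' are assumed.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b", "c"],
    "stars": [5, 3, 4, 2, 5, 1],
})

over_3 = reviews.query("stars > 3").groupby("business_id").size()
# Business 'c' has no review above 3 stars, so it is absent from over_3.

# Reindex against every business to report such businesses as 0 instead.
all_ids = reviews["business_id"].unique()
over_3_full = over_3.reindex(all_ids, fill_value=0)
print(over_3_full)
```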
You can try:
reviews[reviews['stars'] > 3].groupby('business_id')['stars'].count()
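This answer uses .count() on the stars column where the previous one used .size(). A small sketch (hypothetical toy data, with one missing rating added on purpose) of when the two differ: .count() counts non-null values, .size() counts rows. After the stars > 3 filter they agree, because a missing rating never satisfies > 3.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; business 'c' has a missing rating.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b", "c"],
    "stars": [5, 3, 4, 2, 5, np.nan],
})

# Unfiltered: count() skips NaN, size() does not.
raw_count = reviews.groupby("business_id")["stars"].count()
raw_size = reviews.groupby("business_id").size()

# Filtered: NaN > 3 is False, so NaN rows are already gone and both agree.
filtered = reviews[reviews["stars"] > 3]
by_count = filtered.groupby("business_id")["stars"].count()
by_size = filtered.groupby("business_id").size()
print(by_count)
```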
As I also wanted to rename the column and to run multiple functions on the same column, I came up with the following solution:
import pandas

# Counting both over and under (a rating of exactly 3 falls in neither column)
reviews.groupby('business_id')\
    .agg(over=pandas.NamedAgg(column='stars', aggfunc=lambda x: (x > 3).sum()),
         under=pandas.NamedAgg(column='stars', aggfunc=lambda x: (x < 3).sum()))\
    .reset_index()
pandas.NamedAgg (available since pandas 0.25) lets you create multiple named output columns, now that the older dict-based renaming syntax in .agg() has been removed from newer versions of pandas.
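A runnable sketch of the named aggregation above on hypothetical toy data. It also shows the plain (column, aggfunc) tuple shorthand, which .agg() accepts in place of pandas.NamedAgg and which produces the same result.

```python
import pandas as pd

# Hypothetical toy data; only 'business_id' and 'stars' are assumed.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b", "c"],
    "stars": [5, 3, 4, 2, 5, 1],
})

# pandas.NamedAgg form, as in the answer above.
named = reviews.groupby("business_id").agg(
    over=pd.NamedAgg(column="stars", aggfunc=lambda x: (x > 3).sum()),
    under=pd.NamedAgg(column="stars", aggfunc=lambda x: (x < 3).sum()),
).reset_index()

# Equivalent tuple shorthand: output_name=(column, aggfunc).
shorthand = reviews.groupby("business_id").agg(
    over=("stars", lambda x: (x > 3).sum()),
    under=("stars", lambda x: (x < 3).sum()),
).reset_index()
print(named)
```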
A bit late, but my solution is:
reviews.groupby('business_id').stars.apply(lambda x: len(x[x > 3]))
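For comparison, a sketch of a vectorized alternative (a different technique from the apply answer above): build a boolean mask once and sum it per business. Like the apply version, it keeps businesses with zero matching reviews, but it avoids calling a Python lambda for every group. Toy data is hypothetical.

```python
import pandas as pd

# Hypothetical toy data; only 'business_id' and 'stars' are assumed.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b", "c"],
    "stars": [5, 3, 4, 2, 5, 1],
})

# apply version from the answer above.
via_apply = reviews.groupby("business_id").stars.apply(lambda x: len(x[x > 3]))

# Vectorized: True/False per review, summed per business (True counts as 1).
via_mask = reviews["stars"].gt(3).groupby(reviews["business_id"]).sum()
print(via_mask)
```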
I came across this thread while looking for a way to compute "the fraction of values above X within each group of a GroupBy". Here is the solution, if anyone is interested:
reviews.groupby('business_id').stars.apply(lambda x: len(x[x > 3]) / len(x))
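The same fraction can be computed as the mean of a boolean mask, since the mean of True/False values is the share of True values. A minimal sketch with the threshold X as a parameter; fraction_above is a hypothetical helper name and the data is made up.

```python
import pandas as pd

def fraction_above(reviews: pd.DataFrame, threshold: float) -> pd.Series:
    """Hypothetical helper: share of each business's reviews with stars > threshold."""
    # Mean of a boolean Series = proportion of True values in each group.
    return reviews["stars"].gt(threshold).groupby(reviews["business_id"]).mean()

# Hypothetical toy data; only 'business_id' and 'stars' are assumed.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b", "c"],
    "stars": [5, 3, 4, 2, 5, 1],
})

frac = fraction_above(reviews, 3)
print(frac)
```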