In the dataframe below, I would like to eliminate the duplicate cid values so the output from df.groupby('date').cid.size() matches the output fr
cid
df.groupby('date').cid.size()
You don't need groupby to drop duplicates based on a few columns; you can pass a subset of columns to drop_duplicates instead:
df2 = df.drop_duplicates(["date", "cid"])
df2.groupby('date').cid.size()
Out[99]:
date
2005      3
2006     10
2007    227
2008     52
2009    142
2010     57
2011    219
2012     99
2013    238
2014    146
dtype: int64
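To check that this does what the question asks, here is a minimal self-contained sketch. The toy dataframe is made up for illustration (only the column names date and cid come from the question), and it compares the per-date sizes after dropping duplicates against a per-date count of distinct cid values computed with nunique(), which is used here as an assumed reference.

```python
import pandas as pd

# Hypothetical toy data: column names follow the question, values are invented.
df = pd.DataFrame({
    "date": [2005, 2005, 2005, 2006, 2006, 2006],
    "cid": [1, 1, 2, 3, 3, 3],
})

# Keep only the first row for each (date, cid) pair.
df2 = df.drop_duplicates(subset=["date", "cid"])

# Rows per date after de-duplication.
sizes = df2.groupby("date").cid.size()

# Distinct cid values per date in the original frame, for comparison.
uniques = df.groupby("date").cid.nunique()

print(sizes)
print((sizes == uniques).all())
```

One caveat worth keeping in mind: if cid contains NaN, the two counts can differ, because drop_duplicates keeps one NaN row per date and size() counts it, while nunique() ignores NaN by default.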