I have a table containing dates and the various cars sold on each date, in the following format (these are only 2 of many columns):
DATE CAR
2012/01/01
The default behavior of the category dtype is exactly what you want: when you count the values of a categorical column, categories that never occur are still listed, with a count of zero. Make DATE categorical, since it is the column whose values you are counting. You just need to do:
df.astype({'DATE': 'category'})[df.CAR == 'BMW']['DATE'].value_counts()
or, better yet, make it a categorical column permanently in your DataFrame:
df['DATE'] = df['DATE'].astype('category')
df[df.CAR == 'BMW'].DATE.value_counts()
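The category dtype is also a natural representation for a column with a limited set of repeated values such as dates, and it uses less memory than storing every value as a separate string.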
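A minimal, self-contained sketch of the categorical approach. The DataFrame below is hypothetical sample data invented for illustration, not the asker's real table; only the DATE and CAR columns are assumed.

```python
import pandas as pd

# Hypothetical sample data; the real table has more rows and columns.
df = pd.DataFrame({
    "DATE": ["2012/01/01", "2012/01/01", "2012/01/02", "2012/01/03",
             "2012/09/01", "2012/09/02"],
    "CAR": ["BMW", "BMW", "BMW", "Audi", "BMW", "Ford"],
})

# Convert DATE to a categorical column; its categories are all distinct dates.
df["DATE"] = df["DATE"].astype("category")

# Filtering keeps the full category list, so value_counts also reports
# dates with no BMW sales (count 0). sort_index restores date order.
bmw_counts = df.loc[df["CAR"] == "BMW", "DATE"].value_counts().sort_index()
print(bmw_counts)
```

Because the categories are fixed when the column is converted, a value that is not already a category cannot be assigned to DATE later without first adding it (for example with `df["DATE"].cat.add_categories(...)`).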
You can reindex the result after value_counts and fill the missing dates with 0:
df.loc[df.CAR == 'BMW', 'DATE'].value_counts().reindex(
df.DATE.unique(), fill_value=0)
Output:
2012/01/01 2
2012/01/02 1
2012/01/03 0
2012/09/01 1
2012/09/02 0
Name: DATE, dtype: int64
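A self-contained version of the reindex approach, again on hypothetical sample data (DATE kept as plain strings here, no categorical conversion):

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "DATE": ["2012/01/01", "2012/01/01", "2012/01/02", "2012/01/03",
             "2012/09/01", "2012/09/02"],
    "CAR": ["BMW", "BMW", "BMW", "Audi", "BMW", "Ford"],
})

# Every date that appears anywhere in the table, in order of first appearance.
all_dates = df["DATE"].unique()

# Count BMW sales per date, then add the missing dates with a count of 0.
bmw_counts = (
    df.loc[df["CAR"] == "BMW", "DATE"]
    .value_counts()
    .reindex(all_dates, fill_value=0)
)
print(bmw_counts)
```

The reindex target decides the output order; with dates written as YYYY/MM/DD, passing `sorted(df["DATE"].unique())` instead would give chronological order, since these strings sort lexically in date order.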
Instead of value_counts, you could also check for equality and sum the result grouped by date, which includes every date:
df['CAR'].eq('BMW').astype(int).groupby(df['DATE']).sum()
Output:
DATE
2012/01/01 2
2012/01/02 1
2012/01/03 0
2012/09/01 1
2012/09/02 0
Name: CAR, dtype: int32
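A self-contained sketch of the equality-and-sum approach on the same hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "DATE": ["2012/01/01", "2012/01/01", "2012/01/02", "2012/01/03",
             "2012/09/01", "2012/09/02"],
    "CAR": ["BMW", "BMW", "BMW", "Audi", "BMW", "Ford"],
})

# 1 where the car is a BMW, 0 otherwise, for every row.
is_bmw = df["CAR"].eq("BMW").astype(int)

# Group the 0/1 flags by date and add them up. Every date has at least one row,
# so dates without BMW sales appear with a sum of 0.
bmw_counts = is_bmw.groupby(df["DATE"]).sum()
print(bmw_counts)
```

groupby sorts the group keys by default, so the dates come out in sorted order without an explicit sort.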