Question
I am learning to use the Pandas library and need to analyze and plot the crime data set below. Each row represents one occurrence of a crime. The date_rep column contains daily dates for one year.
The data needs to be grouped by month, and the occurrences of each specific crime need to be added up per month, as in the table below.
The problem I am running into is that the data in the crime column is qualitative, and I just can't find resources online that help me solve this!
I have been reading up on groupby and different sorting methods, but what is the most efficient way to accomplish this? Thank you in advance!
Answer 1:
To replicate something like your data:
In [28]: import numpy as np
    ...: import pandas as pd
In [29]: df = pd.DataFrame({'date_rep': pd.date_range('2012-01-01', periods=100),
    ...:                    'crm_cd_desc': np.random.choice(['robbery', 'traffic', 'assault'], size=100)})
In [30]: df.head()
Out[30]:
  crm_cd_desc   date_rep
0     traffic 2012-01-01
1     traffic 2012-01-02
2     assault 2012-01-03
3     robbery 2012-01-04
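The sample above draws crime types with an unseeded np.random.choice, so the counts change on every run. A seeded variant is sketched below, assuming NumPy 1.17 or later for np.random.default_rng; the seed 0 is an arbitrary choice, and its counts will not match the outputs shown in this answer.

```python
import numpy as np
import pandas as pd

# Seeded generator so the random sample is reproducible (seed 0 is arbitrary).
rng = np.random.default_rng(0)

df = pd.DataFrame({
    'date_rep': pd.date_range('2012-01-01', periods=100),  # 100 consecutive days
    'crm_cd_desc': rng.choice(['robbery', 'traffic', 'assault'], size=100),
})
print(df.head())
```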
In essence, what you want is a value count of the crime column:
In [31]: df['crm_cd_desc'].value_counts()
Out[31]:
assault 36
traffic 34
robbery 30
dtype: int64
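value_counts tallies how often each distinct value occurs, which is exactly the operation a qualitative column needs: the text labels become the index and their frequencies become the values, most frequent first. A tiny illustration on hand-made (hypothetical) values:

```python
import pandas as pd

# Hypothetical mini-sample of crime descriptions.
crimes = pd.Series(['robbery', 'traffic', 'robbery', 'assault', 'robbery', 'traffic'])

# Count each distinct label; the result is sorted by count, descending.
counts = crimes.value_counts()
print(counts)
```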
However, you want to do this for each month separately. To group by month, you can pass a pd.Grouper with freq='M' to groupby:
In [34]: df.groupby(pd.Grouper(key='date_rep', freq='M'))['crm_cd_desc'].value_counts()
Out[34]:
date_rep
2012-01-31  traffic    12
            robbery    10
            assault     9
2012-02-29  assault    13
            traffic    11
            robbery     5
2012-03-31  assault    12
            robbery    10
            traffic     9
2012-04-30  robbery     5
            assault     2
            traffic     2
dtype: int64
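pd.Grouper(freq='M') labels each month by its last day. In pandas 2.2 and later, 'M' is deprecated in favour of 'ME' for month-end frequencies, while older releases do not recognize 'ME'. A version-independent alternative (a swapped-in technique, not the one used in this answer) is to group on a monthly Period obtained from Series.dt.to_period('M'). The data below is a small hand-made sample, so the counts are known in advance:

```python
import pandas as pd

# Small hypothetical sample: January has 2 traffic + 1 assault,
# February has 1 robbery + 1 traffic.
df = pd.DataFrame({
    'date_rep': pd.to_datetime(['2012-01-05', '2012-01-20', '2012-01-28',
                                '2012-02-03', '2012-02-14']),
    'crm_cd_desc': ['traffic', 'traffic', 'assault', 'robbery', 'traffic'],
})

# Map each timestamp to its calendar month (a Period such as 2012-01),
# then count the crime types within each month.
month = df['date_rep'].dt.to_period('M')
monthly = df.groupby(month)['crm_cd_desc'].value_counts()
print(monthly)
```

Each month is then labelled by its Period (2012-01, 2012-02) instead of a month-end date.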
Then unstack the inner index level to turn the crime types into columns:
In [35]: df.groupby(pd.Grouper(key='date_rep', freq='M'))['crm_cd_desc'].value_counts().unstack()
Out[35]:
            assault  robbery  traffic
date_rep
2012-01-31        9       10       12
2012-02-29       13        5       11
2012-03-31       12       10        9
2012-04-30        2        5        2
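A month-by-crime table like this can also be built in one call with pd.crosstab, which counts each (row key, column key) pair and reports 0 for pairs that never occur. The sketch below uses a small hypothetical sample and keys months by Period (via dt.to_period('M')) rather than by pd.Grouper; that substitution is an assumption, not part of the original answer.

```python
import pandas as pd

# Small hypothetical sample: no assault in February, no robbery in January.
df = pd.DataFrame({
    'date_rep': pd.to_datetime(['2012-01-05', '2012-01-20', '2012-01-28',
                                '2012-02-03', '2012-02-14']),
    'crm_cd_desc': ['traffic', 'traffic', 'assault', 'robbery', 'traffic'],
})

# Rows: calendar months; columns: crime types; cells: number of occurrences.
table = pd.crosstab(df['date_rep'].dt.to_period('M'), df['crm_cd_desc'])
print(table)
```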
Instead of using value_counts, you can also group by both the month and the crime type and then count the rows in each group with size:
In [46]: df.groupby([pd.Grouper(key='date_rep', freq='M'), 'crm_cd_desc']).size().unstack()
Out[46]:
crm_cd_desc  assault  robbery  traffic
date_rep
2012-01-31         9       10       12
2012-02-29        13        5       11
2012-03-31        12       10        9
2012-04-30         2        5        2
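When a crime type never occurs in some month, unstack() leaves NaN in that cell, which also turns the affected columns into floats. Passing fill_value=0 to unstack keeps the counts as integers. A sketch on a small hypothetical sample where February has no assault, grouping months by Period instead of pd.Grouper (an assumption made here to sidestep the month-end alias change):

```python
import pandas as pd

# Small hypothetical sample: February has no assault, January has no robbery.
df = pd.DataFrame({
    'date_rep': pd.to_datetime(['2012-01-05', '2012-01-20', '2012-01-28',
                                '2012-02-03', '2012-02-14']),
    'crm_cd_desc': ['traffic', 'traffic', 'assault', 'robbery', 'traffic'],
})

# Count rows per (month, crime type) group; only observed pairs appear.
sizes = df.groupby([df['date_rep'].dt.to_period('M'), 'crm_cd_desc']).size()

without_fill = sizes.unstack()              # missing pairs become NaN (float)
with_fill = sizes.unstack(fill_value=0)     # missing pairs become 0 (int)
print(with_fill)
```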
Source: https://stackoverflow.com/questions/25321357/counting-qualitative-values-based-on-the-date-range-in-pandas