Counting qualitative values based on the date range in Pandas

Question

I am learning to use the Pandas library and need to analyze and plot the crime data set below. Each row represents one occurrence of a crime. The date_rep column contains daily dates for one year.

[Image: sample of the crime data set]

The data needs to be grouped by month, and the occurrences of each specific crime need to be counted per month, as in the table below.

[Image: table of monthly counts per crime]

The problem I am running into is that the data in the crime column is qualitative, and I just can't find resources online that help me solve this!

I have been reading up on groupby and different methods of sorting, but what is the most efficient way to accomplish this? Thank you in advance!

Answer 1:

To reproduce something like your data (with pandas imported as pd and numpy as np):

In [29]: df = pd.DataFrame({'date_rep':pd.date_range('2012-01-01', periods=100),
    ...:                    'crm_cd_desc':np.random.choice(['robbery', 'traffic', 'assault'], size=100)})


In [30]: df.head()
Out[30]: 
  crm_cd_desc   date_rep
0     traffic 2012-01-01
1     traffic 2012-01-02
2     assault 2012-01-03
3     robbery 2012-01-04

In essence, what you want is a count of each distinct value, which value_counts gives you:

In [31]: df['crm_cd_desc'].value_counts()
Out[31]: 
assault    36
traffic    34
robbery    30
dtype: int64

However, you want to do this for each month separately. To group by month, pass a pd.Grouper with a monthly frequency to groupby:

In [34]: df.groupby(pd.Grouper(key='date_rep', freq='M'))['crm_cd_desc'].value_counts()
Out[34]: 
date_rep           
2012-01-31  traffic    12
            robbery    10
            assault     9
2012-02-29  assault    13
            traffic    11
            robbery     5
2012-03-31  assault    12
            robbery    10
            traffic     9
2012-04-30  robbery     5
            assault     2
            traffic     2
dtype: int64

Then unstack the result so that each crime type becomes a column:

In [35]: df.groupby(pd.Grouper(key='date_rep', freq='M'))['crm_cd_desc'].value_counts().unstack()
Out[35]: 
            assault  robbery  traffic
date_rep                             
2012-01-31        9       10       12
2012-02-29       13        5       11
2012-03-31       12       10        9
2012-04-30        2        5        2
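
The steps above can be collected into one self-contained script. This is only a sketch under a few assumptions that are not in the original answer: the random data is seeded so the run is repeatable, the month label comes from dt.to_period('M') rather than pd.Grouper (recent pandas versions warn about the 'M' alias for month-end offsets), and unstack is given fill_value=0 so that a crime type missing from a month shows up as 0 instead of NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one row per incident, seeded for repeatability.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "date_rep": pd.date_range("2012-01-01", periods=100),
    "crm_cd_desc": rng.choice(["robbery", "traffic", "assault"], size=100),
})

# Label every row with its calendar month (a monthly Period such as 2012-01).
month = df["date_rep"].dt.to_period("M").rename("month")

# Count each crime type within each month, then pivot the types into columns.
monthly = (
    df.groupby(month)["crm_cd_desc"]
      .value_counts()
      .unstack(fill_value=0)  # absent (month, crime) pairs become 0, not NaN
)
print(monthly)
```

The row index here is a monthly period rather than a month-end timestamp, which usually reads a little better on a plot axis; the counts themselves are the same idea as in the session above.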

Instead of using value_counts, you can also group by both the month and the crime type and then take the size of each group:

In [46]: df.groupby([pd.Grouper(key='date_rep', freq='M'), 'crm_cd_desc']).size().unstack()
Out[46]: 
crm_cd_desc  assault  robbery  traffic
date_rep                              
2012-01-31        9       10       12
2012-02-29       13        5       11
2012-03-31       12       10        9
2012-04-30        2        5        2
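
As a cross-check, the same month-by-crime table can also be built with pd.crosstab. This is an alternative that the original answer does not use, shown here only as a sketch; the seeded sample data and the month variable are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one row per incident, seeded for repeatability.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "date_rep": pd.date_range("2012-01-01", periods=100),
    "crm_cd_desc": rng.choice(["robbery", "traffic", "assault"], size=100),
})

# Monthly label for each row.
month = df["date_rep"].dt.to_period("M").rename("month")

# Approach from the answer: group by month and crime type, take group sizes.
by_size = df.groupby([month, "crm_cd_desc"]).size().unstack(fill_value=0)

# Alternative: cross-tabulate month against crime type directly.
by_crosstab = pd.crosstab(month, df["crm_cd_desc"])

print(by_size)
print(by_crosstab)
```

Both tables should hold the same counts. crosstab is shorter to write, while the groupby form extends more naturally if you later want aggregations other than a plain count.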

Source: https://stackoverflow.com/questions/25321357/counting-qualitative-values-based-on-the-date-range-in-pandas

Tags

python

pandas

data-analysis