Pandas unable to filter rows by quarter in specific year

Submitted by 拈花ヽ惹草 on 2020-05-17 06:01:34

Question


I have a dataset like the one below:

  Store   Date     Weekly_Sales         
0   1   2010-05-02  1643690.90  
1   1   2010-12-02  1641957.44  
2   1   2010-02-19  1611968.17  
3   1   2010-02-26  1409727.59  
4   1   2010-05-03  1554806.68

It has 100 stores in all. I want to take the 2012 data and total it by quarter for each store.

import pandas as pd

# Keep only the 2012 rows (df is the dataset shown above)
df['Date'] = pd.to_datetime(df['Date'])
ds_2012 = df[df['Date'].dt.year == 2012]

# Sum the weekly sales per store for each calendar quarter
# (sorting is optional: groupby sorts the group keys anyway)
ds_2012 = ds_2012.sort_values('Date')
quarterly_sales = ds_2012.groupby(['Store', pd.Grouper(key='Date', freq='Q')])['Weekly_Sales'].sum()
quarterly_sales.head(20)

Output received:

Store     Date      
1      2012-03-31    18951097.69
       2012-06-30    21036965.58
       2012-09-30    18633209.98
       2012-12-31     9580784.77

The sums for Q2 (2012-06-30) and Q3 (2012-09-30) are both incorrect: they don't match what I get when I filter the same data in Excel. I am new to pandas.
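One likely cause, judging only from the sample rows: they look like consecutive weekly dates (5, 12, 19 and 26 February, then 5 March 2010) whose day and month were swapped whenever the day was 12 or less. That pattern suggests the raw file stores dates as DD-MM-YYYY and `pd.to_datetime` read the ambiguous ones month-first, which moves those weeks into the wrong quarter. This is an assumption about the raw data; if it holds, passing an explicit `format` (or `dayfirst=True`) to `pd.to_datetime` should fix the quarterly totals. The strings below are hypothetical raw values in that assumed format.

```python
import pandas as pd

# Hypothetical raw values, ASSUMING the source file stores dates as DD-MM-YYYY
raw = pd.Series(['05-02-2010', '12-02-2010', '19-02-2010', '26-02-2010', '05-03-2010'])

# Read month-first, '05-03-2010' becomes 3 May (Q2) instead of 5 March (Q1),
# which matches the sample row shown as 2010-05-03
misread = pd.to_datetime('05-03-2010', format='%m-%d-%Y')

# An explicit day-first format removes the ambiguity
parsed = pd.to_datetime(raw, format='%d-%m-%Y')
print(parsed.dt.date.tolist())
print(parsed.dt.quarter.tolist())
```

With the explicit format, the five dates come out exactly one week apart and all fall in Q1.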


Answer 1:


You can group by store and resample the DataFrame quarterly:

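import pandas as pd
# Toy data: two stores, each with sales 0-11 at the 12 month ends of 2020
df=pd.concat([pd.DataFrame({'Store':[i]*12, 'Date':pd.date_range(start='2020-01-01', periods=12, freq='M'), 'Sales':list(range(12))}) for i in [1,2]])
# Resample each store's rows to quarter ends, sum, and drop the leftover 'Store' column
# (pandas >= 2.2 prefers the aliases 'ME' and 'QE' over 'M' and 'Q')
df.groupby('Store').resample('Q', on='Date').sum().drop('Store', axis=1)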

                  Sales
Store Date             
1     2020-03-31      3
      2020-06-30     12
      2020-09-30     21
      2020-12-31     30
2     2020-03-31      3
      2020-06-30     12
      2020-09-30     21
      2020-12-31     30
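The frequency aliases 'M' and 'Q' used above are deprecated as of pandas 2.2 in favour of 'ME' and 'QE'. A minimal alias-independent sketch of the same toy example, swapping `resample` for a plain `groupby` on a derived key: it dates the rows at month starts ('MS') and groups by store and by the quarterly Period from `Series.dt.to_period('Q')`. It should give the same totals (3, 12, 21, 30 per store), with quarters labelled as periods such as 2020Q1 instead of quarter-end dates.

```python
import pandas as pd

# Same toy data as the answer, but dated at month starts ('MS'), which is not a deprecated alias
df = pd.concat(
    [pd.DataFrame({'Store': [i] * 12,
                   'Date': pd.date_range(start='2020-01-01', periods=12, freq='MS'),
                   'Sales': list(range(12))})
     for i in [1, 2]],
    ignore_index=True,
)

# Label each row with its quarterly Period (e.g. 2020Q1), then sum Sales per store and quarter
quarter = df['Date'].dt.to_period('Q').rename('Quarter')
quarterly = df.groupby(['Store', quarter])['Sales'].sum()
print(quarterly)
```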

Maybe check the groupby and resample docs as well.
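For the task in the title, rows of one quarter within one year can be selected with a boolean mask on the `.dt.year` and `.dt.quarter` accessors, and the same accessor can serve as a group key for per-store quarterly totals. The sketch below uses a small made-up frame with the question's column names (two hypothetical stores, one row per Friday of 2012, invented constant sales values) purely to show the mechanics.

```python
import pandas as pd

# Hypothetical data: two stores, one row per Friday of 2012, made-up sales values
dates = pd.date_range(start='2012-01-06', end='2012-12-28', freq='7D')
df = pd.DataFrame({
    'Store': [1] * len(dates) + [2] * len(dates),
    'Date': list(dates) * 2,
    'Weekly_Sales': [100.0] * len(dates) + [200.0] * len(dates),
})

# Rows from Q2 (April to June) of 2012 only
in_2012 = df['Date'].dt.year == 2012
q2_2012 = df[in_2012 & (df['Date'].dt.quarter == 2)]

# Per-store totals for each quarter of 2012; dt.quarter is 1-4
ds_2012 = df[in_2012]
quarter = ds_2012['Date'].dt.quarter.rename('Quarter')
quarterly_sales = ds_2012.groupby(['Store', quarter])['Weekly_Sales'].sum()
print(quarterly_sales)
```

Grouping on `dt.quarter` gives integer quarter labels (1 to 4), which is convenient once the data has already been restricted to a single year.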



Source: https://stackoverflow.com/questions/61612351/pandas-unable-to-filter-rows-by-quarter-in-specific-year
