Python: Grouping by date and finding the average of a column inside a dataframe

Posted by 狂风中的少年 on 2021-02-07 03:29:45

Question


I have a data frame with three columns. The Time column contains every day of the month for several months. I want to take the daily 'Count' values and average them per month, separately for each country. The output must be a data frame.

Current data:

    Time    Country Count
 2017-01-01    us   7827
 2017-01-02    us   7748
 2017-01-03    us   7653
 ..
 ..
 2017-01-30    us   5432
 2017-01-31    us   2942
 2017-01-01    us   5829
 2017-01-02    ca   9843
 2017-01-03    ca   7845
 ..
 ..
 2017-01-30    ca   8654
 2017-01-31    ca   8534

Desired output (dummy data; the numbers are not representative of the data frame above):

    Time       Country   Monthly Average
 Jan 2017      us          6873
 Feb 2017      us          8875
 ..
 .. 
 Nov 2017      us          9614
 Dec 2017      us          2475
 Jan 2017      ca          1878
 Feb 2017      ca          4775
 ..
 .. 
 Nov 2017      ca          7643
 Dec 2017      ca          9441

Answer 1:


I'd organize it like this:

df.groupby(
    [df.Time.dt.strftime('%b %Y'), 'Country']
)['Count'].mean().reset_index(name='Monthly Average')

       Time Country  Monthly Average
0  Feb 2017      ca             88.0
1  Feb 2017      us            105.0
2  Jan 2017      ca             85.0
3  Jan 2017      us             24.6
4  Mar 2017      ca             86.0
5  Mar 2017      us             54.0

If your 'Time' column wasn't already a datetime column, I'd do this:

import pandas as pd

# Convert 'Time' to datetime on the fly, then group by month label and country
df.groupby(
    [pd.to_datetime(df.Time).dt.strftime('%b %Y'), 'Country']
)['Count'].mean().reset_index(name='Monthly Average')

       Time Country  Monthly Average
0  Feb 2017      ca             88.0
1  Feb 2017      us            105.0
2  Jan 2017      ca             85.0
3  Jan 2017      us             24.6
4  Mar 2017      ca             86.0
5  Mar 2017      us             54.0
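For reference, here is a self-contained sketch of the approach above. It assumes the sample rows from Answer 2, with the columns renamed to Time, Country and Count to match the question; the variable name `monthly` is only illustrative.

```python
import pandas as pd

# Sample rows borrowed from Answer 2 (assumption: renamed to the question's column names)
df = pd.DataFrame({
    "Time": [
        "2017-01-01", "2017-01-02", "2017-02-02", "2017-02-03",
        "2017-03-30", "2017-01-31", "2017-01-01", "2017-01-02",
        "2017-02-02", "2017-02-03", "2017-03-30", "2017-01-31",
    ],
    "Country": ["us"] * 8 + ["ca"] * 4,
    "Count": [12, 12, 134, 76, 54, 29, 58, 12, 98, 78, 86, 85],
})

# Make sure 'Time' is a datetime column before using the .dt accessor
df["Time"] = pd.to_datetime(df["Time"])

# Group by the formatted month label and the country, then average 'Count'
monthly = (
    df.groupby([df["Time"].dt.strftime("%b %Y"), "Country"])["Count"]
    .mean()
    .reset_index(name="Monthly Average")
)
print(monthly)
```

Note that the month labels are plain strings, so the groups come out in alphabetical order (Feb, Jan, Mar) rather than calendar order.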



Answer 2:


Use pandas' dt.strftime to build the month-year column you want, then groupby and mean. I used this data frame:

Dated     country   num 
2017-01-01  us     12   
2017-01-02  us     12   
2017-02-02  us     134  
2017-02-03  us     76   
2017-03-30  us     54   
2017-01-31  us     29   
2017-01-01  us     58   
2017-01-02  us     12   
2017-02-02  ca     98   
2017-02-03  ca     78   
2017-03-30  ca     86   
2017-01-31  ca     85   

Then create a Month-Year column:

a['MonthYear'] = a.Dated.dt.strftime('%b %Y')

Then drop the Dated column and aggregate by mean:

a.drop('Dated', axis=1).groupby(['MonthYear','country']).mean().rename(columns={'num':'Averaged'}).reset_index()

MonthYear   country  Averaged
Feb 2017    ca       88.0
Feb 2017    us       105.0
Jan 2017    ca       85.0
Jan 2017    us       24.6
Mar 2017    ca       86.0
Mar 2017    us       54.0

The drop above is not in place, so the data frame a itself still has the Dated column, just in case.
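Because the month labels in both answers are strings, the results sort alphabetically (Feb before Jan). If calendar order matters, one possible variation is to group on a monthly period first and format the label afterwards. The sketch below assumes the same sample data frame `a`; the name `result` is only illustrative, and this is an alternative to the method shown above rather than part of it.

```python
import pandas as pd

# Same sample data as in this answer
a = pd.DataFrame({
    "Dated": pd.to_datetime([
        "2017-01-01", "2017-01-02", "2017-02-02", "2017-02-03",
        "2017-03-30", "2017-01-31", "2017-01-01", "2017-01-02",
        "2017-02-02", "2017-02-03", "2017-03-30", "2017-01-31",
    ]),
    "country": ["us"] * 8 + ["ca"] * 4,
    "num": [12, 12, 134, 76, 54, 29, 58, 12, 98, 78, 86, 85],
})

# Group on a monthly Period so groups sort chronologically within each country
result = (
    a.groupby(["country", a["Dated"].dt.to_period("M")])["num"]
    .mean()
    .reset_index(name="Averaged")
)

# Build the display label only after the sort order has been fixed
result["MonthYear"] = result["Dated"].dt.strftime("%b %Y")
result = result[["MonthYear", "country", "Averaged"]]
print(result)
```

Listing country first in the groupby keys keeps each country's months together, which matches the layout of the desired output in the question.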



Source: https://stackoverflow.com/questions/47244294/python-grouping-by-date-and-finding-the-average-of-a-column-inside-a-dataframe
