问题
I have a data frame that has a 3 columns. Time represents every day of the month for various months. what I am trying to do is get the 'Count' value per day and average it per each month, and do this for each country. The output must be in the form of a data frame.
Curent data:
Time Country Count
2017-01-01 us 7827
2017-01-02 us 7748
2017-01-03 us 7653
..
..
2017-01-30 us 5432
2017-01-31 us 2942
2017-01-01 us 5829
2017-01-02 ca 9843
2017-01-03 ca 7845
..
..
2017-01-30 ca 8654
2017-01-31 ca 8534
Desire output (dummy data, numbers are not representative of the DF above):
Time Country Monthly Average
Jan 2017 us 6873
Feb 2017 us 8875
..
..
Nov 2017 us 9614
Dec 2017 us 2475
Jan 2017 ca 1878
Feb 2017 ca 4775
..
..
Nov 2017 ca 7643
Dec 2017 ca 9441
回答1:
I'd organize it like this:
df.groupby(
[df.Time.dt.strftime('%b %Y'), 'Country']
)['Count'].mean().reset_index(name='Monthly Average')
Time Country Monthly Average
0 Feb 2017 ca 88.0
1 Feb 2017 us 105.0
2 Jan 2017 ca 85.0
3 Jan 2017 us 24.6
4 Mar 2017 ca 86.0
5 Mar 2017 us 54.0
If your 'Time'
column wasn't already a datetime column, I'd do this:
df.groupby(
[pd.to_datetime(df.Time).dt.strftime('%b %Y'), 'Country']
)['Count'].mean().reset_index(name='Monthly Average')
Time Country Monthly Average
0 Feb 2017 ca 88.0
1 Feb 2017 us 105.0
2 Jan 2017 ca 85.0
3 Jan 2017 us 24.6
4 Mar 2017 ca 86.0
5 Mar 2017 us 54.0
回答2:
Use pandas dt strftime
to create a month-year column that you desire + groupby
+ mean
. Used this dataframe:
Dated country num
2017-01-01 us 12
2017-01-02 us 12
2017-02-02 us 134
2017-02-03 us 76
2017-03-30 us 54
2017-01-31 us 29
2017-01-01 us 58
2017-01-02 us 12
2017-02-02 ca 98
2017-02-03 ca 78
2017-03-30 ca 86
2017-01-31 ca 85
Then create a Month-Year column:
a['MonthYear']= a.Dated.dt.strftime('%b %Y')
Then, drop the Date column and aggregate by mean:
a.drop('Dated', axis=1).groupby(['MonthYear','country']).mean().rename(columns={'num':'Averaged'}).reset_index()
MonthYear country Averaged
Feb 2017 ca 88.0
Feb 2017 us 105.0
Jan 2017 ca 85.0
Jan 2017 us 24.6
Mar 2017 ca 86.0
Mar 2017 us 54.0
I retained the Dated column just in case.
来源:https://stackoverflow.com/questions/47244294/python-grouping-by-date-and-finding-the-average-of-a-column-inside-a-dataframe