Question
I'm using the time series dataset from Tableau (https://community.tableau.com/thread/194200), which contains daily furniture sales, and I want to resample it to get the average monthly sales.
I tried using resample in pandas to get the monthly mean, but the result is not what I expect.
Furniture was sold on only four days in January,
and there were no sales on the other days of that month:
Order Date Sales
...
2014/1/6 2573.82
2014/1/7 76.728
2014/1/16 127.104
2014/1/20 38.6
...
y_furniture = furniture['Sales'].resample('MS').mean()
I want the result to be the actual average sales per month.
That is, the sum of all daily sales divided by the 31 days in January, which is 90.85. Instead, the code divides the sum by 4 (the number of days that have a sale), which gives about 704. This doesn't correctly reflect the actual monthly sales.
Does anyone know how to solve this problem?
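For reference, the two figures in the question can be reproduced with plain arithmetic. This is only a quick check on the four January rows shown above; the variable names are made up for illustration.

```python
# The four January sales values listed in the question
sales = [2573.82, 76.728, 127.104, 38.6]
total = sum(sales)

# What resample('MS').mean() computes: the mean over rows that exist
mean_over_sale_days = total / len(sales)   # about 704.06

# What the question asks for: the total spread over all 31 days of January
mean_over_calendar_days = total / 31       # about 90.85

print(round(mean_over_sale_days, 3), round(mean_over_calendar_days, 3))
```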
Answer 1:
I'm not sure whether your expected answer is 90.85 or 704, so I'm providing a solution for both. Choose whichever fits your requirements.
import pandas as pd

l1 = ['Order Date',
      'Sales',
      ]
l2 = [['2014/1/6', 2573.82],
      ['2014/1/7', 76.728],
      ['2014/1/16', 127.104],
      ['2014/1/20', 38.6],
      ['2014/2/20', 38.6],
      ]
df = pd.DataFrame(l2, columns=l1)
df['Order Date'] = pd.to_datetime(df['Order Date'])  # make sure Order Date is of datetime type
# Mean over the days that have a sale in each month (select 'Sales' so only that column is averaged)
x = df.groupby(df['Order Date'].dt.month)['Sales'].mean()  # or .agg('mean')
#### Output ####
Order Date
1 704.063
2 38.600
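def doCalculation(df):
    # Total sales of the month divided by the number of calendar days in that month
    groupSum = df['Sales'].sum()
    return (groupSum / df['Order Date'].dt.daysinmonth)

# apply() returns one value per original row, so collapse back to one value per month
y = df.groupby(df['Order Date'].dt.month).apply(doCalculation).groupby(['Order Date']).mean()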
#### Output ####
Order Date
1 90.846839
2 1.378571
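The per-calendar-day average can also be reached with resample, which is what the question started from. The following is a minimal sketch, not the answerer's code: it assumes that furniture has a DatetimeIndex named Order Date (as the resample call in the question implies) and rebuilds the five sample rows used above. It sums each month and divides by the number of days in that month.

```python
import pandas as pd

# Sample rows from the answer above, indexed by date (assumed layout of `furniture`)
furniture = pd.DataFrame(
    {"Sales": [2573.82, 76.728, 127.104, 38.6, 38.6]},
    index=pd.to_datetime(
        ["2014-01-06", "2014-01-07", "2014-01-16", "2014-01-20", "2014-02-20"]
    ),
)
furniture.index.name = "Order Date"

# Total sales per month, labelled by the first day of the month
monthly_total = furniture["Sales"].resample("MS").sum()

# Divide each monthly total by the number of calendar days in that month
per_day_avg = monthly_total / monthly_total.index.days_in_month

print(per_day_avg)
```

Unlike the groupby version, resample also emits months that fall between the first and last date but have no rows at all; their sum is 0, so their average comes out as 0 rather than being left out.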
Answer 2:
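You can get the average sales per month using a pivot table. Try:
# the date column is named 'Order Date' in the data shown in the question
df['Order Date'] = pd.to_datetime(df['Order Date'])
df['Month'] = df['Order Date'].dt.month
# mean over the days that have a sale in each month
df_pivot = df.pivot_table(values='Sales', columns='Month', aggfunc='mean')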
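One caveat applies to both answers: grouping on dt.month alone puts the same calendar month from different years into one group. If the data spans more than one year, grouping on a year-month period keeps them apart. The sketch below is an illustration under that assumption; the 2015 row is invented purely to show the separation and is not from the dataset.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["2014-01-06", 2573.82],
        ["2014-01-07", 76.728],
        ["2014-01-16", 127.104],
        ["2014-01-20", 38.6],
        ["2014-02-20", 38.6],
        ["2015-01-05", 310.0],  # hypothetical row, added only for illustration
    ],
    columns=["Order Date", "Sales"],
)
df["Order Date"] = pd.to_datetime(df["Order Date"])

# Group by year-month (e.g. 2014-01) instead of month number alone
period = df["Order Date"].dt.to_period("M")
monthly_total = df.groupby(period)["Sales"].sum()

# Divide by the number of calendar days in each period
per_day_avg = monthly_total / monthly_total.index.days_in_month

print(per_day_avg)
```

Here January 2014 and January 2015 get separate rows, whereas dt.month would have averaged them together.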
Source: https://stackoverflow.com/questions/55890751/pandas-resample-to-get-monthly-average-with-time-series-data