问题
I am trying to create a 3 Month Moving Average based on some data that I have while using RedShift SQL or Domo BeastMode (if anyone is familiar with that).
The data is on a day to day basis, but needs to be displayed by month. So the quotes/revenue need to be summarized by month, and then a 3MMA needs to be calculated (excluding the current month).
So, if the quote was in April, I would need the average of Jan, Feb, Mar.
The input data looks like this:
Quote Date MM/DD/YYYY Revenue
3/24/2015 61214
8/4/2015 22983
9/3/2015 30000
9/15/2015 171300
9/30/2015 112000
And I need the output to look something like this:
Month Revenue 3MMA
Jan 2015 =Sum of Jan Rev =(Oct14 + Nov14 + Dec14) / 3
Feb 2015 =Sum of Feb Rev =(Nov14 + Dec14 + Jan15) / 3
March 2015 =Sum of Mar Rev =(Dec14 + Jan15 + Feb15) / 3
April 2015 =Sum of Apr Rev =(Jan15 + Feb15 + Mar15) / 3
May 2015 =Sum of May Rev =(Feb15 + Mar15 + Apr15) / 3
If anyone is able to help, I would be extremely grateful! I have been stuck on this for quite a while and have no idea what I'm doing when it comes to SQL lol.
Cheers, Logan.
回答1:
You can do this using aggregation and window functions:
select date_trunc('month', quotedate) as mon,
sum(revenue) as mon_revenue,
avg(sum(revenue)) over (order by date_trunc('month', quotedate) rows between 2 preceding and current row) as revenue_3mon
from t
group by date_trunc('month', quotedate)
order by mon;
Note: this uses average, so for the first and second row, it will divide by 1 and 2 respectively. It also assumes that you have at least one record for each month.
EDIT:
I wonder if there is an issue with aggregation functions mixed with analytic functions in RedShift. Is the following any better:
select m.*,
avg(mon_revenue) over (order by mon rows between 2 preceding and current row) as revenue_3mon
from (select date_trunc('month', quotedate) as mon,
sum(revenue) as mon_revenue
from t
group by date_trunc('month', quotedate)
) m
order by mon;
回答2:
you could do something like the way we create buckets for a rolling 6 weeks (the date column is called "date"):
case
when date_trunc('week',dateadd(day,1,date)) = date_trunc('week',dateadd(day,1,current_date)) then 'CW'
when date_trunc('week',dateadd(day,1,date)) = date_trunc('week',dateadd(day,-6,current_date)) then 'LW'
when date_trunc('week',dateadd(day,1,date)) = date_trunc('week',dateadd(day,-13,current_date)) then '2W'
when date_trunc('week',dateadd(day,1,date)) = date_trunc('week',dateadd(day,-20,current_date)) then '3W'
when date_trunc('week',dateadd(day,1,date)) = date_trunc('week',dateadd(day,-27,current_date)) then '4W'
when date_trunc('week',dateadd(day,1,date)) = date_trunc('week',dateadd(day,-34,current_date)) then '5W'
when date_trunc('week',dateadd(day,1,date)) = date_trunc('week',dateadd(day,-41,current_date)) then '6W'
end as dateweek
You could then create an average in a subsequent step in the dataflow...
回答3:
You cannot use aggregate functions and analytic function together the query should be
select m.*,
avg(mon_revenue) over (order by mon rows between 3 preceding and 1 preceding) as revenue_3mon -- using 3 preceding and 1 preceding row you exclude the current row
from (select date_trunc('month', quotedate) as mon,
sum(revenue) as mon_revenue
from t
group by date_trunc('month', quotedate)
) m
order by mon;
rows between 3 preceding and 1 preceding (should remove the row in the end, otherwise redshift won't work)
来源:https://stackoverflow.com/questions/36120327/3-month-moving-average-redshift-sql