Bigquery SQL for sliding window aggregate

你。 提交于 2019-11-27 16:40:23

问题


Hi I have a table that looks like this

Date         Customer   Pageviews
2014/03/01   abc          5
2014/03/02   xyz          8
2014/03/03   abc          6

I want to get page view aggregates grouped by week but showing aggregates for past 30 days - (sliding window aggregates with window-size of 30 days for every week)

I am using google bigquery

EDIT: Gordon - re your comment about "Customer", Actually what I need is slightly more complicated thats why I included customer in the table above. I am looking to get the number of customers who had >n pageviews in a 30day window every week. something like this

Date        Customers>10 pageviews in 30day window
2014/02/01  10
2014/02/08  5
2014/02/15  6
2014/02/22  15

However to keep it simple, I will work my way if I could just get a sliding window aggregate of pageviews ignoring customers altogether. something like this

Date        count of pageviews in 30day window
2014/02/01  50
2014/02/08  55
2014/02/15  65
2014/02/22  75

回答1:


How about this:

SELECT changes + changes1 + changes2 + changes3 changes28days, login, USEC_TO_TIMESTAMP(week)
FROM (
  SELECT changes,
         LAG(changes, 1) OVER (PARTITION BY login ORDER BY week) changes1,
         LAG(changes, 2) OVER (PARTITION BY login ORDER BY week) changes2,
         LAG(changes, 3) OVER (PARTITION BY login ORDER BY week) changes3,
         login,
         week
  FROM (
    SELECT SUM(payload_pull_request_changed_files) changes, 
           UTC_USEC_TO_WEEK(created_at, 1) week,
           actor_attributes_login login,
    FROM [publicdata:samples.github_timeline]
    WHERE payload_pull_request_changed_files > 0
    GROUP BY week, login
))
HAVING changes28days > 0

For each user it counts how many changes they have submitted per week. Then with LAG() we can peek into the next row, how many changes they submitted the -1, -2, and -3 week. Then we just add those 4 weeks to see how many changes were submitted on the last 28 days.

Now you can wrap everything in a new query to filter users with changes>X, and count them.




回答2:


I have created the following "Times" table:

Table Details: Dim_Periods
Schema
Date    TIMESTAMP   
Year    INTEGER         
Month   INTEGER         
day         INTEGER         
QUARTER INTEGER     
DAYOFWEEK   INTEGER     
MonthStart  TIMESTAMP   
MonthEnd    TIMESTAMP   
WeekStart   TIMESTAMP   
WeekEnd TIMESTAMP   
Back30Days  TIMESTAMP   -- the date 30 days before "Date"
Back7Days   TIMESTAMP   -- the date 7 days before "Date"

and I use such query to handle "running sums"

SELECT Date,Count(*) as MovingCNT
FROM

(SELECT Date,
                Back7Days 
                    FROM DWH.Dim_Periods  
                 where Date < timestamp(current_date()) AND
                             Date >= (DATE_ADD (CURRENT_TIMESTAMP(), -5, 'month'))
                )P
                CROSS JOIN EACH
    (SELECT repository_url,repository_created_at
    FROM publicdata:samples.github_timeline
                ) L
        WHERE timestamp(repository_created_at)>= Back7Days 
              AND timestamp(repository_created_at)<= Date

GROUP EACH BY Date

Note that it can be used for "Month to date", Week to Date" "30 days back" etc. aggregations as well. However, performance is not the best and the query can take a while on larger data sets due to the Cartesian join. Hope this helps



来源:https://stackoverflow.com/questions/22415647/bigquery-sql-for-sliding-window-aggregate

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!