BigQuery: is it possible to execute another query inside a UDF?

泄露秘密 submitted on 2019-12-24 00:42:59

Question


I have a table that records one row per unique user per day, with some aggregated stats for that user on that day. I need to produce a report that tells me, for each day, the number of unique users in the last 30 days including that day.

For example:

  • for Aug 31st, it'll count the unique users from Aug 2nd to Aug 31st
  • for Aug 30th, it'll count the unique users from Aug 1st to Aug 30th
  • and so on...

I've looked at some related questions, but they aren't quite what I need: if a user logs in on multiple days within the last 30 days, he should be counted only once, so I can't just sum the daily active user (DAU) counts for the last 30 days.
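To see why summing DAU overcounts, here is a tiny Python illustration with made-up data (three days and hypothetical users `a`, `b`, `c`, not taken from the question's table). The sum counts a returning user once per active day, while a set union counts each user once.

```python
from datetime import date

# Hypothetical sample data: the set of users active on each day.
daily_users = {
    date(2016, 8, 1): {"a", "b"},
    date(2016, 8, 2): {"a", "c"},
    date(2016, 8, 3): {"a"},
}

# Summing the daily counts counts user "a" three times.
summed_dau = sum(len(users) for users in daily_users.values())

# A set union counts every user once, however many days they were active.
unique_users = len(set().union(*daily_users.values()))

print(summed_dau, unique_users)  # 5 3
```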

Bigquery SQL for sliding window aggregate

BigQuery SQL for 28-day sliding window aggregate (without writing 28 lines of SQL)

So far, my ideas are to either:

  • write a simple script that executes a separate BigQuery query for each of the relevant days
  • write a BigQuery UDF that executes essentially the same query for each day selected by another query

but I haven't found any examples of how to execute another BigQuery query inside a UDF, or any indication of whether it's possible at all.
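The first idea (one query per day) could look roughly like the Python sketch below. It only builds the SQL strings, written in BigQuery standard SQL; the table name `mydataset.daily_user_stats` and the `dt`/`userID` column names are hypothetical. To actually run them you would pass each string to a client such as `bigquery.Client().query(sql).result()` from the google-cloud-bigquery library, which is left out here so the sketch runs without credentials.

```python
from datetime import date, timedelta

# Hypothetical table name; the question does not name its table.
TABLE = "mydataset.daily_user_stats"


def per_day_queries(first_day, last_day, window_days=30):
    """Yield (report_day, sql) pairs, one query per report day.

    Each query counts distinct users whose dt falls in the window_days-long
    window ending on report_day (inclusive).
    """
    day = first_day
    while day <= last_day:
        window_start = day - timedelta(days=window_days - 1)
        sql = (
            f"SELECT COUNT(DISTINCT userID) AS unique_users "
            f"FROM `{TABLE}` "
            f"WHERE dt BETWEEN DATE '{window_start.isoformat()}' "
            f"AND DATE '{day.isoformat()}'"
        )
        yield day, sql
        day += timedelta(days=1)


queries = list(per_day_queries(date(2016, 8, 30), date(2016, 8, 31)))
for report_day, sql in queries:
    print(report_day, sql)
```

For Aug 31 the generated window starts on Aug 2, matching the example above. The drawback is that a month-long report issues about 30 separate query jobs.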


Answer 1:


I need to produce a report that tells me for each day, the no. of unique users in the last 30 days including that day.

The query below (written in BigQuery legacy SQL) should do this:

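SELECT
  calendar_day,
  EXACT_COUNT_DISTINCT(userID) AS unique_users
FROM (
  -- every (report day, user) pair where the user was active in that day's window
  SELECT calendar_day, userID
  FROM YourTable
  CROSS JOIN (
    -- calendar: one row per day from 2016-08-08 to 2016-09-08
    SELECT DATE(DATE_ADD('2016-08-08', pos - 1, "DAY")) AS calendar_day
    FROM (
      -- number the rows 1..N, where N = number of days in the range
      SELECT ROW_NUMBER() OVER() AS pos, *
      FROM (FLATTEN((
        -- a string of N dots, split into N one-character values
        SELECT SPLIT(RPAD('', 1 + DATEDIFF('2016-09-08', '2016-08-08'), '.'), '') AS h
        FROM (SELECT NULL)), h
      ))
    )
  ) AS calendar
  -- keep rows whose dt falls in the 30 days ending on calendar_day
  WHERE DATEDIFF(calendar_day, dt) BETWEEN 0 AND 29
)
GROUP BY calendar_day
ORDER BY calendar_day DESC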

It assumes that YourTable has userID and dt (date) fields, for example:

dt          userID   
2016-09-08       1   
2016-09-08       2   
...
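The calendar subquery in the query above is a legacy SQL trick for generating one row per day: RPAD builds a string of N dots (N = 1 + DATEDIFF(end, start)), SPLIT and FLATTEN turn it into N rows, ROW_NUMBER() numbers them 1..N, and DATE_ADD maps each number to start + pos - 1 days. The Python below mirrors those steps one by one purely as an illustration; it is not BigQuery code.

```python
from datetime import date, timedelta

start, end = date(2016, 8, 8), date(2016, 9, 8)

# RPAD('', 1 + DATEDIFF(end, start), '.'): a string of N dots.
n = 1 + (end - start).days
dots = "".ljust(n, ".")

# SPLIT(..., '') + FLATTEN: one row per character.
rows = list(dots)

# ROW_NUMBER() OVER(): positions 1..N.
positions = range(1, len(rows) + 1)

# DATE(DATE_ADD(start, pos - 1, "DAY")): one calendar day per position.
calendar = [start + timedelta(days=pos - 1) for pos in positions]

print(len(calendar), calendar[0], calendar[-1])  # 32 2016-08-08 2016-09-08
```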

You can also control:
- the reporting date range, by changing 2016-08-08 (start) and 2016-09-08 (end) wherever they appear
- the window size, by changing 29 in BETWEEN 0 AND 29 (for an N-day window, use N - 1)
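To experiment with the window size or date range locally, here is a sketch of the same technique (calendar CROSS JOIN, day-difference filter, distinct count) translated to SQLite and run through Python's built-in sqlite3 module. The function equivalents are an assumption, not BigQuery behavior: a recursive CTE replaces the FLATTEN trick, julianday() differences replace DATEDIFF, and COUNT(DISTINCT ...) replaces EXACT_COUNT_DISTINCT. The sample rows are made up, and window_days - 1 plays the role of 29.

```python
import sqlite3


def rolling_unique_users(conn, start, end, window_days=30):
    """Return [(calendar_day, unique_users), ...], newest day first."""
    sql = """
        WITH RECURSIVE calendar(calendar_day) AS (
          -- one row per day from start to end (replaces the FLATTEN trick)
          SELECT date(:start)
          UNION ALL
          SELECT date(calendar_day, '+1 day') FROM calendar
          WHERE calendar_day < date(:end)
        )
        SELECT c.calendar_day, COUNT(DISTINCT t.userID) AS unique_users
        FROM calendar AS c
        CROSS JOIN YourTable AS t
        -- julianday() difference in days replaces DATEDIFF(calendar_day, dt)
        WHERE julianday(c.calendar_day) - julianday(t.dt)
              BETWEEN 0 AND :last_offset
        GROUP BY c.calendar_day
        ORDER BY c.calendar_day DESC
    """
    params = {"start": start, "end": end, "last_offset": window_days - 1}
    return conn.execute(sql, params).fetchall()


# Made-up sample rows: (dt, userID)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (dt TEXT, userID INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [("2016-09-06", 1), ("2016-09-07", 1), ("2016-09-07", 2),
     ("2016-09-08", 3), ("2016-08-01", 4)],
)

print(rolling_unique_users(conn, "2016-09-06", "2016-09-08"))
# [('2016-09-08', 3), ('2016-09-07', 2), ('2016-09-06', 1)]
print(rolling_unique_users(conn, "2016-09-06", "2016-09-08", window_days=1))
# [('2016-09-08', 1), ('2016-09-07', 2), ('2016-09-06', 1)]
```

As with the original query, a report day with no active users in its window produces no row at all, because the WHERE filter removes every pair for that day.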



Source: https://stackoverflow.com/questions/39399930/bigquery-is-it-possible-to-execute-another-query-inside-an-udf
