How to calculate DAU/MAU with BigQuery (engagement)

后悔当初 2021-01-16 06:15

DAU and MAU (daily active users and monthly active users) are an established way of measuring user engagement.

How can I get these numbers using SQL and Google BigQuery?

2 Answers
  • 2021-01-16 06:34

    To analyze trends without waiting for a "full month", you need to look at each day together with the 30 days preceding it. I am afraid that the solution suggested by Felipe Hoffa changes the question, not just the data-retrieval query.

    You can find my take on the issue below. I am not sure what it does under the hood in terms of performance, and it is not very fast (much slower than Felipe's), but it covers the business need as I understand it. Still, if you can offer a way to optimize this approach, that would be great.

    Note: it uses no joins or sub-aggregates, only SPLIT, GROUP BY, and date manipulation.

    -- Rolling DAU, WAU and MAU per subreddit and day (BigQuery legacy SQL).
    SELECT
      *,
      DAU/WAU AS DAU_WAU,
      DAU/MAU AS DAU_MAU
    FROM (
      SELECT
        COALESCE(DAUDate,WAUDate,MAUDate) AS ReportDate,
        subreddit,
        -- Distinct authors falling in each window for this ReportDate
        EXACT_COUNT_DISTINCT(IF(DAUDate IS NOT NULL,author,NULL)) AS DAU,
        EXACT_COUNT_DISTINCT(IF(WAUDate IS NOT NULL,author,NULL)) AS WAU,
        EXACT_COUNT_DISTINCT(IF(MAUDate IS NOT NULL,author,NULL)) AS MAU
      FROM (
        -- Shift each comment forward by Ind days: it counts toward the DAU of
        -- its own day (Ind=0) and the WAU/MAU of the next 7/30 report dates
        SELECT
          DDate,
          subreddit,
          author,
          Ind,
          DATE(IF(Ind=0,DDate,NULL)) AS DAUDate,
          DATE(IF(Ind<7,DATE_ADD(DDate,Ind,"Day"),NULL)) AS WAUDate,
          DATE(IF(Ind<30,DATE_ADD(DDate,Ind,"Day"),NULL)) AS MAUDate
        FROM (
          -- SPLIT returns a repeated field, so each comment row is
          -- duplicated once per offset Ind
          SELECT
            DATE(SEC_TO_TIMESTAMP(created_utc)) AS DDate,
            subreddit,
            author,
            INTEGER(SPLIT("0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30",",")) AS Ind
          FROM
            [fh-bigquery:reddit_comments.2015_09],
            [fh-bigquery:reddit_comments.2015_08] ))
      WHERE
        -- Drop report dates past the end of the data
        COALESCE(DAUDate,WAUDate,MAUDate)<DATE(TIMESTAMP("2015-10-01")/*Current_Timestamp()*/)
      GROUP EACH BY
        1,
        2)
    HAVING
      MAU>50000
    ORDER BY
      2,
      1 DESC
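    The fan-out trick in the query above (copy each comment forward by 0 to 29 days, then count distinct authors per shifted date) can be sketched in Python for a single subreddit. This is a minimal illustration, assuming a hypothetical in-memory list of (day, author) pairs instead of the BigQuery tables; `rolling_active_users` and `cutoff` are illustrative names, not part of any library.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical input: one (day, author) pair per comment, for a single subreddit.
Event = tuple[date, str]


def rolling_active_users(events: list[Event], cutoff: date) -> dict[date, tuple[int, int, int]]:
    """Return {report_date: (DAU, WAU, MAU)} for report dates before `cutoff`."""
    daily: dict[date, set[str]] = defaultdict(set)
    weekly: dict[date, set[str]] = defaultdict(set)
    monthly: dict[date, set[str]] = defaultdict(set)
    for day, author in events:
        # Fan each comment out to offsets 0..29, like SPLIT() producing Ind.
        for ind in range(30):
            report = day + timedelta(days=ind)
            if report >= cutoff:
                break  # mirrors the WHERE ReportDate < cutoff filter
            if ind == 0:
                daily[report].add(author)   # DAUDate
            if ind < 7:
                weekly[report].add(author)  # WAUDate
            monthly[report].add(author)     # MAUDate (ind < 30 always holds here)
    return {
        d: (len(daily.get(d, set())), len(weekly.get(d, set())), len(monthly[d]))
        for d in sorted(monthly)
    }
```

    For example, with comments by "a" and "b" on 2015-09-01, "a" on 2015-09-05 and "c" on 2015-09-10, the entry for 2015-09-10 is (1, 2, 3): only "c" posted that day, "a" and "c" posted in the last 7 days, and all three posted in the last 30.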
    
  • 2019 standard SQL update:

    • https://stackoverflow.com/a/49866033/132438

    (To understand why DAU/MAU is useful, see articles like http://blog.compariscope.wefi.com/mobile-app-usage-dau-mau)

    Let's play with the Reddit comments data stored in BigQuery. We want to find the DAU/MAU ratio for the 'AskReddit' subreddit during September, on a daily rolling basis:

    SELECT day, dau, mau, INTEGER(100*dau/mau) daumau
    FROM (
      SELECT day, EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
      FROM (
        SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) day, author
        FROM [fh-bigquery:reddit_comments.2015_09]
        WHERE subreddit='AskReddit') a
      JOIN (
        SELECT stopday, EXACT_COUNT_DISTINCT(author) mau
        FROM (SELECT created_utc, subreddit, author FROM [fh-bigquery:reddit_comments.2015_09], [fh-bigquery:reddit_comments.2015_08]) a
        CROSS JOIN (
          SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) stopday
          FROM [fh-bigquery:reddit_comments.2015_09]
          GROUP BY 1
        ) b
        WHERE subreddit='AskReddit'
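        -- 30-day window ending on (and including) stopday
        AND SEC_TO_TIMESTAMP(created_utc) >= DATE_ADD(TIMESTAMP(stopday), -29, 'day')
        AND SEC_TO_TIMESTAMP(created_utc) < DATE_ADD(TIMESTAMP(stopday), 1, 'day')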
        GROUP BY 1
      ) b
      ON a.day=b.stopday
      GROUP BY 1
    )
    ORDER BY 1
    

    This query gets the DAU for each day in September, and also looks into August data to get the MAU for each 30-day period ending on each DAU day. That takes a lot of processing (30x), and we can get almost equivalent results if we calculate only one MAU for September and use that value as the denominator:

    SELECT day, dau, mau, INTEGER(100*dau/mau) daumau
    FROM (
      SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) day, EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
      FROM [fh-bigquery:reddit_comments.2015_09] a
      CROSS JOIN (
        SELECT EXACT_COUNT_DISTINCT(author) mau
        FROM [fh-bigquery:reddit_comments.2015_09]
        WHERE subreddit='AskReddit'
      ) b
      WHERE subreddit='AskReddit'
      GROUP BY 1
    )
    ORDER BY 1
    

    That's a much simpler query, and it returns almost equivalent results far faster.
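    The two September queries differ only in how the MAU denominator is computed: a rolling 30-day window per day versus one calendar-month MAU. A minimal Python sketch of both, assuming a hypothetical in-memory list of (day, author) pairs already filtered to one subreddit; the function names and data layout are illustrative only. The rolling version uses the 30 days ending on and including each report day.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical input: (day, author) pairs for one subreddit, covering the
# report month plus the 29 days before it.
Event = tuple[date, str]


def rolling_dau_mau(events: list[Event], days: list[date]) -> list[tuple[date, int, int, int]]:
    """Rolling version: MAU over the 30 days ending on (and including) each day."""
    rows = []
    for day in days:  # like the CROSS JOIN against the month's dates
        dau = {u for d, u in events if d == day}
        if not dau:
            continue  # the JOIN only keeps days that have activity
        start = day - timedelta(days=29)
        mau = {u for d, u in events if start <= d <= day}
        rows.append((day, len(dau), len(mau), int(100 * len(dau) / len(mau))))
    return rows


def fixed_dau_mau(events: list[Event], month: tuple[int, int]) -> list[tuple[date, int, int, int]]:
    """Simplified version: one calendar-month MAU as every day's denominator."""
    in_month = [(d, u) for d, u in events if (d.year, d.month) == month]
    mau = len({u for _, u in in_month})
    per_day: dict[date, set[str]] = defaultdict(set)
    for d, u in in_month:
        per_day[d].add(u)
    # int() truncates like INTEGER() in the SQL
    return [(d, len(us), mau, int(100 * len(us) / mau)) for d, us in sorted(per_day.items())]
```

    Early in the month the two can diverge noticeably, because the calendar-month MAU already counts authors who only show up later in the month; averaged over the month, the difference tends to shrink.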

    Now, to get an average value for this subreddit over the month:

    SELECT ROUND(100*AVG(dau/mau), 2) daumau
    FROM (
      SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) day, EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
      FROM [fh-bigquery:reddit_comments.2015_09] a
      CROSS JOIN (
        SELECT EXACT_COUNT_DISTINCT(author) mau
        FROM [fh-bigquery:reddit_comments.2015_09]
        WHERE subreddit='AskReddit'
      ) b
      WHERE subreddit='AskReddit'
      GROUP BY 1
    )
    

    This tells us that 'AskReddit' had an engagement of 8.95% during September.
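    Because the denominator is the same MAU for every day, averaging dau/mau gives the same value as dividing the average DAU by the MAU, which is why the per-subreddit query can use AVG(dau)/MAX(mau). A minimal Python sketch of the monthly figure, assuming hypothetical (day, author) pairs already filtered to one subreddit and one month; like the SQL AVG, it averages only over days that have at least one comment.

```python
from collections import defaultdict
from datetime import date


def monthly_engagement(events: list[tuple[date, str]]) -> float:
    """Approximates ROUND(100 * AVG(dau / mau), 2) for one month of (day, author) pairs."""
    if not events:
        raise ValueError("no events")
    mau = len({author for _, author in events})  # one MAU for the whole month
    per_day: dict[date, set[str]] = defaultdict(set)
    for day, author in events:
        per_day[day].add(author)
    daus = [len(authors) for authors in per_day.values()]
    # mau is constant, so this equals (sum(daus) / len(daus)) / mau
    avg_ratio = sum(dau / mau for dau in daus) / len(daus)
    return round(100 * avg_ratio, 2)
```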

    Last stop: how to compare engagement across subreddits:

    SELECT ROUND(100*AVG(dau)/MAX(mau), 2) avg_daumau, MAX(mau) mau, subreddit
    FROM (
      -- Alias explicitly so the outer query can refer to it as subreddit
      SELECT a.subreddit AS subreddit, DATE(SEC_TO_TIMESTAMP(created_utc)) day,
             EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
      FROM [fh-bigquery:reddit_comments.2015_09] a
      JOIN (
        -- One calendar-month MAU per subreddit
        SELECT EXACT_COUNT_DISTINCT(author) mau, subreddit
        FROM [fh-bigquery:reddit_comments.2015_09]
        GROUP BY 2
      ) b
      ON a.subreddit=b.subreddit
      WHERE mau>50000
      GROUP BY 1, 2
    )
    GROUP BY subreddit
    ORDER BY 1
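    The per-subreddit comparison follows the same pattern: one MAU per subreddit for the month, the average DAU over active days, a minimum-MAU filter, and an ascending sort on the ratio. A minimal Python sketch, assuming hypothetical (subreddit, day, author) tuples for a single month; `min_mau` stands in for the hard-coded 50000 threshold and is an illustrative parameter.

```python
from collections import defaultdict
from datetime import date

# Hypothetical input: (subreddit, day, author) tuples for one month.
Event = tuple[str, date, str]


def compare_subreddits(events: list[Event], min_mau: int) -> list[tuple[float, int, str]]:
    """Return (avg_daumau, mau, subreddit) rows sorted by avg_daumau ascending."""
    authors: dict[str, set[str]] = defaultdict(set)
    daily: dict[str, dict[date, set[str]]] = defaultdict(lambda: defaultdict(set))
    for sub, day, author in events:
        authors[sub].add(author)       # monthly distinct authors (the b subquery)
        daily[sub][day].add(author)    # distinct authors per day (dau)
    rows = []
    for sub, users in authors.items():
        mau = len(users)
        if mau <= min_mau:             # WHERE mau > 50000
            continue
        days = daily[sub]
        avg_dau = sum(len(a) for a in days.values()) / len(days)
        rows.append((round(100 * avg_dau / mau, 2), mau, sub))
    rows.sort()                        # ORDER BY 1
    return rows
```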
    
