Querying DAU/MAU over time (daily)

礼貌的吻别 2021-02-04 07:08

I have a daily sessions table with columns user_id and date. I'd like to graph DAU/MAU (daily active users / monthly active users) on a daily basis. For example:


  • 2021-02-04 07:40

    You didn't show us your complete table definition, but maybe something like this:

    select date,
           count(*) over (partition by date_trunc('day', date) order by date) as dau,
           count(*) over (partition by date_trunc('month', date) order by date) as mau
    from sessions
    order by date;

    To get the percentage without repeating the window functions, just wrap this in a derived table:

    select date, 
           dau::numeric / (case when mau = 0 then null else mau end) as pct
    from (
        select date,
               count(*) over (partition by date_trunc('day', date) order by date) as dau,
               count(*) over (partition by date_trunc('month', date) order by date) as mau
        from sessions
    ) t
    order by date;

    Here is an example output:

    postgres=> select * from sessions;
     session_date | user_id
     2014-05-01   |       1
     2014-05-01   |       2
     2014-05-01   |       3
     2014-05-02   |       1
     2014-05-02   |       2
     2014-05-02   |       3
     2014-05-02   |       4
     2014-05-02   |       5
     2014-06-01   |       1
     2014-06-01   |       2
     2014-06-01   |       3
     2014-06-02   |       1
     2014-06-02   |       2
     2014-06-02   |       3
     2014-06-02   |       4
     2014-06-03   |       1
     2014-06-03   |       2
     2014-06-03   |       3
     2014-06-03   |       4
     2014-06-03   |       5
    (20 rows)
    postgres=> select session_date,
    postgres->        dau,
    postgres->        mau,
    postgres->        round(dau::numeric / (case when mau = 0 then null else mau end),2) as pct
    postgres-> from (
    postgres(>     select session_date,
    postgres(>            count(*) over (partition by date_trunc('day', session_date) order by session_date) as dau,
    postgres(>            count(*) over (partition by date_trunc('month', session_date) order by session_date) as mau
    postgres(>     from sessions
    postgres(> ) t
    postgres-> order by session_date;
     session_date | dau | mau | pct
     2014-05-01   |   3 |   3 | 1.00
     2014-05-01   |   3 |   3 | 1.00
     2014-05-01   |   3 |   3 | 1.00
     2014-05-02   |   5 |   8 | 0.63
     2014-05-02   |   5 |   8 | 0.63
     2014-05-02   |   5 |   8 | 0.63
     2014-05-02   |   5 |   8 | 0.63
     2014-05-02   |   5 |   8 | 0.63
     2014-06-01   |   3 |   3 | 1.00
     2014-06-01   |   3 |   3 | 1.00
     2014-06-01   |   3 |   3 | 1.00
     2014-06-02   |   4 |   7 | 0.57
     2014-06-02   |   4 |   7 | 0.57
     2014-06-02   |   4 |   7 | 0.57
     2014-06-02   |   4 |   7 | 0.57
     2014-06-03   |   5 |  12 | 0.42
     2014-06-03   |   5 |  12 | 0.42
     2014-06-03   |   5 |  12 | 0.42
     2014-06-03   |   5 |  12 | 0.42
     2014-06-03   |   5 |  12 | 0.42
    (20 rows)
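
    As a way to check the numbers above, here is a minimal sketch that runs the same window-function idea against an in-memory SQLite database from Python. It is not part of the original answer: SQLite has no date_trunc, so strftime('%Y-%m', ...) stands in for the month truncation, and select distinct is added to collapse the output to one row per day. The names SAMPLE, QUERY and dau_mau_rows are made up for this illustration. Note that mau here is a month-to-date running count of session rows, not a count of distinct users (2014-05-02 shows 8 although only 5 different users appear in May).

```python
import sqlite3

# Number of sessions per day, copied from the sample sessions table above.
SAMPLE = {
    "2014-05-01": 3,
    "2014-05-02": 5,
    "2014-06-01": 3,
    "2014-06-02": 4,
    "2014-06-03": 5,
}

# Assumption: strftime('%Y-%m', ...) replaces date_trunc('month', ...),
# and "select distinct" is added to return one row per day.
QUERY = """
select distinct
       session_date,
       count(*) over (partition by session_date) as dau,
       count(*) over (partition by strftime('%Y-%m', session_date)
                      order by session_date) as mau
from sessions
order by session_date
"""


def dau_mau_rows():
    """Load the sample rows into SQLite and return (day, dau, mau) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table sessions (session_date text, user_id integer)")
    conn.executemany(
        "insert into sessions values (?, ?)",
        [(day, user) for day, n in SAMPLE.items() for user in range(1, n + 1)],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for day, dau, mau in dau_mau_rows():
        print(day, dau, mau, round(dau / mau, 2))
```

    Window functions need SQLite 3.25 or later, which the sqlite3 module in current Python releases normally bundles.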
  • 2021-02-04 07:52

    I've written about this on my blog.

    The DAU is easy, as you noticed. You can solve the MAU by first creating a view with boolean values for when a user activates and de-activates, like so:

     -- Assumes "date" is of type date, so that "date" + 30 means 30 days later.
     CREATE OR REPLACE VIEW "vw_login" AS
     SELECT *
        , LEAST(LEAD("date") OVER w, "date" + 30) AS "activeExpiry"
        , CASE WHEN LAG("date") OVER w IS NULL THEN true ELSE false END AS "activated"
        , CASE
            WHEN LEAD("date") OVER w IS NULL THEN true
            WHEN LEAD("date") OVER w - "date" > 30 THEN true
            ELSE false
          END AS "churned"
        , CASE
            WHEN LAG("date") OVER w IS NULL THEN false
            WHEN "date" - LAG("date") OVER w <= 30 THEN false
            WHEN row_number() OVER w > 1 THEN true
            ELSE false
          END AS "resurrected"
       FROM "login"
       WINDOW w AS (PARTITION BY "user_id" ORDER BY "date");

    This creates boolean values per user per day when they become active, when they churn and when they re-activate.

    Then do a daily aggregate of the same:

    CREATE OR REPLACE VIEW "vw_activity" AS
    SELECT
        SUM("activated"::int) AS "activated"
      , SUM("churned"::int) AS "churned"
      , SUM("resurrected"::int) AS "resurrected"
      , "date"
      FROM "vw_login"
      GROUP BY "date";

    And finally calculate running totals of active MAUs by calculating the cumulative sums over the columns. You need to join the vw_activity twice, since the second one is joined to the day when the user becomes inactive (i.e. 30 days since their last login).

    I've included a date series in order to ensure that all days are present in your dataset. You can do without it too, but you might skip days in your dataset.

     SELECT
         SUM(COALESCE(a."activated"::int, 0)
           - COALESCE(a2."churned"::int, 0)
           + COALESCE(a."resurrected"::int, 0)) OVER w
       , d."date", a."activated", a2."churned", a."resurrected"
     FROM generate_series('2010-01-01'::date, CURRENT_DATE, '1 day'::interval) AS d("date")
     LEFT OUTER JOIN "vw_activity" a ON d."date" = a."date"
     LEFT OUTER JOIN "vw_activity" a2 ON d."date" = (a2."date" + INTERVAL '30 days')::date
     WINDOW w AS (ORDER BY d."date")
     ORDER BY d."date";

    You can of course do this in a single query, but this helps understand the structure better.
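
    To make the bookkeeping concrete, here is a rough Python sketch of the same idea, not the code from the blog post. It assumes a user stays active for 30 days after each login, so a login adds one active user when it is the first one or follows a gap of more than 30 days, and a login removes one active user 30 days later when no further login arrives within that window. The function name active_counts and its parameters are invented for this example, and start is assumed to be no later than the first login.

```python
from collections import defaultdict
from datetime import date, timedelta

WINDOW_DAYS = 30  # assumed activity window, matching the 30 used in the views


def active_counts(logins, start, end):
    """Return {day: number of active users} for every day from start to end.

    logins is an iterable of (user_id, date) pairs.
    """
    days_by_user = defaultdict(set)
    for user_id, day in logins:
        days_by_user[user_id].add(day)

    # Net change in the number of active users on each day.
    delta = defaultdict(int)
    for days in days_by_user.values():
        ordered = sorted(days)
        for i, day in enumerate(ordered):
            prev_day = ordered[i - 1] if i > 0 else None
            next_day = ordered[i + 1] if i + 1 < len(ordered) else None
            # "activated" (first login) or "resurrected" (gap over 30 days)
            if prev_day is None or (day - prev_day).days > WINDOW_DAYS:
                delta[day] += 1
            # "churned": takes effect 30 days after this login
            if next_day is None or (next_day - day).days > WINDOW_DAYS:
                delta[day + timedelta(days=WINDOW_DAYS)] -= 1

    # Cumulative sum over a gap-free series of days.
    counts = {}
    running = 0
    day = start
    while day <= end:
        running += delta.get(day, 0)
        counts[day] = running
        day += timedelta(days=1)
    return counts


if __name__ == "__main__":
    sample = [(1, date(2021, 1, 1)), (1, date(2021, 1, 10)), (2, date(2021, 1, 5))]
    result = active_counts(sample, date(2021, 1, 1), date(2021, 2, 15))
    print(result[date(2021, 1, 5)], result[date(2021, 2, 9)])
```

    Iterating over every day between start and end plays the role of the date series in the query: days without any login still get a row.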

  • 2021-02-04 07:53

    Assuming you have values for each day, you can get the total counts using a CTE and a window frame (rows between):

    with dau as (
          select date, count(user_id) as dau
          from dailysessions ds
          group by date
         )
    select date, dau,
           sum(dau) over (order by date rows between 29 preceding and current row) as mau
    from dau;

    Unfortunately, I think you want distinct users rather than just user counts. That makes the problem much more difficult, especially because Postgres doesn't support count(distinct) as a window function.

    I think you have to do some sort of self join for this. Here is one method:

    with dau as (
          select date, count(distinct user_id) as dau
          from dailysessions ds
          group by date
         )
    select dau.date, dau.dau,
           (select count(distinct ds2.user_id)
            from dailysessions ds2
            -- qualify with dau. so the outer row's date is used, not ds2.date
            where ds2.date between dau.date - 29 * interval '1 day' and dau.date
           ) as mau
    from dau;
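
    To see why distinct users matter, here is a small brute-force Python sketch of the rolling calculation. It is only an illustration under the assumption that "monthly" means the trailing 30 days including the current day; the function name rolling_dau_mau and its parameters are made up.

```python
from datetime import date, timedelta


def rolling_dau_mau(sessions, window_days=30):
    """Return {day: (dau, mau)} for every day that has at least one session.

    sessions is an iterable of (user_id, date) pairs. dau counts distinct
    users on the day; mau counts distinct users in the trailing window.
    """
    users_by_day = {}
    for user_id, day in sessions:
        users_by_day.setdefault(day, set()).add(user_id)

    result = {}
    for day in sorted(users_by_day):
        window_start = day - timedelta(days=window_days - 1)
        monthly_users = set()
        # Union the per-day user sets that fall inside the window.
        for other_day, users in users_by_day.items():
            if window_start <= other_day <= day:
                monthly_users |= users
        result[day] = (len(users_by_day[day]), len(monthly_users))
    return result


if __name__ == "__main__":
    sample = [
        (1, date(2021, 1, 1)),
        (1, date(2021, 1, 2)),
        (1, date(2021, 1, 3)),
        (2, date(2021, 1, 3)),
    ]
    for day, (dau, mau) in rolling_dau_mau(sample).items():
        print(day, dau, mau, round(dau / mau, 2))
```

    In this sample, summing the daily counts up to 2021-01-03 would give 1 + 1 + 2 = 4, while only 2 distinct users were active in the window. The nested loop is quadratic in the number of days, much like the correlated subquery, so it is meant for small data sets.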
  • 2021-02-04 07:53

    This one uses COUNT DISTINCT to get the rolling 30 days DAU/MAU:

    (calculating reddit's user engagement in BigQuery - but the SQL is standard enough to be used on other databases)

    SELECT day, dau, mau, INTEGER(100*dau/mau) daumau
    FROM (
      SELECT day, EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
      FROM (
        SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) day, author
        FROM [fh-bigquery:reddit_comments.2015_09]
        WHERE subreddit='AskReddit') a
      JOIN (
        SELECT stopday, EXACT_COUNT_DISTINCT(author) mau
        FROM (SELECT created_utc, subreddit, author FROM [fh-bigquery:reddit_comments.2015_09], [fh-bigquery:reddit_comments.2015_08]) a
        CROSS JOIN (
          SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) stopday
          FROM [fh-bigquery:reddit_comments.2015_09]
          GROUP BY 1
        ) b
        WHERE subreddit='AskReddit'
        AND SEC_TO_TIMESTAMP(created_utc) BETWEEN DATE_ADD(stopday, -30, 'day') AND TIMESTAMP(stopday)
        GROUP BY 1
      ) b
      ON a.day=b.stopday
      GROUP BY 1
    )
    ORDER BY 1

    I went further in "How to calculate DAU/MAU with BigQuery (engagement)".
