Querying DAU/MAU over time (daily)

礼貌的吻别 2021-02-04 07:08

I have a daily sessions table with columns user_id and date. I'd like to graph DAU/MAU (daily active users / monthly active users) on a daily basis. For example:

         


        
4 answers
  •  猫巷女王i
    2021-02-04 07:52

    I've written about this on my blog.

    The DAU is easy, as you noticed. You can solve the MAU by first creating a view with boolean values for when a user activates and de-activates, like so:

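    -- One row per login, flagged with how it changes the user's active status.
    CREATE OR REPLACE VIEW "vw_login" AS
    SELECT *
         -- Active until the next login, or for 30 days if none comes sooner
         , LEAST(LEAD("date") OVER w, "date" + 30) AS "activeExpiry"
         -- First login ever for this user
         , CASE WHEN LAG("date") OVER w IS NULL THEN true ELSE false END AS "activated"
         -- No later login, or the next one is more than 30 days away
         , CASE
             WHEN LEAD("date") OVER w IS NULL THEN true
             WHEN LEAD("date") OVER w - "date" > 30 THEN true
             ELSE false
           END AS "churned"
         -- Returning after a gap of more than 30 days
         , CASE
             WHEN LAG("date") OVER w IS NULL THEN false
             WHEN "date" - LAG("date") OVER w <= 30 THEN false
             WHEN row_number() OVER w > 1 THEN true
             ELSE false
           END AS "resurrected"
      FROM "login"
    WINDOW w AS (PARTITION BY "user_id" ORDER BY "date");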
    

    This creates boolean values per user per day when they become active, when they churn and when they re-activate.
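
    To see what the flags look like, here is a minimal sketch that applies the same window logic to three made-up logins for a single user. The rows in the VALUES list and the name "sample_login" are invented for illustration; the boolean expressions are condensed forms of the view's CASE expressions, written so the query runs on its own without the "login" table.

    ```sql
    -- Hypothetical sample data: one user, three logins,
    -- with a 50-day gap before the last one.
    WITH "sample_login" ("user_id", "date") AS (
        VALUES (1, DATE '2021-01-01')
             , (1, DATE '2021-01-10')
             , (1, DATE '2021-03-01')
    )
    SELECT "user_id"
         , "date"
         , LEAST(LEAD("date") OVER w, "date" + 30) AS "activeExpiry"
         -- true only on the user's first login
         , LAG("date") OVER w IS NULL AS "activated"
         -- true when there is no later login or it is more than 30 days away
         , (LEAD("date") OVER w IS NULL
            OR LEAD("date") OVER w - "date" > 30) AS "churned"
         -- true when the previous login is more than 30 days back
         , COALESCE("date" - LAG("date") OVER w > 30, false) AS "resurrected"
      FROM "sample_login"
    WINDOW w AS (PARTITION BY "user_id" ORDER BY "date")
     ORDER BY "date";
    ```

    With these rows, the 2021-01-01 login is the only one flagged "activated"; the 2021-01-10 login is flagged "churned" because the next login is 50 days later; and the 2021-03-01 login is flagged both "resurrected" (it follows that gap) and "churned" (no later login exists in the sample).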

    Then do a daily aggregate of the same:

    -- Daily totals of each status change.
    CREATE OR REPLACE VIEW "vw_activity" AS
    SELECT
        SUM("activated"::int) AS "activated"
      , SUM("churned"::int) AS "churned"
      , SUM("resurrected"::int) AS "resurrected"
      , "date"
      FROM "vw_login"
     GROUP BY "date";
    

    Finally, compute the running MAU total as a cumulative sum over these columns. You need to join vw_activity twice: the second copy is joined on the day the user becomes inactive (i.e. 30 days after their last login), so that churn is subtracted on the day it takes effect.

    I've included a date series so that every day appears in the result. You can do without it, but then days with no activity will be missing from the output.

    SELECT
        d."date"
        -- Running total: new and returning users in, churned users out
      , SUM(COALESCE(a."activated"::int, 0)
          - COALESCE(a2."churned"::int, 0)
          + COALESCE(a."resurrected"::int, 0)) OVER w AS "mau"
      , a."activated", a2."churned", a."resurrected"
        -- Name the series column explicitly so that d."date" resolves
      FROM generate_series('2010-01-01'::date, CURRENT_DATE, '1 day'::interval) AS d("date")
      LEFT OUTER JOIN "vw_activity" a ON d."date" = a."date"
        -- Churn takes effect 30 days after the user's last login
      LEFT OUTER JOIN "vw_activity" a2 ON d."date" = (a2."date" + INTERVAL '30 days')::date
    WINDOW w AS (ORDER BY d."date")
     ORDER BY d."date";
    

    You can of course do this in a single query, but this helps understand the structure better.
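
    For the DAU/MAU ratio the question asks for, the MAU series still has to be paired with a daily count. The following is a minimal sketch and not the method described above: it recounts distinct users directly for each day with a 30-day lookback, which is simpler but slower on large tables, so it is mainly useful for cross-checking the running total. It assumes the same "login" table with "user_id" and "date" columns, treats "monthly active" as a login within the 30 days ending on that day, uses a placeholder start date, and relies on LATERAL (PostgreSQL 9.3 or later).

    ```sql
    SELECT d."day"
         , dau."n" AS "dau"
         , mau."n" AS "mau"
           -- NULLIF avoids division by zero on days with no active users
         , ROUND(dau."n"::numeric / NULLIF(mau."n", 0), 3) AS "dau_mau"
      FROM (
            -- One row per calendar day, cast to date for comparison
            SELECT g::date AS "day"
              FROM generate_series('2010-01-01'::date, CURRENT_DATE, '1 day'::interval) AS g
           ) d
     CROSS JOIN LATERAL (
            -- Distinct users who logged in on that day
            SELECT COUNT(DISTINCT l."user_id") AS "n"
              FROM "login" l
             WHERE l."date" = d."day"
           ) dau
     CROSS JOIN LATERAL (
            -- Distinct users who logged in during the 30 days ending on that day
            SELECT COUNT(DISTINCT l."user_id") AS "n"
              FROM "login" l
             WHERE l."date" >  d."day" - 30
               AND l."date" <= d."day"
           ) mau
     ORDER BY d."day";
    ```

    The "dau_mau" column is NULL on days where nobody was active in the preceding 30 days, which keeps those days in the output without raising an error.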
