I have a daily sessions table with columns user_id and date. I\'d like to graph out DAU/MAU (daily active users / monthly active users) on a daily basis. For example:
You didn't show us your complete table definition, but maybe something like this:
select date,
count(*) over (partition by date_trunc('day', date) order by date) as dau,
count(*) over (partition by date_trunc('month', date) order by date) as mau
from sessions
order by date;
To get the percentage without repeating the window functions, just wrap this in a derived table:
select date,
dau,
mau,
dau::numeric / (case when mau = 0 then null else mau end) as pct
from (
select date,
count(*) over (partition by date_trunc('day', date) order by date) as dau,
count(*) over (partition by date_trunc('month', date) order by date) as mau
from sessions
) t
order by date;
Here is an example output:
postgres=> select * from sessions; session_date | user_id --------------+--------- 2014-05-01 | 1 2014-05-01 | 2 2014-05-01 | 3 2014-05-02 | 1 2014-05-02 | 2 2014-05-02 | 3 2014-05-02 | 4 2014-05-02 | 5 2014-06-01 | 1 2014-06-01 | 2 2014-06-01 | 3 2014-06-02 | 1 2014-06-02 | 2 2014-06-02 | 3 2014-06-02 | 4 2014-06-03 | 1 2014-06-03 | 2 2014-06-03 | 3 2014-06-03 | 4 2014-06-03 | 5 (20 rows) postgres=> select session_date, postgres-> dau, postgres-> mau, postgres-> round(dau::numeric / (case when mau = 0 then null else mau end),2) as pct postgres-> from ( postgres(> select session_date, postgres(> count(*) over (partition by date_trunc('day', session_date) order by session_date) as dau, postgres(> count(*) over (partition by date_trunc('month', session_date) order by session_date) as mau postgres(> from sessions postgres(> ) t postgres-> order by session_date; session_date | dau | mau | pct --------------+-----+-----+------ 2014-05-01 | 3 | 3 | 1.00 2014-05-01 | 3 | 3 | 1.00 2014-05-01 | 3 | 3 | 1.00 2014-05-02 | 5 | 8 | 0.63 2014-05-02 | 5 | 8 | 0.63 2014-05-02 | 5 | 8 | 0.63 2014-05-02 | 5 | 8 | 0.63 2014-05-02 | 5 | 8 | 0.63 2014-06-01 | 3 | 3 | 1.00 2014-06-01 | 3 | 3 | 1.00 2014-06-01 | 3 | 3 | 1.00 2014-06-02 | 4 | 7 | 0.57 2014-06-02 | 4 | 7 | 0.57 2014-06-02 | 4 | 7 | 0.57 2014-06-02 | 4 | 7 | 0.57 2014-06-03 | 5 | 12 | 0.42 2014-06-03 | 5 | 12 | 0.42 2014-06-03 | 5 | 12 | 0.42 2014-06-03 | 5 | 12 | 0.42 2014-06-03 | 5 | 12 | 0.42 (20 rows) postgres=>
I've written about this on my blog.
The DAU is easy, as you noticed. You can solve the MAU by first creating a view with boolean values for when a user activates and de-activates, like so:
CREATE OR REPLACE VIEW "vw_login" AS
SELECT *
, LEAST (LEAD("date") OVER w, "date" + 30) AS "activeExpiry"
, CASE WHEN LAG("date") OVER w IS NULL THEN true ELSE false AS "activated"
, CASE
WHEN LEAD("date") OVER w IS NULL THEN true
WHEN LEAD("date") OVER w - "date" > 30 THEN true
ELSE false
END AS "churned"
, CASE
WHEN LAG("date") OVER w IS NULL THEN false
WHEN "date" - LAG("date") OVER w <= 30 THEN false
WHEN row_number() OVER w > 1 THEN true
ELSE false
END AS "resurrected"
FROM "login"
WINDOW w AS (PARTITION BY "user_id" ORDER BY "date")
This creates boolean values per user per day when they become active, when they churn and when they re-activate.
Then do a daily aggregate of the same:
CREATE OR REPLACE VIEW "vw_activity" AS
SELECT
SUM("activated"::int) "activated"
, SUM("churned"::int) "churned"
, SUM("resurrected"::int) "resurrected"
, "date"
FROM "vw_login"
GROUP BY "date"
;
And finally calculate running totals of active MAUs by calculating the cumulative sums over the columns. You need to join the vw_activity twice, since the second one is joined to the day when the user becomes inactive (i.e. 30 days since their last login).
I've included a date series in order to ensure that all days are present in your dataset. You can do without it too, but you might skip days in your dataset.
SELECT
d."date"
, SUM(COALESCE(a.activated::int,0)
- COALESCE(a2.churned::int,0)
+ COALESCE(a.resurrected::int,0)) OVER w
, d."date", a."activated", a2."churned", a."resurrected" FROM
generate_series('2010-01-01'::date, CURRENT_DATE, '1 day'::interval) d
LEFT OUTER JOIN vw_activity a ON d."date" = a."date"
LEFT OUTER JOIN vw_activity a2 ON d."date" = (a2."date" + INTERVAL '30 days')::date
WINDOW w AS (ORDER BY d."date") ORDER BY d."date";
You can of course do this in a single query, but this helps understand the structure better.
Assuming you have values for each day, you can get the total counts using a subquery and range between
:
with dau as (
select date, count(userid) as dau
from dailysessions ds
group by date
)
select date, dau,
sum(dau) over (order by date rows between -29 preceding and current row) as mau
from dau;
Unfortunately, I think you want distinct users rather than just user counts. That makes the problem much more difficult, especially because Postgres doesn't support count(distinct)
as a window function.
I think you have to do some sort of self join for this. Here is one method:
with dau as (
select date, count(distinct userid) as dau
from dailysessions ds
group by date
)
select date, dau,
(select count(distinct user_id)
from dailysessions ds
where ds.date between date - 29 * interval '1 day' and date
) as mau
from dau;
This one uses COUNT DISTINCT to get the rolling 30 days DAU/MAU:
(calculating reddit's user engagement in BigQuery - but the SQL is standard enough to be used on other databases)
SELECT day, dau, mau, INTEGER(100*dau/mau) daumau
FROM (
SELECT day, EXACT_COUNT_DISTINCT(author) dau, FIRST(mau) mau
FROM (
SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) day, author
FROM [fh-bigquery:reddit_comments.2015_09]
WHERE subreddit='AskReddit') a
JOIN (
SELECT stopday, EXACT_COUNT_DISTINCT(author) mau
FROM (SELECT created_utc, subreddit, author FROM [fh-bigquery:reddit_comments.2015_09], [fh-bigquery:reddit_comments.2015_08]) a
CROSS JOIN (
SELECT DATE(SEC_TO_TIMESTAMP(created_utc)) stopday
FROM [fh-bigquery:reddit_comments.2015_09]
GROUP BY 1
) b
WHERE subreddit='AskReddit'
AND SEC_TO_TIMESTAMP(created_utc) BETWEEN DATE_ADD(stopday, -30, 'day') AND TIMESTAMP(stopday)
GROUP BY 1
) b
ON a.day=b.stopday
GROUP BY 1
)
ORDER BY 1
I went further at How to calculate DAU/MAU with BigQuery (engagement)