I'm trying to analyze user retention using a cohort analysis based on event data stored in Redshift.
For example, in Redshift I have:
timestamp
Eventually I found the query below to satisfy my requirements.
WITH
users AS (
    -- One row per user: the day of their first recorded event
    SELECT
        user_id,
        DATE_TRUNC('day', MIN(timestamp)) AS activated_at
    FROM table
    GROUP BY 1
),
events AS (
    -- Every event, with the timestamp renamed for readability
    SELECT
        user_id,
        action,
        timestamp AS occurred_at
    FROM table
)
SELECT
    DATE_TRUNC('day', u.activated_at) AS signup_date,
    -- Whole days elapsed between activation and the event
    TRUNC(EXTRACT('EPOCH' FROM e.occurred_at - u.activated_at) / (3600 * 24)) AS user_period,
    COUNT(DISTINCT e.user_id) AS retained_users
FROM users u
JOIN events e
    ON e.user_id = u.user_id
    AND e.occurred_at >= u.activated_at
-- Only cohorts that activated within the last 11 days
WHERE u.activated_at >= GETDATE() - INTERVAL '11 day'
GROUP BY 1, 2
ORDER BY 1, 2
It produces a slightly different table than I described above (but is better for my needs):
signup_date  user_period  retained_users
-----------  -----------  --------------
2015-05-05   0            80
2015-05-05   1            60
2015-05-05   2            40
2015-05-05   3            20
2015-05-06   0            100
2015-05-06   1            80
2015-05-06   2            40
2015-05-06   3            20
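
To make the user_period column easier to follow, here is a minimal sketch that applies the same expression from the query above to two literal timestamps. The timestamps are hypothetical values chosen only for illustration, and I'm assuming the expression behaves on literals the same way it does on the columns in the query.

```sql
-- Hypothetical timestamps, chosen only to illustrate the arithmetic.
-- The activation is midnight on the signup day; the event occurs 37 hours later.
SELECT TRUNC(
           EXTRACT('EPOCH' FROM TIMESTAMP '2015-05-06 13:00:00'
                              - TIMESTAMP '2015-05-05 00:00:00')
           / (3600 * 24)
       ) AS user_period;
-- 37 hours is 133200 seconds, or about 1.54 days, so user_period is 1
```

Because activated_at is truncated to midnight, any event on the signup day itself falls into period 0, events on the following calendar day fall into period 1, and so on.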