How can I create a weekly cohort analysis table using MySQL?

感动是毒 2021-01-15 07:36

Let's say you have a user table that has at least the date the user signed up and an id.

Now let's say you have a separate table that tracks an action like a pay

1 answer
  • 2021-01-15 08:27

    This query is modified from the one I wrote here: Cohort analysis in SQL

    Here's the final query:

    SELECT
      -- Turn the year-week label back into the Monday that starts the week.
      -- %x-%v are the Monday-first specifiers, in line with the Monday-first %u below.
      STR_TO_DATE(CONCAT(tb.cohort, ' Monday'), '%x-%v %W') AS date,
      size,
      w1,
      w2,
      w3,
      w4,
      w5,
      w6,
      w7
    FROM (
      -- Paying users per cohort, bucketed by whole weeks since signup
      SELECT u.cohort,
        IFNULL(SUM(s.Offset = 0), 0) w1,
        IFNULL(SUM(s.Offset = 1), 0) w2,
        IFNULL(SUM(s.Offset = 2), 0) w3,
        IFNULL(SUM(s.Offset = 3), 0) w4,
        IFNULL(SUM(s.Offset = 4), 0) w5,
        IFNULL(SUM(s.Offset = 5), 0) w6,
        IFNULL(SUM(s.Offset = 6), 0) w7
      FROM (
        SELECT
          UserId,
          DATE_FORMAT(AddedDate, "%Y-%u") AS cohort
        FROM users
      ) AS u
      LEFT JOIN (
        -- One row per user per week offset in which they paid
        SELECT DISTINCT
          payments.UserId,
          FLOOR(DATEDIFF(payments.PaymentDate, users.AddedDate)/7) AS Offset
        FROM payments
        LEFT JOIN users ON (users.UserId = payments.UserId)
      ) AS s ON s.UserId = u.UserId
      GROUP BY u.cohort
    ) AS tb
    LEFT JOIN (
      -- Cohort size: number of signups per year-week
      SELECT DATE_FORMAT(AddedDate, "%Y-%u") dt, COUNT(*) size FROM users GROUP BY dt
    ) size ON tb.cohort = size.dt
    

    The core of the query grabs each user and their signup date, and formats that date as a year-week number, since this is a weekly cohort.

    SELECT
      UserId,
      DATE_FORMAT(AddedDate, "%Y-%u") AS cohort
    FROM users
    

    Since we want to group by the cohort we have to put this in a subquery in the FROM part of the query.
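
    To see what the cohort label looks like, here is a small standalone check. The two dates are arbitrary and chosen only for illustration. Note that %u counts weeks from Monday and can return week 00 for the first days of January.

    SELECT
      DATE_FORMAT('2021-01-15', '%Y-%u') AS mid_january,   -- '2021-02'
      DATE_FORMAT('2021-01-02', '%Y-%u') AS early_january; -- '2021-00'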

    Then we want to join the payment information to the users.

    SELECT DISTINCT
      payments.UserId,
      FLOOR(DATEDIFF(payments.PaymentDate, users.AddedDate)/7) AS Offset
    FROM payments
    LEFT JOIN users ON (users.UserId = payments.UserId)
    

    This returns one row per user for each week, counted from signup, in which they made a payment. We use DISTINCT because if a user made two purchases in one week, we don't want to count that as two users.
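
    To make the offset arithmetic concrete, here is a standalone check with made-up dates: a signup on 2021-01-04 and a payment 16 days later. The alias week_offset is just an illustrative name.

    SELECT FLOOR(DATEDIFF('2021-01-20', '2021-01-04') / 7) AS week_offset; -- 16 days / 7, floored: 2

    An offset of 2 lands in the w3 column of the final query, since w1 corresponds to offset 0.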

    We don't just use the payments table, because some users may sign up and not have payments. So we select from the users table and join on the payments table.

    You then group by the signup week, u.cohort, and aggregate on the week offsets to find out how many users made payments in each week after they signed up.
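
    The per-week counts rely on MySQL evaluating a comparison to 1 or 0, so SUM(s.Offset = 0) counts the rows where the offset is 0. A minimal sketch with an inline, made-up set of offsets, where the NULL stands for a user with no payments:

    SELECT
      IFNULL(SUM(t.o = 0), 0) AS w1,  -- 2
      IFNULL(SUM(t.o = 1), 0) AS w2,  -- 1
      IFNULL(SUM(t.o = 2), 0) AS w3   -- 0
    FROM (
      SELECT 0 AS o
      UNION ALL SELECT 0
      UNION ALL SELECT 1
      UNION ALL SELECT NULL
    ) AS t;

    The IFNULL matters for a cohort in which nobody paid at all: every comparison is then NULL, and SUM over only NULLs returns NULL rather than 0.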

    The version of MySQL I used had sql_mode set to ONLY_FULL_GROUP_BY. So to get the cohort size I put the bulk of the query in a subquery, which let me join a second subquery on users that counts the size of each cohort.

    Further considerations:

    Filtering by weeks is simple: tb.cohort > start date and tb.cohort < end date, where the start and end dates are formatted with "%Y-%u". To make the query more efficient you'll probably also want to filter out payment events that fall outside the date range, so you're not joining on data you don't need.
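
    A rough sketch of such a filter, shown on the cohort-size part of the query only. The boundary dates are hypothetical, and the inclusive bounds are my choice here. Comparing the labels as strings works because the week number is zero-padded to two digits.

    SELECT u.cohort, COUNT(*) AS size
    FROM (
      SELECT DATE_FORMAT(AddedDate, '%Y-%u') AS cohort
      FROM users
    ) AS u
    WHERE u.cohort >= DATE_FORMAT('2021-01-04', '%Y-%u')  -- hypothetical first week
      AND u.cohort <= DATE_FORMAT('2021-02-22', '%Y-%u')  -- hypothetical last week
    GROUP BY u.cohort;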

    You may want to consider using a calendar table to cover cases where there are no user signups during a week.
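
    One possible way to get such a list of weeks without a permanent table is a recursive CTE. This is only a sketch and assumes MySQL 8.0 or later (older versions, including the one in the fiddle, would need a real calendar table). The CTE name weeks, the column week_start, and the date range are all made up for illustration.

    WITH RECURSIVE weeks (week_start) AS (
      SELECT DATE('2021-01-04')            -- a Monday, hypothetical range start
      UNION ALL
      SELECT week_start + INTERVAL 7 DAY
      FROM weeks
      WHERE week_start < '2021-02-22'      -- hypothetical last week to include
    )
    SELECT
      w.week_start,
      COUNT(u.UserId) AS size              -- 0 for weeks with no signups
    FROM weeks AS w
    LEFT JOIN users AS u
      ON u.AddedDate >= w.week_start
     AND u.AddedDate < w.week_start + INTERVAL 7 DAY
    GROUP BY w.week_start
    ORDER BY w.week_start;

    Because the weeks come first and users are left-joined onto them, a week without signups still shows up as a row with size 0.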

    Here's a fiddle with everything working: http://sqlfiddle.com/#!9/172dbe/1
