Count number of consecutive visits

一生所求 2021-01-07 10:14

Every time a logged-in user visits the website, their data is put into a table containing the userId and date (either one or zero rows per user per day):

2 Answers
  •  攒了一身酷
    2021-01-07 10:50

    I missed the mysql tag and wrote up this solution. Sadly, it does not work in MySQL versions before 8.0, which lack window functions.

    I'm posting it anyway, since I put some effort into it. Tested with PostgreSQL. It should work similarly in Oracle or SQL Server (or any other decent RDBMS that supports window functions), although date arithmetic (subtracting dates, EXTRACT(epoch FROM ...)) is written differently in each system.

    Test setup

    CREATE TEMP TABLE v(id int, visit date);
    INSERT INTO v VALUES
     (444631, '2011-11-07')
    ,(444631, '2011-11-06')
    ,(444631, '2011-11-05')
    ,(444631, '2011-11-04')
    ,(444631, '2011-11-02')
    ,(444631, '2011-11-01')
    ,(444632, '2011-12-02')
    ,(444632, '2011-12-03')
    ,(444632, '2011-12-05');
    

    Simple version

    -- add 1 to "difference" to get number of days of the longest period
    SELECT id, max(dur) + 1 as max_consecutive_days
    FROM (
    
       -- calculate date difference of min and max in the group
       SELECT id, grp, max(visit) - min(visit) as dur
       FROM (
    
          -- consecutive days end up in a group
          SELECT *, sum(step) OVER (ORDER BY id, rn) AS grp
          FROM   (
    
             -- step up at the start of a new group of days
             SELECT id
                   ,row_number() OVER w AS rn
                   ,visit
                   ,CASE WHEN COALESCE(visit - lag(visit) OVER w, 1) = 1
                    THEN 0 ELSE 1 END AS step
             FROM   v
             WINDOW w AS (PARTITION BY id ORDER BY visit)
             ORDER  BY 1,2
             ) x
          ) y
          GROUP BY 1,2
       ) z
    GROUP  BY 1
    ORDER  BY 2 DESC  -- user with the longest streak first
    LIMIT  1;
    

    Output:

       id   | max_consecutive_days
    --------+----------------------
     444631 |                    4
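    The steps of the simple version can be traced outside the database. Below is a minimal Python sketch (not part of the original answer) that mirrors them: sort by id and visit, set a step of 1 where a gap opens within the same id, take the running sum of the steps as grp, then compute max(visit) - min(visit) + 1 per (id, grp). The function name max_streaks_step_sum is hypothetical, and the sample rows are copied from the test setup.

```python
from datetime import date

# Sample rows copied from the test setup: (id, visit)
VISITS = [
    (444631, date(2011, 11, 7)), (444631, date(2011, 11, 6)),
    (444631, date(2011, 11, 5)), (444631, date(2011, 11, 4)),
    (444631, date(2011, 11, 2)), (444631, date(2011, 11, 1)),
    (444632, date(2011, 12, 2)), (444632, date(2011, 12, 3)),
    (444632, date(2011, 12, 5)),
]


def max_streaks_step_sum(rows):
    """Longest run of consecutive days per id (hypothetical helper)."""
    grp = 0                      # running sum of step, like sum(step) OVER (...)
    prev_id = prev_visit = None
    bounds = {}                  # (id, grp) -> (min visit, max visit)
    for uid, visit in sorted(rows):          # ORDER BY id, visit
        # step = 1 only when a gap opens inside the same id; the first row of
        # an id gets 0, mirroring COALESCE(visit - lag(visit), 1) = 1
        if uid == prev_id and (visit - prev_visit).days != 1:
            grp += 1
        first, _ = bounds.get((uid, grp), (visit, visit))
        bounds[(uid, grp)] = (first, visit)  # rows are sorted, so visit is the max
        prev_id, prev_visit = uid, visit

    result = {}
    for (uid, _), (first, last) in bounds.items():
        days = (last - first).days + 1       # max(visit) - min(visit) + 1
        result[uid] = max(result.get(uid, 0), days)
    return result


print(max_streaks_step_sum(VISITS))  # {444631: 4, 444632: 2}
```

    Like the SQL, this assumes at most one row per user per day: a duplicate day gives a difference of 0, which is treated as a gap.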
    

    Faster / Shorter

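    I later found an even better way. For consecutive days, the day number and row_number() both grow by 1 per row, so their difference (grp) stays constant within a streak and jumps at every gap. The grp numbers are not continuous (but they keep rising within each id). That doesn't matter, since they are just a means to an end: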

    SELECT id, max(dur) + 1 AS max_consecutive_days
    FROM (
        SELECT id, grp, max(visit) - min(visit) AS dur
        FROM (
          -- subtract row_number() from an integer day number (days since epoch):
          -- the result (grp) is the same for every row in a run of consecutive days
          SELECT id
                ,EXTRACT(epoch from visit)::int / 86400
               - row_number() OVER (PARTITION BY id ORDER BY visit) AS grp
                ,visit
          FROM   v
          ORDER  BY 1,2
          ) x
        GROUP BY 1,2
        ) y
    GROUP  BY 1
    ORDER  BY 2 DESC  -- user with the longest streak first
    LIMIT  1;
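    The arithmetic behind grp is easy to check by hand. For 444631 the sorted visits get row numbers 1 to 6, and subtracting those from the day numbers leaves one value for Nov 1 and 2 and another for Nov 4 through 7. Here is a minimal Python sketch of the same trick. It uses date.toordinal() as the integer day number in place of EXTRACT(epoch FROM visit)::int / 86400, an assumption made for illustration, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date

# Sample rows copied from the test setup: (id, visit)
VISITS = [
    (444631, date(2011, 11, 7)), (444631, date(2011, 11, 6)),
    (444631, date(2011, 11, 5)), (444631, date(2011, 11, 4)),
    (444631, date(2011, 11, 2)), (444631, date(2011, 11, 1)),
    (444632, date(2011, 12, 2)), (444632, date(2011, 12, 3)),
    (444632, date(2011, 12, 5)),
]


def max_streaks_day_minus_rn(rows):
    """Longest run of consecutive days per id via day number - row_number()."""
    per_id = defaultdict(list)
    for uid, visit in rows:
        per_id[uid].append(visit)

    result = {}
    for uid, visits in per_id.items():
        runs = defaultdict(list)             # grp -> visits in that run
        # enumerate(..., start=1) plays the role of
        # row_number() OVER (PARTITION BY id ORDER BY visit)
        for rn, visit in enumerate(sorted(visits), start=1):
            grp = visit.toordinal() - rn     # constant while days are consecutive
            runs[grp].append(visit)
        result[uid] = max((max(run) - min(run)).days + 1 for run in runs.values())
    return result


print(max_streaks_day_minus_rn(VISITS))  # {444631: 4, 444632: 2}
```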
    

    SQL Fiddle for both.
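    To try the shorter query locally without PostgreSQL, here is a sketch that runs it through Python's built-in sqlite3 module. It assumes the bundled SQLite is 3.25 or newer, the first release with window functions. SQLite has neither date subtraction nor EXTRACT(epoch ...), so CAST(julianday(visit) AS INTEGER) stands in for the day number, and the outer ORDER BY picks the row with the longest streak.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE v (id INTEGER, visit TEXT)")  # ISO dates stored as text
conn.executemany(
    "INSERT INTO v VALUES (?, ?)",
    [
        (444631, "2011-11-07"), (444631, "2011-11-06"), (444631, "2011-11-05"),
        (444631, "2011-11-04"), (444631, "2011-11-02"), (444631, "2011-11-01"),
        (444632, "2011-12-02"), (444632, "2011-12-03"), (444632, "2011-12-05"),
    ],
)

QUERY = """
SELECT id, max(dur) + 1 AS max_consecutive_days
FROM (
    SELECT id, grp,
           CAST(julianday(max(visit)) - julianday(min(visit)) AS INTEGER) AS dur
    FROM (
        -- integer day number minus row_number(): same grp for consecutive days
        SELECT id, visit,
               CAST(julianday(visit) AS INTEGER)
                 - row_number() OVER (PARTITION BY id ORDER BY visit) AS grp
        FROM v
    )
    GROUP BY id, grp
)
GROUP BY id
ORDER BY max_consecutive_days DESC
LIMIT 1
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [(444631, 4)]
```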

    More

    • A procedural solution for a similar problem.
      You might be able to implement something similar in MySQL.
    • Closely related answers on dba.SE with extensive explanation here and here.
    • And on SO:
      GROUP BY and aggregate sequential numeric values
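    The procedural idea can also be sketched without window functions: a single pass over the rows sorted by id and visit, extending a counter while each visit falls exactly one day after the previous one. This is a Python illustration of the general approach, not the linked solution or a MySQL procedure. The function name is hypothetical, and it assumes at most one row per user per day.

```python
from datetime import date

# Sample rows copied from the test setup: (id, visit)
VISITS = [
    (444631, date(2011, 11, 7)), (444631, date(2011, 11, 6)),
    (444631, date(2011, 11, 5)), (444631, date(2011, 11, 4)),
    (444631, date(2011, 11, 2)), (444631, date(2011, 11, 1)),
    (444632, date(2011, 12, 2)), (444632, date(2011, 12, 3)),
    (444632, date(2011, 12, 5)),
]


def max_streaks_single_pass(rows):
    """Longest run of consecutive days per id in one sorted pass (hypothetical)."""
    best = {}
    prev_id = prev_visit = None
    streak = 0
    for uid, visit in sorted(rows):          # ORDER BY id, visit
        if uid == prev_id and (visit - prev_visit).days == 1:
            streak += 1                      # visit continues yesterday's run
        else:
            streak = 1                       # new id or a gap: start a new run
        best[uid] = max(best.get(uid, 0), streak)
        prev_id, prev_visit = uid, visit
    return best


print(max_streaks_single_pass(VISITS))  # {444631: 4, 444632: 2}
```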
