Count number of rows that are not within 10 seconds of each other

情歌与酒 2021-02-01 09:26

I track web visitors. I store the IP address as well as the timestamp of the visit.

ip_address    time_stamp
180.2.79.3  1301654105
180.2.79.3  1301654106
180.2.         


        
8 answers
  • 2021-02-01 10:02

    Let me start with this table. I'll use ordinary timestamps so we can easily see what's going on.

    180.2.79.3   2011-01-01 08:00:00
    180.2.79.3   2011-01-01 08:00:09
    180.2.79.3   2011-01-01 08:00:20
    180.2.79.3   2011-01-01 08:00:23
    180.2.79.3   2011-01-01 08:00:25
    180.2.79.3   2011-01-01 08:00:40
    180.2.79.4   2011-01-01 08:00:00
    180.2.79.4   2011-01-01 08:00:13
    180.2.79.4   2011-01-01 08:00:23
    180.2.79.4   2011-01-01 08:00:25
    180.2.79.4   2011-01-01 08:00:27
    180.2.79.4   2011-01-01 08:00:29
    180.2.79.4   2011-01-01 08:00:50
    

    If I understand you correctly, you want to count these like this.

    180.2.79.3   3
    180.2.79.4   3
    

    You can do that for each ip_address by selecting the maximum timestamp that is both

    • greater than the current row's timestamp, and
    • no more than 10 seconds later than the current row's timestamp.

    Taking these two criteria together will introduce some nulls, which turn out to be really useful.

    select ip_address, 
           t_s.time_stamp, 
           (select max(t.time_stamp) 
            from t_s t 
            where t.ip_address = t_s.ip_address 
              and t.time_stamp > t_s.time_stamp
              and t.time_stamp - t_s.time_stamp <= interval '10' second) next_page
    from t_s 
    group by ip_address, t_s.time_stamp
    order by ip_address, t_s.time_stamp;
    
    ip_address   time_stamp            next_page
    180.2.79.3   2011-01-01 08:00:00   2011-01-01 08:00:09
    180.2.79.3   2011-01-01 08:00:09   <null>
    180.2.79.3   2011-01-01 08:00:20   2011-01-01 08:00:25
    180.2.79.3   2011-01-01 08:00:23   2011-01-01 08:00:25
    180.2.79.3   2011-01-01 08:00:25   <null>
    180.2.79.3   2011-01-01 08:00:40   <null>
    180.2.79.4   2011-01-01 08:00:00   <null>
    180.2.79.4   2011-01-01 08:00:13   2011-01-01 08:00:23
    180.2.79.4   2011-01-01 08:00:23   2011-01-01 08:00:29
    180.2.79.4   2011-01-01 08:00:25   2011-01-01 08:00:29
    180.2.79.4   2011-01-01 08:00:27   2011-01-01 08:00:29
    180.2.79.4   2011-01-01 08:00:29   <null>
    180.2.79.4   2011-01-01 08:00:50   <null>
    

    The timestamp that marks the end of a visit has a null for its own next_page. That's because no later timestamp falls within 10 seconds of that row's time_stamp.

    To get a count, I'd probably create a view from the query above (called t_s_visits here) and count the nulls.

    select ip_address, count(*)
    from t_s_visits 
    where next_page is null
    group by ip_address
    
    180.2.79.3   3
    180.2.79.4   3
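
    The answer does not show the view definition, so here is a minimal sketch of the whole idea, run through Python's sqlite3 module. It assumes integer timestamps (seconds after 08:00:00 for the sample rows above) instead of the interval arithmetic in the query, and the aliases cur and nxt are illustrative only.

```python
import sqlite3

# Sample rows from the answer, stored as seconds after 08:00:00.
# Integer seconds are an assumption so that plain arithmetic works in SQLite.
ROWS = (
    [("180.2.79.3", s) for s in (0, 9, 20, 23, 25, 40)]
    + [("180.2.79.4", s) for s in (0, 13, 23, 25, 27, 29, 50)]
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_s (ip_address TEXT, time_stamp INTEGER)")
conn.executemany("INSERT INTO t_s VALUES (?, ?)", ROWS)

# Each row plus the latest timestamp that follows it within 10 seconds.
# next_page is NULL when no such timestamp exists, i.e. the visit ends here.
conn.execute("""
    CREATE VIEW t_s_visits AS
    SELECT cur.ip_address,
           cur.time_stamp,
           (SELECT MAX(nxt.time_stamp)
              FROM t_s nxt
             WHERE nxt.ip_address = cur.ip_address
               AND nxt.time_stamp > cur.time_stamp
               AND nxt.time_stamp - cur.time_stamp <= 10) AS next_page
      FROM t_s cur
""")

# Count the rows that end a visit, per IP address.
visit_counts = conn.execute("""
    SELECT ip_address, COUNT(*)
      FROM t_s_visits
     WHERE next_page IS NULL
     GROUP BY ip_address
     ORDER BY ip_address
""").fetchall()

print(visit_counts)  # [('180.2.79.3', 3), ('180.2.79.4', 3)]
```

    With the question's Unix-epoch column the same comparison should carry over unchanged, since the difference of two epoch values is already a number of seconds.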
    
  • 2021-02-01 10:03

    You could JOIN the tracking table to itself and filter out the records you don't need by adding a WHERE clause.

    SELECT  t1.ip_address
            , COUNT(*) AS tracks
    FROM    tracking t1
            LEFT OUTER JOIN tracking t2 ON t2.ip_address = t1.ip_address
                                           AND t2.time_stamp > t1.time_stamp  -- without this, every row matches itself
                                           AND t2.time_stamp < t1.time_stamp + 10
    WHERE   t2.ip_address IS NULL
    GROUP BY
            t1.ip_address
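
    As a rough check of the self-join idea, the sketch below runs it in SQLite from Python. It assumes the join also requires t2.time_stamp > t1.time_stamp so that a row cannot match itself. The first two rows come from the question; the other two are hypothetical values added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracking (ip_address TEXT, time_stamp INTEGER)")
conn.executemany(
    "INSERT INTO tracking VALUES (?, ?)",
    [
        ("180.2.79.3", 1301654105),  # from the question
        ("180.2.79.3", 1301654106),  # from the question
        ("180.2.79.3", 1301654200),  # hypothetical later visit
        ("180.2.79.4", 1301654105),  # hypothetical second visitor
    ],
)

# Anti-join: keep only rows that have no later row from the same IP
# less than 10 seconds after them, then count those rows per IP.
tracks = conn.execute("""
    SELECT t1.ip_address, COUNT(*) AS tracks
      FROM tracking t1
      LEFT OUTER JOIN tracking t2
        ON t2.ip_address = t1.ip_address
       AND t2.time_stamp > t1.time_stamp
       AND t2.time_stamp < t1.time_stamp + 10
     WHERE t2.ip_address IS NULL
     GROUP BY t1.ip_address
     ORDER BY t1.ip_address
""").fetchall()

print(tracks)  # [('180.2.79.3', 2), ('180.2.79.4', 1)]
```

    Note that the strict < treats two rows exactly 10 seconds apart as separate visits; use <= if they should count as one.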
    

    Edit

    The following script works in SQL Server, but I can't express it in a single SQL statement, let alone convert it to MySQL. It might give you some pointers on what is needed, though.

    Note: I assume that for the given inputs, the numbers 1 and 11 should be chosen.

    ;WITH q (number) AS (
      SELECT 1
      UNION ALL SELECT 2
      UNION ALL SELECT 10
      UNION ALL SELECT 11  
      UNION ALL SELECT 12
    )
    SELECT  q1.Number as n1
            , q2.Number as n2
            , 0 as Done
    INTO    #Temp
    FROM    q q1
            LEFT OUTER JOIN q q2 ON q2.number < q1.number + 10
                                    AND q2.number > q1.number
    
    DECLARE @n1 INTEGER
    DECLARE @n2 INTEGER
    
    WHILE EXISTS (SELECT * FROM #Temp WHERE Done = 0)
    BEGIN
    
      SELECT  TOP 1 @n1 = n1
              , @n2= n2
      FROM    #Temp
      WHERE   Done = 0
    
      DELETE  FROM #Temp
      WHERE   n1 = @n2
    
      UPDATE  #Temp 
      SET     Done = 1
      WHERE   n1 = @n1 
              AND n2 = @n2         
    END        
    
    SELECT  DISTINCT n1 
    FROM    #Temp
    
    DROP TABLE #Temp
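
    One reading of the note above is a greedy pass over the sorted values: keep a value, skip everything less than 10 after it, then keep the next one. The sketch below shows that reading in Python. It is an interpretation, not a translation of the T-SQL loop, and pick_visit_starts is a hypothetical helper name.

```python
def pick_visit_starts(values, window=10):
    """Keep a value only if it is at least `window` after the last kept value."""
    kept = []
    for value in sorted(values):
        # The first value always starts a visit; later ones must clear the window.
        if not kept or value >= kept[-1] + window:
            kept.append(value)
    return kept


chosen = pick_visit_starts([1, 2, 10, 11, 12])
print(chosen)  # [1, 11]
```

    The number of visits is then simply len(chosen), computed per ip_address.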
    