Count number of rows that are not within 10 seconds of each other

情歌与酒 2021-02-01 09:26

I track web visitors. I store the IP address as well as the timestamp of the visit.

ip_address    time_stamp
180.2.79.3  1301654105
180.2.79.3  1301654106
180.2.         


        
8 Answers
  • 2021-02-01 09:46

    Just for fun, here is an UPDATE hack that does what you need. There are many reasons not to implement this, not least that it depends on the order in which the UPDATE happens to process rows, which is not guaranteed, so it may simply stop working some day. Assuming your table is initially ordered by ip_address and then time_stamp, it should (usually) give you the correct answers. Again, this is here for completeness; if you do implement it, look up the risks beforehand.

    CREATE TABLE #TestIPs
    (
        ip_address varchar(max),
        time_stamp decimal(12,0),
        cnt int
        )
    
    INSERT INTO #TestIPs (ip_address, time_stamp)
    SELECT '180.2.79.3',  1301654105 UNION ALL
    SELECT '180.2.79.3',  1301654106 UNION ALL
    SELECT '180.2.79.3',  1301654354 UNION ALL
    SELECT '180.2.79.3',  1301654356 UNION ALL
    SELECT '180.2.79.3',  1301654358 UNION ALL
    SELECT '180.2.79.3',  1301654366 UNION ALL
    SELECT '180.2.79.3',  1301654368 UNION ALL
    SELECT '180.2.79.3',  1301654422 UNION ALL
    SELECT '180.2.79.4',  1301654105 UNION ALL
    SELECT '180.2.79.4',  1301654106 UNION ALL
    SELECT '180.2.79.4',  1301654354 UNION ALL
    SELECT '180.2.79.4',  1301654356 UNION ALL
    SELECT '180.2.79.4',  1301654358 UNION ALL
    SELECT '180.2.79.4',  1301654366 UNION ALL
    SELECT '180.2.79.4',  1301654368 UNION ALL
    SELECT '180.2.79.4',  1301654422
    
    DECLARE @count int; SET @count = 0
    DECLARE @ip varchar(max); SET @ip = 'z'
    DECLARE @timestamp decimal(12,0); SET @timestamp = 0;
    
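    -- Running-variable update: rows are assumed to be visited in ip_address, time_stamp order
    UPDATE #TestIPs
        SET @count = cnt = CASE
                               WHEN @ip <> ip_address THEN 1                      -- new IP: restart the counter
                               WHEN time_stamp - @timestamp > 10 THEN @count + 1  -- gap over 10 s: new visit
                               ELSE @count                                        -- still the same visit
                           END,
            @timestamp = time_stamp,
            @ip = ip_address
    
    SELECT ip_address, MAX(cnt) AS Visits FROM #TestIPs GROUP BY ip_address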
    

    Results:

    ip_address  Visits
    ------------ -----------
    180.2.79.3  3
    180.2.79.4  3
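
To make the ordering risk concrete, here is a minimal Python sketch of the same running-counter idea. It is not the T-SQL above, only an emulation; the function name `running_visit_counts` and the tuple layout are assumptions made for illustration. It checks for an IP change first and then for a gap of more than 10 seconds.

```python
def running_visit_counts(rows, gap=10):
    """Scan rows in the order given and keep a running visit counter per IP.

    rows: iterable of (ip_address, time_stamp) tuples (hypothetical layout).
    Returns {ip_address: highest counter value seen}.
    """
    count, last_ip, last_ts = 0, None, 0
    visits = {}
    for ip, ts in rows:
        if ip != last_ip:
            count = 1            # new IP: restart the counter
        elif ts - last_ts > gap:
            count += 1           # gap larger than `gap` seconds: new visit
        last_ip, last_ts = ip, ts
        visits[ip] = max(visits.get(ip, 0), count)
    return visits


stamps = [1301654105, 1301654106, 1301654354, 1301654356,
          1301654358, 1301654366, 1301654368, 1301654422]
ordered = [(ip, ts) for ip in ("180.2.79.3", "180.2.79.4") for ts in stamps]
ordered_result = running_visit_counts(ordered)

# Same scan over rows that are NOT in time order: the counter undercounts.
unordered_result = running_visit_counts(
    [("180.2.79.3", 1301654422), ("180.2.79.3", 1301654105), ("180.2.79.3", 1301654354)]
)
```

With the rows in ip, timestamp order the sketch reproduces the 3 visits per IP shown above; with the three out-of-order rows it reports 2 visits where a sorted scan would find 3, which is why the approach is only safe if the processing order can be relied on.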
    
  • 2021-02-01 09:48

    As usual with SQL, there are many solutions to your problem. I would use the following query, which is simple and should be "good enough":

    SELECT COUNT(*) AS tracks 
    FROM (
        SELECT ip_address 
        FROM tracking 
        GROUP BY ip_address, FLOOR(time_stamp / 10)
    ) AS t  -- most databases require an alias on a derived table
    

    The subquery groups the visits of a single IP address into fixed 10-second windows, so that all hits in one window are counted as one visit.

    Of course, it is possible to find cases in which two visits fall into different 10-second windows even though the interval between them is less than 10 seconds. Eliminating such cases would require much more complex logic, and the analytical value of that added complexity would be dubious (the 10-second interval sounds like an arbitrary value anyway).
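
The window-boundary case can be shown with a short Python sketch. It mirrors the grouping on (ip_address, FLOOR(time_stamp / 10)); the function name `count_bucketed` and the tuple layout are assumptions for illustration only.

```python
def count_bucketed(rows, window=10):
    """Count distinct (ip, fixed window) pairs, like
    GROUP BY ip_address, FLOOR(time_stamp / 10)."""
    return len({(ip, ts // window) for ip, ts in rows})


# Two hits only 8 seconds apart that straddle a window boundary:
# 1301654358 // 10 == 130165435, while 1301654366 // 10 == 130165436.
straddling = [("180.2.79.3", 1301654358), ("180.2.79.3", 1301654366)]
straddling_count = count_bucketed(straddling)  # counted as 2 visits

# Two hits 9 seconds apart inside the same window are counted once.
same_window_count = count_bucketed([("180.2.79.3", 10), ("180.2.79.3", 19)])
```

The first pair is counted as two visits even though the hits are less than 10 seconds apart; whether that matters depends on how strictly the 10-second rule has to hold.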

  • 2021-02-01 09:51

    The simplest way to do this is to divide the timestamps by 10 and count the distinct combinations of those values and the ip_address values. That way each 10-second period is counted separately.

    If you run this on your sample data it will give you 4 tracks, which I think is what you want.

    Give it a try and see if it gives you the desired results on your full data set. Note that COUNT(DISTINCT ...) with more than one expression is MySQL syntax:

    SELECT COUNT(DISTINCT ip_address, FLOOR(time_stamp/10)) AS tracks 
    FROM tracking
    
  • 2021-02-01 09:52

    The following logic will only count a visit as a 'unique visit' if there wasn't a preceding record from the same ip address within the preceding 10 seconds.

    This means that the timestamps {1,11,21,32,42,52,62,72} will count as 2 visits, with 3 and 5 tracks respectively.

    It accomplishes this by first identifying the unique visits. Then it counts all visits that happened between that unique visit and the next unique visit.

    WITH
        unique_visits
    AS
    (
      -- A record starts a visit when the same ip_address has nothing in the preceding 10 seconds
      SELECT
        ip_address, time_stamp
      FROM
        visitors
      WHERE
        NOT EXISTS (SELECT * FROM visitors AS [previous]
                    WHERE [previous].ip_address  = visitors.ip_address
                      AND [previous].time_stamp >= visitors.time_stamp - 10
                      AND [previous].time_stamp <  visitors.time_stamp)
    )
    SELECT
      unique_visits.ip_address,
      unique_visits.time_stamp,
      COUNT(*) AS [total_tracks]
    FROM
      unique_visits
    INNER JOIN
      visitors
        ON  visitors.ip_address  = unique_visits.ip_address
        AND visitors.time_stamp >= unique_visits.time_stamp
        AND visitors.time_stamp <  ISNULL(
                                      (SELECT MIN([next].time_stamp) FROM unique_visits AS [next]
                                       WHERE  [next].ip_address = unique_visits.ip_address
                                       AND    [next].time_stamp > unique_visits.time_stamp)
                                      , visitors.time_stamp + 1
                                   )
    GROUP BY
      unique_visits.ip_address,
      unique_visits.time_stamp
    

    You will also want either an index or primary key on (ip_address, time_stamp)
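
As a cross-check of the rule described in this answer, here is a minimal Python sketch of the same logic: a record starts a new visit when the previous record from the same IP is more than 10 seconds older, and every later record is added to the current visit. The function name `visits_with_tracks` and the tuple layout are assumptions for illustration, and the sketch assumes timestamps are distinct per IP (consistent with a key on (ip_address, time_stamp)).

```python
from collections import defaultdict


def visits_with_tracks(rows, gap=10):
    """Return (ip_address, visit_start, total_tracks) tuples.

    rows: iterable of (ip_address, time_stamp) tuples (hypothetical layout).
    Assumes time stamps are distinct within one ip_address.
    """
    by_ip = defaultdict(list)
    for ip, ts in rows:
        by_ip[ip].append(ts)

    result = []
    for ip, stamps in by_ip.items():
        stamps.sort()
        prev = None
        for ts in stamps:
            if prev is None or ts - prev > gap:
                result.append([ip, ts, 1])   # nothing in the preceding `gap` seconds
            else:
                result[-1][2] += 1           # belongs to the current visit
            prev = ts
    return [tuple(r) for r in result]


example = [("a", ts) for ts in (1, 11, 21, 32, 42, 52, 62, 72)]
example_visits = visits_with_tracks(example)
# example_visits == [("a", 1, 3), ("a", 32, 5)]
```

On the example set from the text this yields two visits, starting at 1 and 32, with 3 and 5 tracks.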

  • 2021-02-01 09:53
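    Select Z.ip_address, Count(*) As VisitCount
    From    (
            Select V.ip_address
            From visitors As V
                Left Join visitors As V2
                    On V2.ip_address = V.ip_address
                        And V2.time_stamp > V.time_stamp
            Group By V.ip_address, V.time_stamp
            -- Keep rows whose next entry is more than 10 seconds later, or that have no later entry
            Having Min(V2.time_stamp) Is Null
                Or Min(V2.time_stamp) - V.time_stamp > 10
            ) As Z
    Group By Z.ip_address
    

    This treats a row as the end of a visit when the next entry from the same IP is more than 10 seconds away, or when there is no later entry at all, and counts one visit per such row.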

  • 2021-02-01 09:57

    Left join each record to the later records from the same IP within the next 10 seconds, and keep only the records that have no match. Each remaining record is the last hit of a visit:

    select count(*) as visits
    from (
      select t.ip_address
      from tracking t
      left join tracking t2
        on t2.ip_address = t.ip_address
        and t2.time_stamp > t.time_stamp and t2.time_stamp <= t.time_stamp + 10
      where t2.ip_address is null
    ) x
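
The anti-join idea can be sketched in Python as follows: a record counts as one visit when no later record from the same IP falls within the next 10 seconds. The function name `count_visit_ends` and the tuple layout are assumptions for illustration, and the quadratic scan is only meant for small samples.

```python
def count_visit_ends(rows, gap=10):
    """Count, per IP, the records that have no follower within `gap` seconds.

    rows: list of (ip_address, time_stamp) tuples (hypothetical layout).
    """
    counts = {}
    for ip, ts in rows:
        has_follower = any(
            other_ip == ip and ts < other_ts <= ts + gap
            for other_ip, other_ts in rows
        )
        if not has_follower:
            counts[ip] = counts.get(ip, 0) + 1
    return counts


stamps = [1301654105, 1301654106, 1301654354, 1301654356,
          1301654358, 1301654366, 1301654368, 1301654422]
sample = [(ip, ts) for ip in ("180.2.79.3", "180.2.79.4") for ts in stamps]
visit_ends = count_visit_ends(sample)
# visit_ends == {"180.2.79.3": 3, "180.2.79.4": 3}
```

On the sample rows from the first answer, the records at 1301654106, 1301654368 and 1301654422 have no follower within 10 seconds, so each IP is counted as 3 visits.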
    