Find the number of concurrent users in SQL records

生来不讨喜 2021-01-31 21:54

I have a table with the following structure:

UserID   StartedOn          EndedOn
1        2009-7-12T14:01    2009-7-12T15:01 
2        2009-7-12T14:30    2009-7-12T1         


        
7 Answers
  • 2021-01-31 22:20

    You can order all events by date and compute a running total of the users currently logged in:

    DECLARE @Table TABLE 
    (
      UserId int, 
      StartedOn datetime,
      EndedOn datetime
    );
    
    -- ISO 8601 literals ('yyyy-mm-ddThh:mi:ss') are read the same under any DATEFORMAT setting
    insert into @Table (UserId, StartedOn, EndedOn)
    select 1, '2009-07-12T14:01:00', '2009-07-12T15:01:00'
    union all select 2, '2009-07-12T14:30:00', '2009-07-12T14:45:00'
    union all select 3, '2009-07-12T14:47:00', '2009-07-12T15:30:00'
    union all select 4, '2009-07-12T13:01:00', '2009-07-12T17:01:00'
    union all select 5, '2009-07-12T14:15:00', '2009-07-12T18:01:00'
    union all select 6, '2009-07-12T11:01:00', '2009-07-12T19:01:00'
    union all select 1, '2009-07-12T16:07:00', '2009-07-12T19:01:00';
    
    with cte_all_events as (
    -- each login is a +1 event, each logout a -1 event
    select StartedOn as Date
        , +1 as Users
        from @Table
    union all 
    select EndedOn as Date
        , -1 as Users
        from @Table),
    cte_ordered_events as (
    -- Users asc breaks ties: a logout sorts before a login at the same instant
    select Date
        , Users
        , row_number() over (order by Date asc, Users asc) as EventId
        from cte_all_events)
    , cte_agg_users as (
      -- running total: sum of every event up to and including this one
      select Date
        , Users
        , EventId
        , (select sum(Users) 
            from cte_ordered_events agg
            where agg.EventId <= e.EventId) as AggUsers
        from cte_ordered_events e)
    select * from cte_agg_users;
    
    
    Date                     Users  EventId  AggUsers
    2009-07-12 11:01:00.000  1      1        1
    2009-07-12 13:01:00.000  1      2        2
    2009-07-12 14:01:00.000  1      3        3
    2009-07-12 14:15:00.000  1      4        4
    2009-07-12 14:30:00.000  1      5        5
    2009-07-12 14:45:00.000  -1     6        4
    2009-07-12 14:47:00.000  1      7        5
    2009-07-12 15:01:00.000  -1     8        4
    2009-07-12 15:30:00.000  -1     9        3
    2009-07-12 16:07:00.000  1      10       4
    2009-07-12 17:01:00.000  -1     11       3
    2009-07-12 18:01:00.000  -1     12       2
    2009-07-12 19:01:00.000  -1     13       1
    2009-07-12 19:01:00.000  -1     14       0
    

    Once you have this in place, finding the maximum number of concurrent sessions is trivial. As you can see, there are two moments with 5 users: at 14:30 (when user 2 logged in) and at 14:47 (when user 3 logged in). To get the actual maximum, just replace the final query that selects from the CTE with:

    select top(1) AggUsers 
        from cte_agg_users
        order by AggUsers desc
    

    This solution uses CTEs and ROW_NUMBER(), both introduced in SQL Server 2005, so it requires SQL Server 2005 or later. If you're still on SQL Server 2000, you'll have to rewrite it using derived tables instead of CTEs.
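    On SQL Server 2012 or later (an assumption; the answer above targets 2005), the correlated subquery that builds the running total can be replaced by a windowed SUM with a ROWS frame, which computes the total in one ordered pass instead of the triangular join criticized in the next answer. A minimal sketch using the same test data; the names EventTime and @MaxConcurrent are illustrative, not from the original:

```sql
-- Sketch: assumes SQL Server 2012+ (SUM() OVER with ORDER BY and a ROWS frame).
DECLARE @Table TABLE
(
  UserId int,
  StartedOn datetime,
  EndedOn datetime
);

INSERT INTO @Table (UserId, StartedOn, EndedOn)
VALUES (1, '2009-07-12T14:01:00', '2009-07-12T15:01:00'),
       (2, '2009-07-12T14:30:00', '2009-07-12T14:45:00'),
       (3, '2009-07-12T14:47:00', '2009-07-12T15:30:00'),
       (4, '2009-07-12T13:01:00', '2009-07-12T17:01:00'),
       (5, '2009-07-12T14:15:00', '2009-07-12T18:01:00'),
       (6, '2009-07-12T11:01:00', '2009-07-12T19:01:00'),
       (1, '2009-07-12T16:07:00', '2009-07-12T19:01:00');

DECLARE @MaxConcurrent int;

WITH cte_all_events AS (
    -- +1 for each login, -1 for each logout
    SELECT StartedOn AS EventTime, +1 AS Users FROM @Table
    UNION ALL
    SELECT EndedOn, -1 FROM @Table
),
cte_agg_users AS (
    -- Running total in one ordered pass; at equal times the -1 rows sort first
    SELECT EventTime,
           SUM(Users) OVER (ORDER BY EventTime, Users
                            ROWS UNBOUNDED PRECEDING) AS AggUsers
    FROM cte_all_events
)
SELECT @MaxConcurrent = MAX(AggUsers)
FROM cte_agg_users;

SELECT @MaxConcurrent AS MaxConcurrentUsers;  -- 5 for this data
```

    The explicit ROWS frame is chosen because it accumulates strictly row by row; the default frame when only ORDER BY is given is RANGE, which in SQL Server generally performs worse for running totals.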

  • 2021-01-31 22:24

    This is NOT a solution. Since, at the time of this posting, the most upvoted solution has a really nasty CROSS JOIN for smaller numbers of rows and a really nasty TRIANGULAR JOIN for larger numbers of rows, I thought I'd post some code to generate a more substantial amount of test data for people to test with. Let the races begin. ;-)

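    -- Drop the table only if it exists, so the script also runs the first time
    IF OBJECT_ID('tempdb..#Table') IS NOT NULL
        DROP TABLE #Table;
    GO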
    WITH
    cteStartedOn AS
    (
     SELECT TOP 100000 --LOOK!  Change this number to vary the number of rows you're testing with.
            UserID = ABS(CHECKSUM(NEWID()))%1000,
            StartedOn = RAND(CHECKSUM(NEWID()))*DATEDIFF(dd,'2012','2013')+CAST('2012' AS DATETIME)
       FROM sys.all_columns ac1, sys.all_columns ac2
    )
     SELECT UserID, StartedOn,
            EndedOn = DATEADD(ss,ABS(CHECKSUM(NEWID()))%36000,StartedOn) --10 hours max
       INTO #Table
       FROM cteStartedOn;
    
  • 2021-01-31 22:31

    I did the work using integers rather than datetime fields, but I believe the following SQL snippet gets you what you want.

    Basically, I compared the start and end times of each user against every other user using a self-join. If User A started before or at the same time as User B, AND User B started before or at the same time as User A ended, they are running concurrently. I then found the user with the maximum number of concurrent users (adding 1 for that user, since the self-join excludes them).

    I noticed you have multiple rows for each user. Please note that the SQL below assumes the same user can't be running multiple instances at once (concurrently). If this assumption doesn't hold, I'm hoping you have an additional column that is unique per row; use that column instead of UserId throughout the SQL.

    I've gotten you really close. I hope this helps. Best of luck.

    DECLARE @Table TABLE 
    (
      UserId int, 
      StartedOn int,
      EndedOn int
    )
    
    Insert Into @Table
    Select 1, 1, 3
    union
    Select 2, 2, 4
    union
    Select 3, 3, 5
    union
    Select 4, 4, 6
    union
    Select 5, 7, 8
    union
    Select 6, 9, 10
    union
    Select 7, 9, 11
    union
    Select 8, 9, 12
    union
    Select 9, 10, 12
    union
    Select 10, 10, 13
    
    --Select * from @Table
    
    Select 
        A.UserId, 
        Count(B.UserId) + 1 as [Concurrent Users]   -- +1 counts A itself
    FROM @Table A
    JOIN @Table B
      ON A.StartedOn <= B.StartedOn
     AND B.StartedOn <= A.EndedOn
     AND A.UserId != B.UserId
    Group By A.UserId
    Order By Count(B.UserId) Desc
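    One thing worth checking with this predicate: it counts the sessions that *start* while A is running, but two of those sessions need not overlap each other, so the count can exceed the real concurrency. The sketch below uses hypothetical data where session 1 runs from 0 to 10 and sessions 2 (1 to 2) and 3 (3 to 4) both start inside it: the predicate above reports 3, although at most 2 users are ever logged in together. Counting instead, for each session, the sessions still running at the instant it starts (the idea behind the BETWEEN test in the next answer) gives 2. Endpoints are inclusive in both queries, as in the original; the variable names are illustrative.

```sql
-- Hypothetical counterexample: sessions 2 and 3 both start inside session 1,
-- but they never overlap each other, so at most 2 users are logged in at once.
DECLARE @Table TABLE
(
  UserId int,
  StartedOn int,
  EndedOn int
);

INSERT INTO @Table (UserId, StartedOn, EndedOn)
SELECT 1, 0, 10
UNION ALL SELECT 2, 1, 2
UNION ALL SELECT 3, 3, 4;

DECLARE @StartsDuringA int, @ActiveAtStart int;

-- Predicate from the answer above: sessions B that start while A is running, plus A itself.
SELECT @StartsDuringA = MAX(Cnt)
FROM (
    SELECT COUNT(B.UserId) + 1 AS Cnt
    FROM @Table A
    JOIN @Table B
      ON A.StartedOn <= B.StartedOn
     AND B.StartedOn <= A.EndedOn
     AND A.UserId <> B.UserId
    GROUP BY A.UserId
) AS PerUser;

-- Alternative predicate: sessions B (A included) still running at the instant A starts.
SELECT @ActiveAtStart = MAX(Cnt)
FROM (
    SELECT COUNT(*) AS Cnt
    FROM @Table A
    JOIN @Table B
      ON B.StartedOn <= A.StartedOn
     AND A.StartedOn <= B.EndedOn
    GROUP BY A.UserId, A.StartedOn
) AS PerSession;

SELECT @StartsDuringA AS StartsDuringA,   -- 3 (overcounts)
       @ActiveAtStart AS ActiveAtStart;   -- 2
```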
    
  • 2021-01-31 22:34

    A naive approach:
    You can test if another user b is currently logged in when user a logs in with

    a.StartedOn BETWEEN b.StartedOn AND b.EndedOn
    

    And someone has to be the "final logon" to the set of "the most concurrent users".
    If you now go through all records (as a), check how many users (b, including a itself) were logged in at that time, and order the list descending, the first result is the maximum number of concurrent users.

    SELECT
      a.id, a.UserId, a.StartedOn, a.EndedOn,  
      (  
        SELECT    
          Count(*)      
        FROM    
          logons as b      
        WHERE    
          a.StartedOn BETWEEN b.StartedOn AND b.EndedOn            
      ) as c
    FROM
      logons as a 
    ORDER BY
      c desc
    

    And now read "Database development mistakes made by application developers" to see how inefficient (or even wrong) this is ;-)
    For example, the ORDER BY operates on a large temporary result without any index to help SQL Server.

    (By the way, I tested this with MySQL because I don't have a SQL Server instance at hand right now.)

  • 2021-01-31 22:39

    I tried AlexKuznetsov's solution but the result was 49 :(

    My solution:

    /* Create temporary table and set all dates into 1 column,
    so we can sort by this one column */
    DECLARE @tmp table (
        Dates datetime,
        IsStartedDate bit )
    
    INSERT INTO @tmp
        SELECT StartedOn, 1 FROM stats
        UNION ALL
        SELECT EndedOn, 0 FROM stats
    
    DECLARE @currentlogins int, @highestlogins int, @IsStartedDate bit;
    SET @currentlogins = 0;
    SET @highestlogins = 0;
    
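    /* LOCAL FAST_FORWARD: a read-only, forward-only cursor.
    At equal times, end dates (0) sort before start dates (1), so a logout and a login
    at the same instant are not counted as overlapping */
    DECLARE tmp_cursor CURSOR LOCAL FAST_FORWARD FOR 
    SELECT IsStartedDate FROM @tmp
    ORDER BY Dates ASC, IsStartedDate ASC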
    
    OPEN tmp_cursor
    
    /* Step through every row, if it's a starteddate increment @currentlogins else decrement it
    When @currentlogins is higher than @highestlogins set @highestlogins to the new highest value */
    FETCH NEXT FROM tmp_cursor 
    INTO @IsStartedDate
    
    WHILE @@FETCH_STATUS = 0
    BEGIN
        IF (@IsStartedDate = 1)
        BEGIN
            SET @currentlogins = @currentlogins + 1;
            IF (@currentlogins > @highestlogins)
                SET @highestlogins = @currentlogins;
        END
        ELSE
            SET @currentlogins = @currentlogins - 1;
    
        FETCH NEXT FROM tmp_cursor 
        INTO @IsStartedDate
    END
    
    CLOSE tmp_cursor
    DEALLOCATE tmp_cursor
    
    SELECT @highestlogins AS HighestLogins
    
  • 2021-01-31 22:45

    Clearly the number of concurrent users only changes when a user starts or ends a period, so it is enough to check the count at those moments; and because an end can only lower the count, the maximum is always reached at a start time. Reusing the test data provided by Remus (thank you, Remus):

    DECLARE @Table TABLE 
    (
      UserId int, 
      StartedOn datetime,
      EndedOn datetime
    );
    
    insert into @Table (UserId, StartedOn, EndedOn)
    select 1, '2009-07-12T14:01:00', '2009-07-12T15:01:00'
    union all select 2, '2009-07-12T14:30:00', '2009-07-12T14:45:00'
    union all select 3, '2009-07-12T14:47:00', '2009-07-12T15:30:00'
    union all select 4, '2009-07-12T13:01:00', '2009-07-12T17:01:00'
    union all select 5, '2009-07-12T14:15:00', '2009-07-12T18:01:00'
    union all select 6, '2009-07-12T11:01:00', '2009-07-12T19:01:00'
    union all select 1, '2009-07-12T16:07:00', '2009-07-12T19:01:00';
    
    SELECT MAX(ConcurrentUsers) FROM (
        -- sessions active at each start time: StartedOn <= ChangeTime < EndedOn
        SELECT COUNT(*) AS ConcurrentUsers
        FROM @Table AS Sessions
        JOIN (SELECT DISTINCT StartedOn AS ChangeTime FROM @Table) AS ChangeTimes
          ON ChangeTime >= StartedOn AND ChangeTime < EndedOn
        GROUP BY ChangeTime
    ) AS ConcurrencyAtChangeTimes;
    -- Result:
    -- 5
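    If you also want to know *when* the peak occurs, the same join can keep the change times instead of collapsing them with MAX. A sketch using TOP (1) WITH TIES so that every start time tied for the maximum is kept; with this test data it returns 14:30 and 14:47, the two peaks of 5 noted in the first answer. The @Peaks table variable is illustrative only.

```sql
-- Sketch: reports every start time at which concurrency reaches its maximum.
DECLARE @Table TABLE (UserId int, StartedOn datetime, EndedOn datetime);

INSERT INTO @Table (UserId, StartedOn, EndedOn)
SELECT 1, '2009-07-12T14:01:00', '2009-07-12T15:01:00'
UNION ALL SELECT 2, '2009-07-12T14:30:00', '2009-07-12T14:45:00'
UNION ALL SELECT 3, '2009-07-12T14:47:00', '2009-07-12T15:30:00'
UNION ALL SELECT 4, '2009-07-12T13:01:00', '2009-07-12T17:01:00'
UNION ALL SELECT 5, '2009-07-12T14:15:00', '2009-07-12T18:01:00'
UNION ALL SELECT 6, '2009-07-12T11:01:00', '2009-07-12T19:01:00'
UNION ALL SELECT 1, '2009-07-12T16:07:00', '2009-07-12T19:01:00';

DECLARE @Peaks TABLE (ChangeTime datetime, ConcurrentUsers int);

-- Count sessions active at each start time (half-open: StartedOn <= t < EndedOn),
-- keeping all rows that tie for the highest count.
INSERT INTO @Peaks (ChangeTime, ConcurrentUsers)
SELECT TOP (1) WITH TIES
       ChangeTimes.ChangeTime,
       COUNT(*) AS ConcurrentUsers
FROM @Table AS Sessions
JOIN (SELECT DISTINCT StartedOn AS ChangeTime FROM @Table) AS ChangeTimes
  ON ChangeTimes.ChangeTime >= Sessions.StartedOn
 AND ChangeTimes.ChangeTime <  Sessions.EndedOn
GROUP BY ChangeTimes.ChangeTime
ORDER BY COUNT(*) DESC;

SELECT ChangeTime, ConcurrentUsers FROM @Peaks ORDER BY ChangeTime;
```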
    

    BTW, using DISTINCT per se is not a mistake; only abusing DISTINCT is. DISTINCT is just a tool, and using it in this context is perfectly correct.

    Edit: I was answering the OP's question: "how one could calculate this using T-SQL only". Note that the question does not mention performance.

    If the question were "what is the fastest way to determine maximum concurrency if the data is stored in SQL Server", I would provide a different answer, something like this:

    Consider the following alternatives:

    1. Write a cursor
    2. Write a CLR cursor
    3. Write a loop on the client
    4. Use an RDBMS with decent cursors, such as Oracle or PostgreSQL
    5. For top performance, design your table differently, so that you can retrieve the answer in one index seek. This is what I do in my system when I need to deliver the best possible performance.

    If the question was "what is the fastest way to determine maximum concurrency using a T-SQL query", I would probably not answer at all. The reason: if I needed really good performance, I would not solve this problem in a T-SQL query.
