How can I check for average concurrent events in a SQL table based on the date, time and duration of the events?

后端 未结 6 1041
滥情空心
滥情空心 2021-02-05 22:48

I have a set of call detail records, and from those records, I\'m supposed to determine the average concurrent active calls per system, per hour (at a precision of one minute).

相关标签:
6条回答
  • 2021-02-05 23:23

    My first bit of advice would be, if you ever find yourself saying (when using SQL) "I can create a loop..." then you should immediately start looking for a set-based approach. Get out of the procedural mindset when using SQL.

    There are still a few fuzzy parts to your logic. Does a call count as being during a minute period if it simply has any portion of the call during that minute? For example, if a call starts at 1923 and lasts 62 seconds is it considered to overlap with all calls starting at 1924? I'm going to assume yes on that one, but you can adjust the code below if that's not true. It should be a minor tweak.

    For the breakdown to minutes I would usually use a table with time spans - a start time and end time for each of the slices in which I'm interested. In your case since you are dealing with exact minutes and your start times are in minutes (even if the data type is screwed up) I'm just going to use a single column with the exact minute in question.

    To set up that table:

    CREATE TABLE dbo.Minutes (
        start_time  INT NOT NULL,
        CONSTRAINT PK_Minutes PRIMARY KEY CLUSTERED (start_time)
    )
    
    DECLARE
        @hour   TINYINT,
        @minute TINYINT
    
    SET @hour = 19
    SET @minute = 0
    
    WHILE (@hour <= 20)
    BEGIN
        INSERT INTO dbo.Minutes (start_time) VALUES (@hour * 100 + @minute)
    
        SET @minute = @minute + 1
        IF @minute = 60
        BEGIN
            SET @minute = 0
            SET @hour = @hour + 1
        END
    END
    

    Now we can select for the averages, etc.

    SELECT
        M.start_time,
        COUNT(R.seconds)
    FROM
        dbo.Minutes M
    LEFT OUTER JOIN dbo.Records R ON
        M.start_time BETWEEN CAST(R.time AS INT) AND
            (CAST(SUBSTRING(time, 1, 2) AS INT) * 100) +    -- hours
            (FLOOR((CAST(SUBSTRING(time, 3, 2) AS INT) + FLOOR(dur/60))/60)) +  -- carryover to hours
            (CAST(SUBSTRING(time, 3, 2) AS INT) + dur/60) % 60  -- minutes
    GROUP BY
        M.start_time
    

    You'll need to either use that as a subquery to get the averages over a given time. Since it's late on Friday I'll leave that step up to you ;)

    EDIT: One last caveat: I didn't account for time spans that cross day boundaries (i.e., go past midnight). Hopefully the code points you in the right direction for that. A better approach might be to create a view that turns all of those nasty strings into actual DATETIME values, then this becomes really trivial with the Minutes table.

    0 讨论(0)
  • 2021-02-05 23:25

    I think MarkusQ has the answer, but let me develop an alternative that you may find easier to use. I'll use my customary method of developing this as a series of simple transformations in views, an analogue of functional decomposition in a procedural language.

    First, let's put everything in common units. Recall that record's column s is seconds since the epoch, midnight 1 January 1970. We can find the number of seconds since midnight of the day of the call, that call occurred, by just taking s modulus the number of seconds in a day: s % (60 * 60 * 24).

    select *, 
    s % (60 * 60 * 24) as start_secs_from_midnight,
    s % (60 * 60 * 24) + dur - 1 as end_secs_from_midnight,
    ;
    

    We subtract one from s + dur because a one second call that starts at 12:00:00 also ends on 12:00:00.

    We can find minutes since midnight by dividing those results by 60, or just by floor( s / 60 ) % (60 * 24) :

    create view record_mins_from_midnight as
    select *, 
    floor( s / 60 ) % (60 * 24) as start_mins_fm,
    floor( ( s + dur - 1) / 60 ) % (60 * 24) as end_mins_fm 
    from record
    ;
    

    Now we create a table of minutes. We need 1440 of them, numbered from 0 to 1439. In databases that don't support arbitrary sequences, I create an artificial range or sequence like this:

      create table artificial_range ( 
       id int not null primary key auto_increment, idz int) ;
      insert into artificial_range(idz) values (0);
      -- repeat next line to double rows
      insert into artificial_range(idz) select idz from artificial_range;
    

    So to create a minute table:

      create view minute as 
       select id - 1 as active_minute 
       from artificial_range 
       where id <= 1440
       ;
    

    Now we just join minute to our record view

    create view record_active_minutes as
    select * from minutes a 
    join record_mins_from_midnight b
    on (a.active_minute >= b.start_mins_fm 
    and a.active_minute <= b.end_mins_fm 
     ;
    

    This just cross products/multiplies record rows, so we have one record row for each whole minute over which the call was active.

    Note that I'm doing this by defining active as "(part of) the call occurred during a minute". That is, a two second call that starts at 12:00:59 and ends at 12:01:01 by this definition occurs during two different minutes, but a two second call that starts at 12:00:58 and ends at 12:00:59 occurs during one minute.

    I did that because you specified "So, I need a way to check for a count of active calls for 7:00-7:01, 7:01-7:02". If you prefer to consider only calls lasting more than sixty seconds to occur in more than one minute, you'll need to adjust the join.

    Now if we want to find the number of active records for any granularity equal to or larger than minute granularity, we just group on that last view. To find average calls per hour we divide by 60 to turn minutes to hours:

     select floor( active_minute / 60 ) as hour, 
     count(*) / 60 as avg_concurent_calls_per_minute_for_hour
     from record_active_minutes
     group by floor( active_minute / 60 ) ;
    

    Note that that is the average per hour for all calls, over all days; if we want to limit it to a particular day or range of days, we'd add a where clause.


    But wait, there's more!

    If we create a version of record_active_minutes that does a left outer join, we can get a report that shows the average over all hours in the day:

     create view record_active_minutes_all as
     select * 
     from 
     minutes a 
     left outer join record_mins_from_midnight b
       on (a.active_minute >= b.start_mins_fm 
           and a.active_minute <= b.end_mins_fm) 
     ;
    

    Then we again do our select, but against the new view:

     select floor( active_minute / 60 ) as hour, 
     count(*) / 60 as avg_concurent_calls_per_min
     from record_active_minutes_all
     group by floor( active_minute / 60 ) ;
    
    
    +------+------------------------------+
    | hour | avg_concurrent_calls_per_min |
    +------+------------------------------+
    |    0 |                       0.0000 |
    |    1 |                       0.0000 |
    |    2 |                       0.0000 |
    |    3 |                       0.0000 |
       etc....
    

    We can also index into this with a where. Unfortunately, the join means we'll have null values for the underlying record table where no calls exist for a particular hour, e.g.,

     select floor( active_minute / 60 ) as hour, 
     count(*) / 60 as avg_concurent_calls_per_min
     from record_active_minutes_all
     where month(date) = 1 and year(date) = 2008 
     group by floor( active_minute / 60 ) ;
    

    will bring back no rows for hours in which no calls occurred. If we still want our "report-like" view that shows all hours, we make sure we also include those hours with no records:

     select floor( active_minute / 60 ) as hour, 
     count(*) / 60 as avg_concurent_calls_per_minute_for_hour
     from record_active_minutes_all
     where (month(date) = 1 and year(date) = 2008) 
     or date is null 
     group by floor( active_minute / 60 ) ;
    

    Note that in the last two examples, I'm using a SQL date (to which the functions month and year can be applied), not the char(4) date in your record table.

    Which brings up another point: both the date and time in your record table are superfluous and denormalized, as each can be derived from your column s. Leaving them in the table allows the possibility of inconsistent rows, in which date(s) <> date or time(s) <> time. I'd prefer to do it like this:

       create table record ( id int not null primary key, s, duration) ; 
    
       create view record_date as 
       select *, dateadd( ss, s, '1970-01-01') as call_date
       from record
      ;
    

    In the dateadd function, the ss is an enumerated type that tells the function to add seconds; s is the column in record.

    0 讨论(0)
  • 2021-02-05 23:26

    As MarkusQ stated, your definition of "concurrent" allows you to short cut the maths.

    • Call (A) starts at "12:00:59" and ends at "12:01:01"
    • Call (B) starts at "12:01:59" and ends at "12:02:01"
      => 1 call in the "12:00" interval
      => 2 calls in the "12:01" interval
      => 1 call in the "12:02" interval

    The average concurrent calls is then (1+2+1)/intervalCount

    The (1+2+1) can be calculated differently, and more quickly/easily:

    • Call (A) covers 2 different minute intervals (12:00 and 12:01)
    • Call (B) covers 2 different minute intervals (12:01 and 12:02)
      => total covered minutes = 4

    The important fact here (and why I bothered reply after MarkusQ posted) is that the duration of a call itself isn't enough to calculate how many minute intervals are covered. In my example, both calls only last 2 seconds...

    You need the following info:
    - the "start time", rounded down to the minute
    - the "end time", rounded down to the minute
    => covered intervals = number of minutes difference + 1

    To round the "time" field down to the minute I'd use this...

    DATEADD(minute, DATEDIFF(minute, 0, time), 0)
    

    So the number of covered minutes by a single call would be...

    DATEDIFF(
       minute,
       DATEADD(minute, DATEDIFF(minute, 0, time), 0),
       DATEADD(second, dur, time)
    ) + 1
    
    No need to round the "end time" down.
    Using DATEDIFF(minute) gives rounding down anyway.
    

    SUM that value for the range you're looking at, then divde by the number of minutes in that range, and you have your answer.

    If you're only looking for calls that are truely concurrent you can't use such tricks, but it's still possible (I've had to do something similar). But for your definition of concurrent, this should do it...

    DECLARE
       @date DATETIME, @start DATETIME, @end DATETIME
    SELECT
       @date = '2009 Jan 01', @start = '12:00', @end = '13:00'
    
    SELECT
       system,
       SUM(
           DATEDIFF(
              minute,
              CASE WHEN
                 CAST(LEFT(time,2) + ':' + RIGHT(time,2) AS DATETIME) < @start
              THEN
                 @start
              ELSE
                 CAST(LEFT(time,2) + ':' + RIGHT(time,2) AS DATETIME)
              END,
              CASE WHEN
                 DATEADD(second, dur, CAST(LEFT(time,2) + ':' + RIGHT(time,2) AS DATETIME)) > @end
              THEN
                 @end
              ELSE
                 DATEADD(second, dur, CAST(LEFT(time,2) + ':' + RIGHT(time,2) AS DATETIME))
              END
           ) + 1
       )
       /
       CAST(DATEDIFF(minute, @start, @end) AS FLOAT)
    FROM
       records
    WHERE
       date = @date
       AND CAST(LEFT(time,2) + ':' + RIGHT(time,2) AS DATETIME) >= @start
       AND DATEADD(second, dur, CAST(LEFT(time,2) + ':' + RIGHT(time,2) AS DATETIME)) < @end
    GROUP BY
       system
    


    This will deliberately not include the interval 13:00->13:01
    Only the 60 "1 minute long intervals" 12:00->12:01 through to 12:59->13:00


    EDIT:

    I just noticed that your times and dates are stored as strings, you'd need to convert those to DATETIMEs for my code to work.

    EDIT2:

    Bug corrected. If a call started at "11:59:01" and ended at "12:00:01", the "11:59" interval should not be counted. CASE statements added to compensate.

    Various Layout Edits

    0 讨论(0)
  • 2021-02-05 23:41

    I can see only one approach that extracts the data as specified from the call records:

    Create a list of events, where event is defined as the beginning of a call or the end of a call. (Thus each call record will generate two events.) Each event item should contain: system, datetime, and the boolean begin/end. Datetime should be rounded down to the nearest minute.

    Sort this list by (system, datetime) and scan it. For each call begin, increment CURCNT by one. For each call end, decrement CURCNT by one.

    If the datetime value is different from the previous record, add CURCNT to HOURSUM. If the datetime value indicates the start of a new hour, divide HOURSUM by 60, write a new result record (system, date, hour, average), and reset HOURSUM to zero.

    It should be obvious when to initialize CURCNT and HOURSUM, what to do when the system value is different from the previous record, etc.

    -Al.

    0 讨论(0)
  • 2021-02-05 23:42

    If I understand you correctly, you want to get a count of all records for which the start time is less then t+60 seconds and the start time plus the duration is less than or equal to t, for each t in the interval of interest (e.g., t=7:00, 7:01, 7:02...etc.).

    Then it's just a matter of averaging these counts.

    But what is an average? It's just the sum divided by the number of items, right? In this case, the number of items will always be equal to the time range in minutes, and the sum will be equal to the sum of the durations-minutes that fall within the interval, which you can compute in one go off the data given.

    Sound less impossible now? In pseudo SQL:

    select sum( 
         ((time+duration rounded up to next minute, capped at end of period)
        - (time rounded down, bottom-capped at start of period) - 1)
         /(1 minute) )
      from Records
      where date is right
    

    Then just divide that by the number of minutes in the period of interest.

    0 讨论(0)
  • 2021-02-05 23:44

    I approached the problem by converting the data into an easier format. I created a table where each row represents one minute of a call. Once you have that the average per minute by hour is simple. There are multiple selects in there to show the intermediate results. As long as the time range being queried and the durations aren't extremely large, it should be ok...?

    CREATE TABLE #Records(
      seconds char(10),
      [time] char(4),
      date char(8),
      dur int,
      system int,
      port int
    )
    
    /*
    seconds is an s[time] value. It's the difference of seconds from UTC 1/1/1970 00:00:00 to the current UTC [time], we use it as an identifier (like epoch).
    [time] is the [time] the call was made.
    date is the day the call was made.
    dur is the duration of the call in seconds.
    system is the system number.
    port is the port on the system (not particularly relevant for this question).
    */
    
    INSERT INTO #Records(seconds, [time], date, dur, system, port) VALUES('1239924228','1923','20090416',105,2,2)
    INSERT INTO #Records(seconds, [time], date, dur, system, port) VALUES('1239923455','1910','20090416',884,1,97)
    INSERT INTO #Records(seconds, [time], date, dur, system, port) VALUES('1239924221','1923','20090416',116,2,15)
    INSERT INTO #Records(seconds, [time], date, dur, system, port) VALUES('1239924259','1924','20090416',90,1,102)
    INSERT INTO #Records(seconds, [time], date, dur, system, port) VALUES('1239923458','1910','20090416',891,2,1)
    INSERT INTO #Records(seconds, [time], date, dur, system, port) VALUES('1239924255','1924','20090416',99,2,42)
    INSERT INTO #Records(seconds, [time], date, dur, system, port) VALUES('1239924336','1925','20090416',20,2,58)
    INSERT INTO #Records(seconds, [time], date, dur, system, port) VALUES('1239924293','1924','20090416',64,2,41)
    INSERT INTO #Records(seconds, [time], date, dur, system, port) VALUES('1239923472','1911','20090416',888,2,27)
    INSERT INTO #Records(seconds, [time], date, dur, system, port) VALUES('1239924347','1925','20090416',25,1,100)
    INSERT INTO #Records(seconds, [time], date, dur, system, port) VALUES('1239924301','1925','20090416',77,2,55)
    INSERT INTO #Records(seconds, [time], date, dur, system, port) VALUES('1239924332','1925','20090416',52,2,43)
    INSERT INTO #Records(seconds, [time], date, dur, system, port) VALUES('1239924240','1924','20090416',151,1,17)
    INSERT INTO #Records(seconds, [time], date, dur, system, port) VALUES('1239924313','1925','20090416',96,2,62)
    INSERT INTO #Records(seconds, [time], date, dur, system, port) VALUES('1239924094','1921','20090416',315,2,16)
    INSERT INTO #Records(seconds, [time], date, dur, system, port) VALUES('1239923643','1914','20090416',788,2,34)
    INSERT INTO #Records(seconds, [time], date, dur, system, port) VALUES('1239924447','1927','20090416',6,2,27)
    INSERT INTO #Records(seconds, [time], date, dur, system, port) VALUES('1239924342','1925','20090416',119,2,15)
    INSERT INTO #Records(seconds, [time], date, dur, system, port) VALUES('1239924397','1926','20090416',76,2,41)
    INSERT INTO #Records(seconds, [time], date, dur, system, port) VALUES('1239924457','1927','20090416',23,2,27)
    
    /* convert date + [time] into datetimes */
    select 
        seconds,
        system,
        cast(date + ' ' + left([time], 2) + ':' + right([time], 2) as datetime) as start_date,
        /* end date to the minute */
        dateadd(mi, datediff(mi, 0, dateadd(s, dur, cast(date + ' ' + left([time], 2) + ':' + right([time], 2) as datetime))), 0) as end_date
    into 
        #r
    from
        #Records
    
    select * from #r order by system, seconds, start_date, end_date;
    
    /* create a row for each minute of each call */
    create table #r_min(rnd int, seconds char(10), system int, minute datetime)
    
    declare @maxrnd int;
    select @maxrnd = max(datediff(mi, r.start_date, r.end_date)) from #r r
    declare @i int;
    set @i = 0;
    
    while @i < @maxrnd begin
    
        insert into #r_min
        select @i, r.seconds, r.system, dateadd(mi, @i, r.start_date)
        from #r r
        where dateadd(mi, @i, r.start_date) <= r.end_date
    
    set @i = @i + 1
    end
    
    select * from #r_min order by system, seconds, minute
    
    /* concurrent per minute */
    select  
        system, minute, count(*) as cnt
    from 
        #r_min 
    group by
        system, minute
    order by 
        system, minute
    
    /* avg concurrent per minute by hour */
    select
        m.system,
        dateadd(hh, datediff(hh, 0, m.minute), 0) as hour,
        avg(m.cnt) as average_concurrent_per_minute
    from
        (select  
            system, minute, count(*) as cnt
        from 
            #r_min 
        group by
            system, minute
        ) m
    group by
        m.system,
        dateadd(hh, datediff(hh, 0, m.minute), 0)
    
    
    drop table #Records
    drop table #r
    drop table #r_min
    

    the last select yields...

    system  hour    average_concurrent_per_minute
    1   2009-04-16 19:00:00.000 1
    2   2009-04-16 19:00:00.000 3
    
    0 讨论(0)
提交回复
热议问题