Most active time of day based on start and end time

悲哀的现实 2021-02-08 06:53

I'm logging statistics of the gamers in my community. For both their online and in-game states I'm registering when they "begin" and when they "end". In order to show the

7 answers
  • 2021-02-08 07:19

    The easiest solution is to run a cron job at the top of each hour that counts everyone who has a start time but no end time (a NULL end time, if you reset it when they log in) and logs that count. This gives you the number of users currently logged in at each hour without funky schema changes or wild queries.

    When you check at the next hour, anyone who has logged out in the meantime falls out of the results. The following query works if you reset the end time when they log in.

    SELECT CONCAT(CURDATE(), ' ', HOUR(NOW()), ' ', COUNT(*)) FROM activity WHERE DATE(start) = CURDATE() AND end IS NULL;

    You can then log this to your heart's content, to a file or to another table (of course, you might need to adjust the SELECT to fit your log table). For example, you can have a table that gets one entry per day, updated only when the peak changes.

    Assume a log table like:

    current_date | peak_hour | peak_count

    SELECT IF(peak_count < $peak_count, true, false) FROM log WHERE DATE(`current_date`) = CURDATE();
    

    where $peak_count is a variable coming from your cron job. If the new count is bigger than the stored peak, do an update; if no record exists for the day yet, insert one into log. Otherwise you have not beaten the peak from earlier in the day, so do nothing. Each day then yields exactly one row in your table, and no aggregation is needed: the date and peak hour are right there to read over the course of a week, a month, or whatever.
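
    The insert-or-update rule described above can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 stands in for MySQL, and the table name peak_log, the column log_date and the helper record_peak are hypothetical names, not part of the answer.

```python
import sqlite3

def record_peak(conn, day, hour, count):
    """Keep one row per day holding the busiest hour seen so far."""
    row = conn.execute(
        "SELECT peak_count FROM peak_log WHERE log_date = ?", (day,)
    ).fetchone()
    if row is None:
        # First sample of the day: insert it.
        conn.execute(
            "INSERT INTO peak_log (log_date, peak_hour, peak_count) VALUES (?, ?, ?)",
            (day, hour, count),
        )
    elif count > row[0]:
        # New peak: overwrite the earlier one.
        conn.execute(
            "UPDATE peak_log SET peak_hour = ?, peak_count = ? WHERE log_date = ?",
            (hour, count, day),
        )
    # Otherwise the earlier peak stands and nothing is written.
    conn.commit()

# Hypothetical schema (assumption): one row per day.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE peak_log (log_date TEXT PRIMARY KEY, peak_hour INTEGER, peak_count INTEGER)"
)

# Three hourly samples, as the cron job would deliver them.
for hour, count in [(9, 12), (10, 30), (11, 25)]:
    record_peak(conn, "2013-12-01", hour, count)

peak_rows = conn.execute("SELECT * FROM peak_log").fetchall()
print(peak_rows)
```

    After the three samples the table holds a single row for the day, with hour 10 as the peak.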

  • 2021-02-08 07:20

    This query is for Oracle, but you can get the idea from it:

    SELECT
        H, M, 
        COUNT(BEGIN)
    FROM
        -- temporary table that should return numbers from 0 to 1439
        -- each number represents minute of the day, for example 0 represents 0:00, 100 represents 1:40, etc.
        -- in oracle you can use CONNECT BY clause which is designated to do recursive queries
        (SELECT LEVEL - 1 DAYMIN, FLOOR((LEVEL - 1) / 60) H, MOD((LEVEL - 1), 60) M FROM dual CONNECT BY LEVEL <= 1440) T LEFT JOIN
    
        -- join stats to each row from T by discarding the date and converting the time to a minute of the day
        STATS S ON 60 * TO_NUMBER(TO_CHAR(S.BEGIN, 'HH24')) + TO_NUMBER(TO_CHAR(S.BEGIN, 'MI')) <= T.DAYMIN AND
                   60 * TO_NUMBER(TO_CHAR(S.END, 'HH24'))   + TO_NUMBER(TO_CHAR(S.END, 'MI'))   >  T.DAYMIN
    
    GROUP BY H, M
    HAVING COUNT(BEGIN) > 0
    ORDER BY H, M
    
    

    Fiddle: http://sqlfiddle.com/#!4/e5e31/9

    The idea is to have a temporary table or view with one row per time point and to left join against it. In my example there is one row for every minute of the day. In MySQL you can use variables to create such a view on the fly.

    MySQL version:

    SELECT
        FLOOR(T.DAYMIN / 60), -- hour
        MOD(T.DAYMIN, 60), -- minute
        -- T.DAYMIN, -- minute of the day
        COUNT(S.BEGIN) -- count not null stats
    FROM
        -- temporary table that should return numbers from 0 to 1439
        -- each number represents minute of the day, for example 0 represents 0:00, 100 represents 1:40, etc.
        -- in mysql you must have some table which has at least 1440 rows; 
        -- I use (INFORMATION_SCHEMA.COLLATIONSxINFORMATION_SCHEMA.COLLATIONS) for that purpose - it should be
        -- in every database
        (
            SELECT 
                @counter := @counter + 1 AS DAYMIN
            FROM
                INFORMATION_SCHEMA.COLLATIONS A CROSS JOIN
                INFORMATION_SCHEMA.COLLATIONS B CROSS JOIN
                (SELECT @counter := -1) C
            LIMIT 1440
        ) T LEFT JOIN
    
        -- join stats to each row from T by discarding the date and converting the time to a minute of the day
        STATS S ON (
            (60 * DATE_FORMAT(S.BEGIN, '%H')) + (1 * DATE_FORMAT(S.BEGIN, '%i')) <= T.DAYMIN AND
            (60 * DATE_FORMAT(S.END, '%H'))   + (1 * DATE_FORMAT(S.END, '%i'))   >  T.DAYMIN
        )
    
    GROUP BY T.DAYMIN
    HAVING COUNT(S.BEGIN) > 0 -- filter empty counters
    ORDER BY T.DAYMIN
    

    Fiddle: http://sqlfiddle.com/#!2/de01c/1
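
    For readers who find the join hard to follow, here is a rough Python sketch of the same minute-of-day counting. The sample sessions and the function names are made up for illustration. Like the SQL above, it discards the date, so a session that crosses midnight (begin minute later than end minute) matches no minute at all.

```python
from datetime import datetime

def minute_of_day(ts):
    """Convert a timestamp to a minute of the day (0..1439), ignoring the date."""
    return ts.hour * 60 + ts.minute

def sessions_per_minute(sessions):
    """Return {minute_of_day: count} for every minute with at least one session.

    Mirrors the join condition: begin <= minute < end.
    """
    counts = {}
    for minute in range(1440):
        n = sum(
            1
            for begin, end in sessions
            if minute_of_day(begin) <= minute < minute_of_day(end)
        )
        if n:  # same effect as HAVING COUNT(...) > 0
            counts[minute] = n
    return counts

# Illustrative data (assumption): two overlapping sessions.
sessions = [
    (datetime(2013, 12, 1, 10, 0), datetime(2013, 12, 1, 10, 3)),
    (datetime(2013, 12, 1, 10, 2), datetime(2013, 12, 1, 10, 5)),
]
counts = sessions_per_minute(sessions)
busiest = max(counts, key=counts.get)
print(divmod(busiest, 60), counts[busiest])
```

    Here the only minute covered by both sessions is 10:02, so that is the busiest one.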

  • 2021-02-08 07:29

    If I understood your requirements correctly, if this graph represents user activity:

           Day 
           12/1 12/2 12/3 12/4 ...
    Hour 0  xx    x    x   xx
         1   x   xx        xx
         2 xxx    x    x   xx
         3   x              x
         4        x         x
         5   x              x
         6                  x
       ...
    

    You want to know that 02:00 is the time of day with the highest average activity (a row with 7 x), and that 12/4 was the most active day (a column with 10 x). Note that this doesn't imply that 02:00 on 12/4 was the most active hour ever, as you can see in the example. If this is not what you want, please clarify with concrete examples of input and desired result.

    We make a couple of assumptions:

    • An activity record can start on one date and finish on the next one. For instance: online 2013-12-02 23:35, offline 2013-12-03 00:13.
    • No activity record has a duration longer than 23 hours, or the number of such records is negligible.

    We also need to define what 'activity' means. I picked the criteria that were easiest to compute in each case. Both can be made more accurate if needed, at the cost of more complex queries.

    • The most active time of day is the hour with which the most activity records overlap. Note that if a user starts and stops more than once during the hour, they are counted more than once.
    • The most active day is the one on which the most unique users were active at any time of the day.

    For the most active time of day we'll use a small auxiliary table holding the 24 possible hours. It can also be generated and joined on the fly with the techniques described in other answers.

    CREATE TABLE hour ( hour tinyint not null, primary key(hour) );
    INSERT hour (hour)
    VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)
         , (11), (12), (13), (14), (15), (16), (17), (18), (19), (20)
         , (21), (22), (23);
    

    Then the following queries give the required results:

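    -- For each hour of the day, count the activity records that overlap it.
    SELECT hour, count(*) AS activity
      FROM steamonlineactivity, hour
     WHERE ( hour(online) <= hour(offline)        -- record within one day
             AND hour BETWEEN hour(online) AND hour(offline) )
        OR ( hour(online) > hour(offline)         -- record crosses midnight
             AND ( hour >= hour(online) OR hour <= hour(offline) ) )
     GROUP BY hour
     ORDER BY activity DESC;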
    
    SELECT date, count(DISTINCT userID) AS activity
      FROM ( 
           SELECT userID, date(online) AS date
             FROM steamonlineactivity
            UNION
           SELECT userID, date(offline) AS date
             FROM steamonlineactivity
       ) AS x
     GROUP BY date
     ORDER BY activity DESC;
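
    The hour-overlap rule for the most active time of day can also be sketched outside SQL. The snippet below is an illustration only, with made-up records and a hypothetical helper hours_touched; it applies the wrap-around case only when the online hour is later than the offline hour, which relies on the assumption above that no record lasts longer than 23 hours.

```python
from collections import Counter
from datetime import datetime

def hours_touched(online, offline):
    """Hours of the day (0-23) that a record overlaps, ignoring dates."""
    start, end = online.hour, offline.hour
    if start <= end:
        return list(range(start, end + 1))
    # Record crosses midnight: late hours of one day plus early hours of the next.
    return list(range(start, 24)) + list(range(0, end + 1))

# Illustrative records (assumption); the first one is the example from the assumptions list.
records = [
    (datetime(2013, 12, 2, 23, 35), datetime(2013, 12, 3, 0, 13)),
    (datetime(2013, 12, 3, 0, 40), datetime(2013, 12, 3, 2, 5)),
    (datetime(2013, 12, 4, 2, 10), datetime(2013, 12, 4, 2, 50)),
    (datetime(2013, 12, 4, 0, 5), datetime(2013, 12, 4, 0, 30)),
]

# One count per (record, hour) pair, like the GROUP BY hour in the query.
activity = Counter(h for on, off in records for h in hours_touched(on, off))
print(activity.most_common(1))
```

    With these records hour 0 comes out on top, because three of the four records overlap it.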
    
  • 2021-02-08 07:37

    You need a sequence to get values for hours where there was no activity (e.g. hours where nobody started or finished, but people who had started earlier and not yet finished were on-line). Unfortunately there is no nice way to create a sequence in MySQL, so you will have to create it manually:

    CREATE TABLE `hour_sequence` (
      `ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
      `hour` datetime NOT NULL,
      KEY (`hour`),
      PRIMARY KEY (`ID`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
    
    # this is not great
    INSERT INTO `hour_sequence` (`hour`) VALUES
    ("2013-12-01 00:00:00"),
    ("2013-12-01 01:00:00"),
    ("2013-12-01 02:00:00"),
    ("2013-12-01 03:00:00"),
    ("2013-12-01 04:00:00"),
    ("2013-12-01 05:00:00"),
    ("2013-12-01 06:00:00"),
    ("2013-12-01 07:00:00"),
    ("2013-12-01 08:00:00"),
    ("2013-12-01 09:00:00"),
    ("2013-12-01 10:00:00"),
    ("2013-12-01 11:00:00"),
    ("2013-12-01 12:00:00");
    

    Now create some test data

    CREATE TABLE `log_table` (
      `ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
      `userID` bigint(20) unsigned NOT NULL,
      `started` datetime NOT NULL,
      `finished` datetime NOT NULL,
      KEY (`started`),
      KEY (`finished`),
      PRIMARY KEY (`ID`)
    ) ENGINE=InnoDB DEFAULT CHARSET latin1;
    
    INSERT INTO `log_table` (`userID`,`started`,`finished`) VALUES
    (1, "2013-12-01 00:00:12", "2013-12-01 02:25:00"),
    (2, "2013-12-01 07:25:00", "2013-12-01 08:23:00"),
    (1, "2013-12-01 04:25:00", "2013-12-01 07:23:00");
    

    Now the query. For every hour we keep a tally (a running total) of how many people had started a session before that hour:

      SELECT
       HS.hour as period_starting,
       COUNT(LT.userID) AS starts
      FROM `hour_sequence` HS
       LEFT JOIN `log_table` LT ON HS.hour > LT.started
      GROUP BY
       HS.hour
    

    And likewise how many people had gone off-line:

      SELECT
       HS.hour as period_starting,
       COUNT(LT.userID) AS finishes
      FROM `hour_sequence` HS
       LEFT JOIN `log_table` LT ON HS.hour > LT.finished
      GROUP BY
       HS.hour
    

    By subtracting the accumulation of people that had gone off-line at a point in time from the accumulation of people that have come on-line at that point in time we get the number of people who were on-line at that point in time (presuming there were zero people on-line when the data starts, of course).

    SELECT
     starts.period_starting,
     starts.starts as users_started,
     finishes.finishes as users_finished,
     starts.starts - finishes.finishes as users_online
    
    FROM
     (
      SELECT
       HS.hour as period_starting,
       COUNT(LT.userID) AS starts
      FROM `hour_sequence` HS
       LEFT JOIN `log_table` LT ON HS.hour > LT.started
      GROUP BY
       HS.hour
     ) starts
    
     LEFT JOIN (
      SELECT
       HS.hour as period_starting,
       COUNT(LT.userID) AS finishes
      FROM `hour_sequence` HS
       LEFT JOIN `log_table` LT ON HS.hour > LT.finished
      GROUP BY
       HS.hour
     ) finishes ON starts.period_starting = finishes.period_starting;
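
    As a cross-check of the running-total idea, here is a small Python sketch that applies the same comparison (hour > started, hour > finished) to the test data inserted above. The function name users_online is hypothetical; the data and the 13 hourly points are those of the two tables.

```python
from datetime import datetime, timedelta

# Same rows as the INSERT INTO log_table above: (userID, started, finished).
sessions = [
    (1, datetime(2013, 12, 1, 0, 0, 12), datetime(2013, 12, 1, 2, 25)),
    (2, datetime(2013, 12, 1, 7, 25), datetime(2013, 12, 1, 8, 23)),
    (1, datetime(2013, 12, 1, 4, 25), datetime(2013, 12, 1, 7, 23)),
]

# Same rows as hour_sequence: 00:00 to 12:00 on 2013-12-01.
hours = [datetime(2013, 12, 1) + timedelta(hours=h) for h in range(13)]

def users_online(hour):
    """Running total of starts minus running total of finishes before `hour`."""
    started = sum(1 for _, s, _ in sessions if s < hour)
    finished = sum(1 for _, _, f in sessions if f < hour)
    return started - finished

online = [users_online(h) for h in hours]
print(online)
```

    For this data the list is 1 at 01:00, 02:00 and 05:00 through 08:00, and 0 at every other hour.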
    

    Now a few caveats. First of all, you will need a process that keeps your sequence table populated with hourly timestamps as time progresses. Additionally, the accumulators do not scale well with large amounts of log data because of the range join, so it would be wise to constrain access to the log table by timestamp in both the starts and finishes subqueries, and the sequence table while you are at it.

      SELECT
       HS.hour as period_starting,
       COUNT(LT.userID) AS finishes
      FROM `hour_sequence` HS
       -- keep the log filter in ON so hours without matches survive the LEFT JOIN
       LEFT JOIN `log_table` LT ON HS.hour > LT.finished
        AND LT.finished BETWEEN ? AND ?
      WHERE
       HS.hour BETWEEN ? AND ?
      GROUP BY
       HS.hour
    

    If you start constraining your log_table data to specific time ranges, bear in mind that you will have an offset issue if people were already on-line at the point where you start looking at the log data. If 1000 people were on-line at that point and they were then all thrown off the server, the query would make it look as though we went from 0 people on-line to -1000 people on-line!
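
    One possible way around the offset issue, sketched here in Python rather than SQL, is to count the sessions that were already open at the start of the window and add that baseline to the windowed totals. This is an assumption about how one might handle it, not something the queries above do; the function names and sample sessions are invented for the example.

```python
from datetime import datetime

def baseline_online(sessions, window_start):
    """Sessions that started before the window and had not finished before it."""
    return sum(1 for s, f in sessions if s < window_start <= f)

def online_in_window(sessions, window_start, point, use_baseline=True):
    """Users on-line at `point`, looking only at events from window_start onwards."""
    starts = sum(1 for s, _ in sessions if window_start <= s < point)
    finishes = sum(1 for _, f in sessions if window_start <= f < point)
    base = baseline_online(sessions, window_start) if use_baseline else 0
    return base + starts - finishes

# Illustrative sessions (assumption): two are already on-line at 10:00.
sessions = [
    (datetime(2013, 12, 1, 8, 0), datetime(2013, 12, 1, 10, 30)),
    (datetime(2013, 12, 1, 9, 0), datetime(2013, 12, 1, 11, 15)),
    (datetime(2013, 12, 1, 10, 10), datetime(2013, 12, 1, 10, 50)),
]
window_start = datetime(2013, 12, 1, 10, 0)
points = [datetime(2013, 12, 1, h, 0) for h in (10, 11, 12)]

naive = [online_in_window(sessions, window_start, p, use_baseline=False) for p in points]
corrected = [online_in_window(sessions, window_start, p) for p in points]
print(naive, corrected)
```

    Without the baseline the counts drift negative, which is the effect described above; with it they match what the full log would give.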

  • 2021-02-08 07:38

    I've been overthinking this question myself and based on everyone's answers I think it's obvious to conclude with the following;

    In general it's probably easy to implement some kind of separate table that has the hours of the day and do inner selects from that separate table. Other examples without a separate table have many sub selects, even with four tiers, which makes me believe they will probably not scale. Cron solutions have come to my mind as well, but the question was asked - out of curiosity - to focus on SQL queries and not other solutions.

    In my own case, and completely outside the scope of my own question, I believe the best solution is to create a separate table with three fields (hour [Y-m-d H], onlinecount, playingcount) that counts the number of people online in a certain hour and the number of people playing in a certain hour. When a player stops playing or goes offline, we update the count (+1) based on the start and end times. I can then easily derive tables and graphs from this separate table.
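
    A minimal Python sketch of that bookkeeping, assuming an in-memory dictionary in place of the real table and a hypothetical helper record_session, might look like this. It shows only the onlinecount column; playingcount would work the same way.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Stand-in for the summary table: key is the "Y-m-d H" bucket, value is onlinecount.
online_count = defaultdict(int)

def record_session(counts, started, ended):
    """Add 1 to every hourly bucket that the finished session touched."""
    bucket = started.replace(minute=0, second=0, microsecond=0)
    # A session ending exactly on the hour still counts for that hour.
    while bucket <= ended:
        counts[bucket.strftime("%Y-%m-%d %H")] += 1
        bucket += timedelta(hours=1)

# Illustrative sessions (assumption): one crosses midnight, one stays within an hour.
record_session(online_count, datetime(2013, 12, 1, 22, 40), datetime(2013, 12, 2, 0, 10))
record_session(online_count, datetime(2013, 12, 1, 23, 5), datetime(2013, 12, 1, 23, 30))

print(dict(online_count))
```

    Because the key includes the date, sessions that cross midnight need no special handling here.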

    Please, let me know whether you come to the same conclusion. My thanks to @lolo, @rsanchez and @abasterfield. I wish I could split the bounty :)

  • 2021-02-08 07:38

    This query (sqlFiddle) gives you the period with the highest userCount. The period can fall anywhere in the day; the query simply returns the start time and end time of the interval with the most users.

    SELECT StartTime,EndTime,COUNT(*)as UserCount FROM
    (
       SELECT T3.StartTime,T3.EndTime,GA.Started,GA.Ended FROM
           (SELECT starttime,(SELECT MIN(endtime) FROM
                             (SELECT DISTINCT started as endtime FROM gameactivity WHERE started BETWEEN  '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
                              UNION
                              SELECT DISTINCT ended as endtime  FROM gameactivity WHERE ended BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
                             )T1
                          WHERE T1.endtime > T2.starttime
                         )as endtime
            FROM
            (SELECT DISTINCT started as starttime FROM gameactivity WHERE started BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
             UNION
             SELECT DISTINCT ended as starttime  FROM gameactivity WHERE ended BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
            )T2
        )T3,
        GameActivity GA
        WHERE T3.StartTime BETWEEN GA.Started AND GA.Ended
        AND   T3.EndTime BETWEEN GA.Started AND GA.Ended
    )FinalTable
    GROUP BY StartTime,EndTime
    ORDER BY UserCount DESC
    LIMIT 1
    

    Just change the occurrences of '1970-01-01' to the date you want data for.

    The inner queries select all the start and end times and build intervals out of them; the outer query then joins these with GameActivity, counts the users within each interval, and returns the interval with the highest userCount (the most activity).
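
    The same interval technique can be sketched in a few lines of Python, which may make the nesting easier to follow. The function busiest_interval and the sample sessions are illustrative assumptions; the coverage test mirrors the two BETWEEN conditions in the query.

```python
from datetime import datetime

def busiest_interval(sessions):
    """sessions: list of (started, ended). Returns (start, end, user_count) or None."""
    # Every distinct start or end time is a boundary between intervals.
    boundaries = sorted({t for session in sessions for t in session})
    best = None
    for start, end in zip(boundaries, boundaries[1:]):
        # A session counts if it covers the whole interval.
        count = sum(1 for s, e in sessions if s <= start and end <= e)
        if best is None or count > best[2]:
            best = (start, end, count)
    return best

# Illustrative sessions (assumption) on the placeholder date used in the query.
sessions = [
    (datetime(1970, 1, 1, 10, 0), datetime(1970, 1, 1, 12, 0)),
    (datetime(1970, 1, 1, 11, 0), datetime(1970, 1, 1, 13, 0)),
    (datetime(1970, 1, 1, 11, 30), datetime(1970, 1, 1, 11, 45)),
]
best = busiest_interval(sessions)
print(best)
```

    For these sessions the busiest interval is 11:30 to 11:45, where all three overlap.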

    Here's an sqlFiddle with one less tier:

    SELECT StartTime,EndTime,COUNT(*)as UserCount FROM
    (
    SELECT T3.StartTime,T3.EndTime,GA.Started,GA.Ended FROM
    (SELECT DISTINCT started as starttime,(SELECT MIN(ended)as endtime FROM
                       gameactivity T1 WHERE ended BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
                       AND T1.ended > T2.started
                      )as endtime
    FROM
     gameactivity T2
     WHERE started BETWEEN '1970-01-01 00:00:00' AND '1970-01-01 23:59:59'
     )T3,
    GameActivity GA
    WHERE T3.StartTime BETWEEN GA.Started AND GA.Ended
    AND   T3.EndTime BETWEEN GA.Started AND GA.Ended
    )FinalTable
    GROUP BY StartTime,EndTime
    ORDER BY UserCount DESC
    LIMIT 1
    

    Or, judging by the query in your question above, you don't seem to care about dates, only about hourly statistics across all dates (your query just looks at the HOUR of started and ended and ignores users who play longer than 1 hour). In that case the query below might do it for you (sqlFiddle):

    SELECT COUNT(*) as UserCount,
           HOURSTABLE.StartHour,
           HOURSTABLE.EndHour
    FROM
        (SELECT @hour as StartHour,
               @hour:=@hour + 1 as EndHour
         FROM
            gameActivity as OrAnyTableWith24RowsOrMore,
            (SELECT @hour:=0)as InitialValue
         LIMIT 24) as HOURSTABLE,
         gameActivity GA
    WHERE HOUR(GA.started) >= HOURSTABLE.StartHour
      AND HOUR(GA.ended) <= HOURSTABLE.EndHour
    GROUP BY HOURSTABLE.StartHour,HOURSTABLE.EndHour
    ORDER BY UserCount DESC
    LIMIT 1
    

    Just delete the LIMIT 1 if you want to see the userCount for other hours as well.
