most active time of day based on start and end time

悲哀的现实 2021-02-08 06:53

I'm logging statistics of the gamers in my community. For both their online and in-game states I'm registering when they "begin" and when they "end". In order to show the

7 answers
  •  夕颜 (original poster)
    2021-02-08 07:37

    You need a sequence to get values for hours where there was no activity (e.g. hours where nobody started or finished, but people who had started earlier and not yet finished were still on-line). Unfortunately MySQL has no built-in sequence generator (and, before version 8.0, no recursive CTEs), so you will have to create the sequence manually:

    CREATE TABLE `hour_sequence` (
      `ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
      `hour` datetime NOT NULL,
      KEY (`hour`),
      PRIMARY KEY (`ID`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
    
    # tedious to type by hand, but it works
    INSERT INTO `hour_sequence` (`hour`) VALUES
    ("2013-12-01 00:00:00"),
    ("2013-12-01 01:00:00"),
    ("2013-12-01 02:00:00"),
    ("2013-12-01 03:00:00"),
    ("2013-12-01 04:00:00"),
    ("2013-12-01 05:00:00"),
    ("2013-12-01 06:00:00"),
    ("2013-12-01 07:00:00"),
    ("2013-12-01 08:00:00"),
    ("2013-12-01 09:00:00"),
    ("2013-12-01 10:00:00"),
    ("2013-12-01 11:00:00"),
    ("2013-12-01 12:00:00");
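
    Typing the rows by hand gets old quickly. As a rough sketch only, the same INSERT statement can be generated by a short script; the helper name `hour_sequence` and the chosen date range are assumptions for illustration, and the output targets the `hour_sequence` table defined above.

```python
from datetime import datetime, timedelta


def hour_sequence(first, last):
    """Yield every whole hour from first to last, inclusive (hypothetical helper)."""
    current = first
    while current <= last:
        yield current
        current += timedelta(hours=1)


# Same range as the hand-written rows: 00:00 to 12:00 on 2013-12-01.
hours = [
    h.strftime("%Y-%m-%d %H:%M:%S")
    for h in hour_sequence(datetime(2013, 12, 1, 0), datetime(2013, 12, 1, 12))
]

# Build one multi-row INSERT statement for the sequence table.
values = ",\n".join(f"('{h}')" for h in hours)
insert_sql = f"INSERT INTO `hour_sequence` (`hour`) VALUES\n{values};"
print(insert_sql)
```

    The timestamps here are fixed constants, so plain string formatting is harmless; anything built from user input should go through bound parameters instead.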
    

    Now create some test data:

    CREATE TABLE `log_table` (
      `ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
      `userID` bigint(20) unsigned NOT NULL,
      `started` datetime NOT NULL,
      `finished` datetime NOT NULL,
      KEY (`started`),
      KEY (`finished`),
      PRIMARY KEY (`ID`)
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
    
    INSERT INTO `log_table` (`userID`,`started`,`finished`) VALUES
    (1, "2013-12-01 00:00:12", "2013-12-01 02:25:00"),
    (2, "2013-12-01 07:25:00", "2013-12-01 08:23:00"),
    (1, "2013-12-01 04:25:00", "2013-12-01 07:23:00");
    

    Now the query. For every hour we keep a running total (an accumulation) of how many sessions had started before that hour:

      SELECT
       HS.hour as period_starting,
       COUNT(LT.userID) AS starts
      FROM `hour_sequence` HS
       LEFT JOIN `log_table` LT ON HS.hour > LT.started
      GROUP BY
       HS.hour
    

    Likewise, a running total of how many sessions had finished (people going off-line) before each hour:

      SELECT
       HS.hour as period_starting,
       COUNT(LT.userID) AS finishes
      FROM `hour_sequence` HS
       LEFT JOIN `log_table` LT ON HS.hour > LT.finished
      GROUP BY
       HS.hour
    

    Subtracting the running total of sessions that had finished by a point in time from the running total of sessions that had started by that point gives the number of people who were on-line at that point (presuming, of course, that nobody was on-line when the data starts):

    SELECT
     starts.period_starting,
     starts.starts as users_started,
     finishes.finishes as users_finished,
     starts.starts - finishes.finishes as users_online
    
    FROM
     (
      SELECT
       HS.hour as period_starting,
       COUNT(LT.userID) AS starts
      FROM `hour_sequence` HS
       LEFT JOIN `log_table` LT ON HS.hour > LT.started
      GROUP BY
       HS.hour
     ) starts
    
     LEFT JOIN (
      SELECT
       HS.hour as period_starting,
       COUNT(LT.userID) AS finishes
      FROM `hour_sequence` HS
       LEFT JOIN `log_table` LT ON HS.hour > LT.finished
      GROUP BY
       HS.hour
     ) finishes ON starts.period_starting = finishes.period_starting;
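
    To check the arithmetic, here is a small sketch that runs the same combined query against the test data. It uses Python's built-in SQLite as a stand-in for MySQL, which is an assumption: the table definitions are simplified and timestamps are stored as text, which compares correctly in this fixed format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hour_sequence (hour TEXT NOT NULL)")
conn.execute("CREATE TABLE log_table (userID INTEGER, started TEXT, finished TEXT)")

# Hours 00:00 to 12:00, matching the sequence table in the answer.
conn.executemany(
    "INSERT INTO hour_sequence VALUES (?)",
    [(f"2013-12-01 {h:02d}:00:00",) for h in range(13)],
)
# The three test sessions from the answer.
conn.executemany(
    "INSERT INTO log_table VALUES (?, ?, ?)",
    [
        (1, "2013-12-01 00:00:12", "2013-12-01 02:25:00"),
        (2, "2013-12-01 07:25:00", "2013-12-01 08:23:00"),
        (1, "2013-12-01 04:25:00", "2013-12-01 07:23:00"),
    ],
)

QUERY = """
SELECT starts.period_starting,
       starts.starts AS users_started,
       finishes.finishes AS users_finished,
       starts.starts - finishes.finishes AS users_online
FROM (
    SELECT HS.hour AS period_starting, COUNT(LT.userID) AS starts
    FROM hour_sequence HS
    LEFT JOIN log_table LT ON HS.hour > LT.started
    GROUP BY HS.hour
) starts
LEFT JOIN (
    SELECT HS.hour AS period_starting, COUNT(LT.userID) AS finishes
    FROM hour_sequence HS
    LEFT JOIN log_table LT ON HS.hour > LT.finished
    GROUP BY HS.hour
) finishes ON starts.period_starting = finishes.period_starting
ORDER BY starts.period_starting
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)

# One value per hour: how many people were on-line at that instant.
online = [row[3] for row in rows]
```

    With the three test sessions, `users_online` is 1 for the 01:00, 02:00 and 05:00 to 08:00 rows and 0 for every other hour.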
    

    Now a few caveats. First, you will need a process that keeps the sequence table populated with hourly timestamps as time progresses. Second, the accumulators do not scale well with large amounts of log data, because the inequality join compares every hour against every log row. It would be wise to constrain the log table by timestamp in both the starts and finishes subqueries, and the sequence table while you are at it:

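      SELECT
       HS.hour as period_starting,
       COUNT(LT.userID) AS finishes
      FROM `hour_sequence` HS
       # the log-table range belongs in ON: in WHERE it would discard
       # hours with no matching rows and undo the LEFT JOIN
       LEFT JOIN `log_table` LT ON HS.hour > LT.finished
        AND LT.finished BETWEEN ? AND ?
      WHERE
       HS.hour BETWEEN ? AND ?
      GROUP BY
       HS.hour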
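
    One detail worth checking in any windowed version of these subqueries is where the log-table range is placed. A condition on an `LT` column in WHERE is evaluated after the LEFT JOIN and throws away hours with no matching log rows, whereas the same condition in ON keeps those hours with a count of 0. The sketch below compares the two placements; it again assumes SQLite as a stand-in for MySQL, with simplified tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hour_sequence (hour TEXT NOT NULL)")
conn.execute("CREATE TABLE log_table (userID INTEGER, started TEXT, finished TEXT)")
conn.executemany(
    "INSERT INTO hour_sequence VALUES (?)",
    [(f"2013-12-01 {h:02d}:00:00",) for h in range(13)],
)
conn.executemany(
    "INSERT INTO log_table VALUES (?, ?, ?)",
    [
        (1, "2013-12-01 00:00:12", "2013-12-01 02:25:00"),
        (2, "2013-12-01 07:25:00", "2013-12-01 08:23:00"),
        (1, "2013-12-01 04:25:00", "2013-12-01 07:23:00"),
    ],
)

# Log-table range in WHERE: rows where LT.* is NULL fail the test and vanish.
FILTER_IN_WHERE = """
SELECT HS.hour, COUNT(LT.userID)
FROM hour_sequence HS
LEFT JOIN log_table LT ON HS.hour > LT.finished
WHERE LT.finished BETWEEN ? AND ? AND HS.hour BETWEEN ? AND ?
GROUP BY HS.hour
ORDER BY HS.hour
"""

# Log-table range in ON: hours without a match survive with a count of 0.
FILTER_IN_ON = """
SELECT HS.hour, COUNT(LT.userID)
FROM hour_sequence HS
LEFT JOIN log_table LT
       ON HS.hour > LT.finished AND LT.finished BETWEEN ? AND ?
WHERE HS.hour BETWEEN ? AND ?
GROUP BY HS.hour
ORDER BY HS.hour
"""

# Same window for the log rows and for the hours (illustrative bounds).
window = ("2013-12-01 00:00:00", "2013-12-01 12:00:00")
params = window + window

rows_where = conn.execute(FILTER_IN_WHERE, params).fetchall()
rows_on = conn.execute(FILTER_IN_ON, params).fetchall()

print(len(rows_where), "hours returned with the filter in WHERE")
print(len(rows_on), "hours returned with the filter in ON")
```

    On the test data the WHERE variant silently loses the hours before the first finish, which would also drop them from the final on-line count once the subqueries are joined.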
    

    If you constrain the log_table data to a specific time range, bear in mind that you will have an offset issue if people were already on-line at the point where you start looking at the log data. If 1000 people were on-line at that point and they all then left the server, the query would make it look as though we went from 0 people on-line to -1000 people on-line!
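
    One possible way to handle that offset, sketched here rather than taken from the answer, is to count the sessions already under way at the start of the window and add that baseline to every windowed figure. The window bounds, the `running_total` helper and the use of SQLite in place of MySQL are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hour_sequence (hour TEXT NOT NULL)")
conn.execute("CREATE TABLE log_table (userID INTEGER, started TEXT, finished TEXT)")
conn.executemany(
    "INSERT INTO hour_sequence VALUES (?)",
    [(f"2013-12-01 {h:02d}:00:00",) for h in range(13)],
)
conn.executemany(
    "INSERT INTO log_table VALUES (?, ?, ?)",
    [
        (1, "2013-12-01 00:00:12", "2013-12-01 02:25:00"),
        (2, "2013-12-01 07:25:00", "2013-12-01 08:23:00"),
        (1, "2013-12-01 04:25:00", "2013-12-01 07:23:00"),
    ],
)

# Hypothetical window: only look at 05:00 to 12:00.
window = {"ws": "2013-12-01 05:00:00", "we": "2013-12-01 12:00:00"}

# Sessions that started before the window and had not finished by its start.
baseline = conn.execute(
    "SELECT COUNT(*) FROM log_table WHERE started < :ws AND finished >= :ws",
    window,
).fetchone()[0]


def running_total(column):
    """Windowed running-total subquery; column is a fixed name, never user input."""
    return f"""
        SELECT HS.hour AS hour, COUNT(LT.userID) AS n
        FROM hour_sequence HS
        LEFT JOIN log_table LT
               ON HS.hour > LT.{column} AND LT.{column} BETWEEN :ws AND :we
        WHERE HS.hour BETWEEN :ws AND :we
        GROUP BY HS.hour
    """


QUERY = f"""
SELECT s.hour, s.n - f.n AS delta
FROM ({running_total('started')}) s
JOIN ({running_total('finished')}) f ON s.hour = f.hour
ORDER BY s.hour
"""

rows = conn.execute(QUERY, window).fetchall()
deltas = [delta for _, delta in rows]          # goes negative without a baseline
corrected = [baseline + delta for delta in deltas]
print(baseline, deltas, corrected)
```

    In the test data one session (04:25 to 07:23) is already running at 05:00, so the uncorrected figures dip to -1 after 08:23 while the corrected ones match the unconstrained query for the same hours.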
