Count the number of rows in 30 day bins

≯℡__Kan透↙ 提交于 2019-12-12 08:06:02

问题


Each row in my table has a date time stamp, and I wish to query the database from now, to count how many rows are in the last 30 days, the 30 days before that and so on. Until there is a 30 day bin going back to the start of the table.

I have successfully carried out this query by using Python and making several queries. But I'm almost certain that it can be done in one single MySQL query.


回答1:


No stored procedures, temporary tables, only one query, and an efficient execution plan given an index on the date column:

select

  subdate(
    '2012-12-31',
    floor(dateDiff('2012-12-31', dateStampColumn) / 30) * 30 + 30 - 1
  ) as "period starting",

  subdate(
    '2012-12-31',
    floor(dateDiff('2012-12-31', dateStampColumn) / 30) * 30
  ) as "period ending",

  count(*)

from
  YOURTABLE
group by floor(dateDiff('2012-12-31', dateStampColumn) / 30);

It should be pretty obvious what is happening here, except for this incantation:

floor(dateDiff('2012-12-31', dateStampColumn) / 30)

That expression appears several times, and it evaluates to the number of 30-day periods ago dateStampColumn is. dateDiff returns the difference in days, divide it by 30 to get it in 30-day periods, and feed it all to floor() to round it to an integer. Once we have this number, we can GROUP BY it, and further we do a bit of math to translate this number back into the starting and ending dates of the period.

Replace '2012-12-31' with now() if you prefer. Here's some sample data:

CREATE TABLE YOURTABLE
    (`Id` int, `dateStampColumn` datetime);

INSERT INTO YOURTABLE
    (`Id`, `dateStampColumn`)
VALUES
    (1, '2012-10-15 02:00:00'),
    (1, '2012-10-17 02:00:00'),
    (1, '2012-10-30 02:00:00'),
    (1, '2012-10-31 02:00:00'),
    (1, '2012-11-01 02:00:00'),
    (1, '2012-11-02 02:00:00'),
    (1, '2012-11-18 02:00:00'),
    (1, '2012-11-19 02:00:00'),
    (1, '2012-11-21 02:00:00'),
    (1, '2012-11-25 02:00:00'),
    (1, '2012-11-25 02:00:00'),
    (1, '2012-11-26 02:00:00'),
    (1, '2012-11-26 02:00:00'),
    (1, '2012-11-24 02:00:00'),
    (1, '2012-11-23 02:00:00'),
    (1, '2012-11-28 02:00:00'),
    (1, '2012-11-29 02:00:00'),
    (1, '2012-11-30 02:00:00'),
    (1, '2012-12-01 02:00:00'),
    (1, '2012-12-02 02:00:00'),
    (1, '2012-12-15 02:00:00'),
    (1, '2012-12-17 02:00:00'),
    (1, '2012-12-18 02:00:00'),
    (1, '2012-12-19 02:00:00'),
    (1, '2012-12-21 02:00:00'),
    (1, '2012-12-25 02:00:00'),
    (1, '2012-12-25 02:00:00'),
    (1, '2012-12-26 02:00:00'),
    (1, '2012-12-26 02:00:00'),
    (1, '2012-12-24 02:00:00'),
    (1, '2012-12-23 02:00:00'),
    (1, '2012-12-31 02:00:00'),
    (1, '2012-12-30 02:00:00'),
    (1, '2012-12-28 02:00:00'),
    (1, '2012-12-28 02:00:00'),
    (1, '2012-12-30 02:00:00');

And the result:

period starting     period ending   count(*)
2012-12-02          2012-12-31      17
2012-11-02          2012-12-01      14
2012-10-03          2012-11-01      5

period endpoints are inclusive.

Play with this in SQL Fiddle.

There's a bit of potential goofiness in that any 30 day period with zero matching rows will not be included in the result. If you could join this against a table of periods, that could be eliminated. However, MySQL doesn't have anything like PostgreSQL's generate_series(), so you'd have to deal with it in your application or try this clever hack.




回答2:


If you just need to count intervals where there's at least one row, you could use this:

select
  datediff(curdate(), `date`) div 30 as block,
  count(*) as rows_per_block
from
  your_table
group by
  block

And this also shows the start date and the end date:

select
  datediff(curdate(), d) div 30 as block,
  date_sub(curdate(),
           INTERVAL (datediff(curdate(), `date`) div 30)*30 DAY) as start_block,
  date_sub(curdate(),
           INTERVAL (1+datediff(curdate(), `date`) div 30)*30-1 DAY) as end_block,
  count(*)
from your_table
group by block

but if you also need to show all intervals, you could use a solution like this:

select
  num,
  date_sub(curdate(),
           INTERVAL (num+1)*30-1 DAY) as start_block,
  date_sub(curdate(),
           INTERVAL num*30 DAY) as end_block,
  count(`date`)
from
  numbers left join your_table
  on `date` between date_sub(curdate(),
           INTERVAL (num+1)*30-1 DAY)  and
  date_sub(curdate(),
           INTERVAL num*30 DAY)
where num<=(datediff(curdate(), (select min(`date`) from your_table) ) div 30)
group by num

but this requires that you have a numbers table already prepared, or see fiddle here for a solution without numbers table.




回答3:


Try this:

SELECT 
  DATE_FORMAT(t1.`Date`, '%Y-%m-%d'),
  COUNT(t2.Id)
FROM 
(
  SELECT SUBDATE(CURDATE(), ID) `Date`
  FROM
  (
    SELECT  t2.digit * 10 + t1.digit + 1 AS id
    FROM         TEMP AS t1
    CROSS JOIN TEMP AS t2
  ) t 
  WHERE Id <= 30 
) t1
LEFT JOIN YOURTABLE t2 ON DATE(t1.`Date`) = DATE(t2.dateStampColumn)
GROUP BY t1.`Date`;

SQL Fiddle Demo

But, you will need to create a temp table Temp like so:

CREATE TABLE TEMP 
(Digit int);
INSERT INTO Temp VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);



回答4:


Could you please try the following:

SELECT Count(*)
FROM
  yourtable
where
  dateColumn between Now() and Now() - Interval 30 Day

It needs some looping, for a better answer to isolote all 30 days intervals going back. As you also need a 30 day interval between min(Date) in the table and the last loop date :) Or to the least another table that carries the dates of each 30 day interval, and then join.

Here is getting count just by each calendar month. Not exactly what you need.

SELECT
  extract(month from datecolumn),
  count(*)
FROM
  yourtable
GROUP BY
  extract(month from datecolumn);

Given a thought to my latter comment and Stefan's comment, here is a long code yet with proper resutls. Based on my own sample data and compatible with MYSQL with interval. If you need to use with SQL Server please use DateADD or quivalent function.

  • SQLFIDDLE

Sample data:

ID_MAIN  FIELD1  FILTER
----------------------------------------
1        red     August, 05 2012 00:00:00+0000
2        blue    September, 15 2012 00:00:00+0000
3        pink    September, 20 2012 00:00:00+0000
4        blue    September, 27 2012 00:00:00+0000
5        blue    October, 02 2012 00:00:00+0000
6        blue    October, 16 2012 00:00:00+0000
7        blue    October, 22 2012 00:00:00+0000
8        pink    November, 12 2012 00:00:00+0000
9        pink    November, 28 2012 00:00:00+0000
10       pink    December, 01 2012 00:00:00+0000
11       pink    December, 08 2012 00:00:00+0000
12       pink    December, 22 2012 00:00:00+0000

Query:

set @i:= 0;
SELECT MIN(filter) INTO @mindt
FROM MAIN
;
select
  count(a.id_main),
  y.dateInterval,
  (y.dateInterval - interval 29 day) as lowerBound
from
  main a join (
    SELECT date_format(Now(),'%Y-%m-%d') as dateInterval
    from dual
    union all
    select x.dateInterval
    from (
      SELECT
        date_format(
          DATE(DATE_ADD(Now(),
                        INTERVAL @i:=@i-29 DAY)),'%Y-%m-%d') AS dateInterval
      FROM Main, (SELECT @i:=0) r
      HAVING datediff(dateInterval,@mindt) >= 30
      order by dateInterval desc) as x) as y
  on a.filter <= y.dateInterval 
     and a.filter > (y.dateInterval - interval 29 day)
group by y.dateInterval
order by y.dateInterval desc
;

Results:

COUNT(A.ID_MAIN)    DATEINTERVAL    LOWERBOUND
----------------------------------------------
2                   2012-12-30  2012-12-01
3                   2012-12-01  2012-11-02
2                   2012-11-02  2012-10-04
4                   2012-10-04  2012-09-05



回答5:


Create a stored procedure to count number of rows by 30 days.

First run this procedure and then call the same procedure when you want to genrate data.

DELIMITER $$

DROP PROCEDURE IF EXISTS `sp_CountDataByDays`$$

CREATE DEFINER=`root`@`localhost` PROCEDURE `sp_CountDataByDays`()
BEGIN 
    CREATE TEMPORARY TABLE daterange (
            id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT, 
            fromDate DATE, 
            toDate DATE, 
            PRIMARY KEY (`id`)
    ); 

    SELECT DATEDIFF(CURRENT_DATE(), dteCol) INTO @noOfDays 
    FROM yourTable ORDER BY dteCol LIMIT 1;

    SET @counter = -1;
    WHILE (@noOfDays > @counter) DO 
        INSERT daterange (toDate, fromDate) 
        VALUES (DATE_SUB(CURRENT_DATE(), INTERVAL @counter DAY), DATE_SUB(CURRENT_DATE(), INTERVAL @counter:=@counter + 30 DAY));
    END WHILE;

    SELECT d.id, d.fromdate, d.todate, COUNT(d.id) rowcnt 
    FROM daterange d  
    INNER JOIN yourTable a ON a.dteCol BETWEEN d.fromdate AND d.todate 
    GROUP BY d.id;

    DROP TABLE daterange;
END$$

DELIMITER ;

Then CALL the procedure:

CALL sp_CountDataByDays();

You get the output as below:

ID  From Date   To Date     Row Count
1   2012-12-06  2013-01-05  17668
2   2012-11-06  2012-12-06  2845
3   2012-10-07  2012-11-06  2276
4   2012-09-07  2012-10-07  4561
5   2012-08-08  2012-09-07  5415
6   2012-07-09  2012-08-08  8954
7   2012-06-09  2012-07-09  4387
8   2012-05-10  2012-06-09  7911
9   2012-04-10  2012-05-10  7935
10  2012-03-11  2012-04-10  2566


来源:https://stackoverflow.com/questions/14090016/count-the-number-of-rows-in-30-day-bins

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!