Optimal query to fetch a cumulative sum in MySQL

长发绾君心 2021-01-04 14:18

What is the 'correct' query to fetch a cumulative sum in MySQL?

I've a table where I keep information about files, one column list contains the size

2 Answers
  • 2021-01-04 14:35

    You could use a variable - it's far quicker than any join:

    SELECT
        id,
        size,
        @total := @total + size AS cumulativeSize
    FROM `table`, (SELECT @total:=0) AS t;  -- `table` is a reserved word, so it must be quoted
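
    For comparison, the join-based approach that the variable is said to beat might look roughly like the sketch below. It assumes the table1 test table defined later in this answer and treats id as the ordering column. Each row is joined to every row with a lower or equal id, so the work grows roughly quadratically with the row count.

    ```sql
    -- Sketch of a self-join running total (assumes table1 with columns id and size).
    SELECT
        a.id,
        a.size,
        SUM(b.size) AS cumulativeSize   -- sum of all sizes up to and including this row
    FROM table1 AS a
    JOIN table1 AS b ON b.id <= a.id
    GROUP BY a.id, a.size
    ORDER BY a.id;
    ```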
    

    Here's a quick test case on a Pentium III with 128MB RAM running Debian 5.0:

    Create the table:

    DROP TABLE IF EXISTS `table1`;
    
    CREATE TABLE `table1` (
        `id` int(11) NOT NULL auto_increment,
        `size` int(11) NOT NULL,
        PRIMARY KEY  (`id`)
    ) ENGINE=InnoDB;
    

    Fill with 20,000 random numbers:

    DELIMITER //
    DROP PROCEDURE IF EXISTS autofill//
    CREATE PROCEDURE autofill()
    BEGIN
        DECLARE i INT DEFAULT 0;
        WHILE i < 20000 DO
            INSERT INTO table1 (size) VALUES (FLOOR((RAND() * 1000)));
            SET i = i + 1;
        END WHILE;
    END;
    //
    DELIMITER ;
    
    CALL autofill();
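
    On MySQL 8.0 or later, the same table could probably be filled with a single set-based statement instead of a stored procedure. This is a sketch under that version assumption, not part of the original test; the CTE name seq and the recursion limit of 30000 are arbitrary choices.

    ```sql
    -- Assumes MySQL 8.0+ (recursive CTEs). The default recursion limit is 1000,
    -- so it is raised for this session to allow 20,000 generated rows.
    SET SESSION cte_max_recursion_depth = 30000;

    INSERT INTO table1 (size)
    WITH RECURSIVE seq (n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM seq WHERE n < 20000
    )
    SELECT FLOOR(RAND() * 1000) FROM seq;   -- one random size per generated row
    ```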
    

    Check the row count:

    SELECT COUNT(*) FROM table1;
    
    +----------+
    | COUNT(*) |
    +----------+
    |    20000 |
    +----------+
    

    Run the cumulative total query:

    SELECT
        id,
        size,
        @total := @total + size AS cumulativeSize
    FROM table1, (SELECT @total:=0) AS t;
    
    +-------+------+----------------+
    |    id | size | cumulativeSize |
    +-------+------+----------------+
    |     1 |  226 |            226 |
    |     2 |  869 |           1095 |
    |     3 |  668 |           1763 |
    |     4 |  733 |           2496 |
    ...
    | 19997 |  966 |       10004741 |
    | 19998 |  522 |       10005263 |
    | 19999 |  713 |       10005976 |
    | 20000 |    0 |       10005976 |
    +-------+------+----------------+
    20000 rows in set (0.07 sec)
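
    On MySQL 8.0 or later, a window function is an alternative that does not rely on user variables (assigning them inside expressions is deprecated as of MySQL 8.0.13) and that makes the row order explicit. A minimal sketch against the same table1, assuming id defines the order in which sizes should accumulate:

    ```sql
    -- Assumes MySQL 8.0+ (window functions).
    SELECT
        id,
        size,
        SUM(size) OVER (ORDER BY id) AS cumulativeSize   -- running total in id order
    FROM table1
    ORDER BY id;
    ```

    Because id is unique here, the default window frame gives a plain row-by-row running total.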
    

    UPDATE

    I'd missed the grouping by groupId in the original question, and that certainly made things a bit trickier. I then wrote a solution which used a temporary table, but I didn't like it: it was messy and overly complicated. I went away and did some more research, and have come up with something far simpler and faster.

    I can't claim all the credit for this. In fact, I can barely claim any at all, as it is just a modified version of the "Emulate row number" entry from Common MySQL Queries.

    It's beautifully simple, elegant, and very quick:

    SELECT fileInfoId, groupId, name, size, cumulativeSize
    FROM (
        SELECT
            fileInfoId,
            groupId,
            name,
            size,
            @cs := IF(@prev_groupId = groupId, @cs+size, size) AS cumulativeSize,
            @prev_groupId := groupId AS prev_groupId
        FROM fileInfo, (SELECT @prev_groupId:=0, @cs:=0) AS vars
        ORDER BY groupId
    ) AS tmp;
    

    You can remove the outer SELECT ... AS tmp if you don't mind the prev_groupId column being returned. I found that it ran marginally faster without it.
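
    If MySQL 8.0 or later is available, the per-group running total can also be sketched with a window function partitioned by groupId. This assumes that rows within a group should accumulate in fileInfoId order, which the variable-based query above does not state explicitly.

    ```sql
    -- Assumes MySQL 8.0+ (window functions) and the fileInfo columns used above.
    SELECT
        fileInfoId,
        groupId,
        name,
        size,
        SUM(size) OVER (
            PARTITION BY groupId      -- restart the total for each group
            ORDER BY fileInfoId       -- accumulate in fileInfoId order within a group
        ) AS cumulativeSize
    FROM fileInfo
    ORDER BY groupId, fileInfoId;
    ```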

    Here's a simple test case:

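    -- Assumed schema: the original post does not show the table definition,
    -- so the column types below are a guess that matches the inserted values.
    CREATE TABLE `fileInfo` (
        `fileInfoId` int(11) NOT NULL,
        `groupId` int(11) NOT NULL,
        `name` varchar(64) NOT NULL,
        `size` int(11) NOT NULL,
        PRIMARY KEY  (`fileInfoId`)
    ) ENGINE=InnoDB;

    INSERT INTO `fileInfo` VALUES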
    ( 1, 3, 'name0', '10'),
    ( 5, 3, 'name1', '10'),
    ( 7, 3, 'name2', '10'),
    ( 8, 1, 'name3', '10'),
    ( 9, 1, 'name4', '10'),
    (10, 2, 'name5', '10'),
    (12, 4, 'name6', '10'),
    (20, 4, 'name7', '10'),
    (21, 4, 'name8', '10'),
    (25, 5, 'name9', '10');
    
    SELECT fileInfoId, groupId, name, size, cumulativeSize
    FROM (
        SELECT
            fileInfoId,
            groupId,
            name,
            size,
            @cs := IF(@prev_groupId = groupId, @cs+size, size) AS cumulativeSize,
            @prev_groupId := groupId AS prev_groupId
        FROM fileInfo, (SELECT @prev_groupId := 0, @cs := 0) AS vars
        ORDER BY groupId
    ) AS tmp;
    
    +------------+---------+-------+------+----------------+
    | fileInfoId | groupId | name  | size | cumulativeSize |
    +------------+---------+-------+------+----------------+
    |          8 |       1 | name3 |   10 |             10 |
    |          9 |       1 | name4 |   10 |             20 |
    |         10 |       2 | name5 |   10 |             10 |
    |          1 |       3 | name0 |   10 |             10 |
    |          5 |       3 | name1 |   10 |             20 |
    |          7 |       3 | name2 |   10 |             30 |
    |         12 |       4 | name6 |   10 |             10 |
    |         20 |       4 | name7 |   10 |             20 |
    |         21 |       4 | name8 |   10 |             30 |
    |         25 |       5 | name9 |   10 |             10 |
    +------------+---------+-------+------+----------------+
    

    Here's a sample of the last few rows from a 20,000 row table:

    |      19481 |     248 | 8CSLJX22RCO | 1037469 |       51270389 |
    |      19486 |     248 | 1IYGJ1UVCQE |  937150 |       52207539 |
    |      19817 |     248 | 3FBU3EUSE1G |  616614 |       52824153 |
    |      19871 |     248 | 4N19QB7PYT  |  153031 |       52977184 |
    |        132 |     249 | 3NP9UGMTRTD |  828073 |         828073 |
    |        275 |     249 | 86RJM39K72K |  860323 |        1688396 |
    |        802 |     249 | 16Z9XADLBFI |  623030 |        2311426 |
    ...
    |      19661 |     249 | ADZXKQUI0O3 |  837213 |       39856277 |
    |      19870 |     249 | 9AVRTI3QK6I |  331342 |       40187619 |
    |      19972 |     249 | 1MTAEE3LLEM | 1027714 |       41215333 |
    +------------+---------+-------------+---------+----------------+
    20000 rows in set (0.31 sec)
    
  • 2021-01-04 14:47

    I think that MySQL is only using one of the indexes on the table. In this case, it's choosing the index on foreignId.

    Add a covering compound index that includes both primaryId and foreignId.
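
    As a rough illustration, such an index could be added as shown below. The table name some_table, the index name, and the literal 42 are placeholders, and the column order (foreignId first) is an assumption that suits queries filtering on foreignId.

    ```sql
    -- Hypothetical table and index names; primaryId and foreignId are the columns named above.
    ALTER TABLE some_table
        ADD INDEX idx_foreign_primary (foreignId, primaryId);

    -- EXPLAIN reports which index the optimizer chooses for a given query.
    EXPLAIN
    SELECT primaryId, foreignId
    FROM some_table
    WHERE foreignId = 42;
    ```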
