SQL: Retrieving total sum as subselect very slow

Question

I am trying to fetch some averages and sums over several rows, grouped by hour of the day. In addition, I want a column that does not hold the sum for each hour (the grouping already gives me that) but the total sum over all rows up to that specific date. The SQL statement is posted below.

My problem is that running the query on a MySQL database with ~25k rows takes about 8 seconds (i5 CPU, 8 GB RAM). I identified the subselect (... AS 'rain_sum') as the part that makes it so slow. My question: am I overcomplicating this? Is there an easier way to get the same results as the query below?

SELECT
    `timestamp_local` AS `date`,
    AVG(`one`) AS `one_avg`,
    AVG(`two`) AS `two_avg`,
    SUM(`three`) AS `three_sum`,
    (SELECT SUM(`b`.`three`)
        FROM `table` AS `b`
        WHERE `b`.`timestamp_local` <= SUBDATE(`a`.`timestamp_local`, INTERVAL -1 SECOND)
        LIMIT 0,1) AS `rain_sum`
FROM  `table` AS  `a`
GROUP BY
    HOUR( `a`.`timestamp_local` ),
    DAY( `a`.`timestamp_local` ),
    MONTH( `a`.`timestamp_local` ),
    WEEK( `a`.`timestamp_local` ),
    YEAR( `a`.`timestamp_local` )
ORDER BY `a`.`timestamp_local` DESC
LIMIT 0, 24;

Answer 1:

Rather than grouping on all those fields, a simpler (and faster) solution (from here) may be:

GROUP BY UNIX_TIMESTAMP(timestamp_local) DIV 3600
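
As a hedged sketch of how that grouping could fit into the full query: the alias `hour_start` is made up for illustration, and the column names are taken from the question. Integer division (`DIV`) is used so that all rows of one hour share the same bucket value; in MySQL a plain `/` returns a fractional result, which would give nearly every row its own group. `UNIX_TIMESTAMP` and `FROM_UNIXTIME` work in the session time zone, so this assumes hour boundaries there fall on whole hours.

```sql
-- Sketch only: `hour_start` is a hypothetical alias, not from the question.
SELECT
    -- Round each timestamp down to the start of its hour.
    FROM_UNIXTIME((UNIX_TIMESTAMP(`timestamp_local`) DIV 3600) * 3600) AS `hour_start`,
    AVG(`one`)   AS `one_avg`,
    AVG(`two`)   AS `two_avg`,
    SUM(`three`) AS `three_sum`
FROM `table`
GROUP BY `hour_start`        -- MySQL allows grouping by a select alias
ORDER BY `hour_start` DESC
LIMIT 24;
```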

I doubt that your query returns the results you want, if I understand your requirements correctly: when there are no rows for a given hour, you still want the sum of all rows from earlier hours. MySQL won't produce a group for an hour that has no rows, so the sub-query is never evaluated for those hours.

There's no easy, efficient way to do this in MySQL that I know of. I would suggest creating a temporary table with all possible grouping values in the range that you're looking at (probably filled with a loop). You could set this table up beforehand for a few years and add rows as required. Then you can just left join this table to your table.
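
A minimal sketch of that idea, under stated assumptions: the table name `hour_calendar`, its column `hour_start`, and the sample dates are hypothetical, and only three hours are inserted by hand where a loop or script would normally fill the range. An index on `timestamp_local` is assumed for the join to be efficient.

```sql
-- Hypothetical helper table: one row per hour in the range of interest.
CREATE TABLE hour_calendar (
    hour_start DATETIME NOT NULL PRIMARY KEY
);

-- Illustrative rows; in practice fill the range with a loop or script.
INSERT INTO hour_calendar (hour_start) VALUES
    ('2012-12-23 22:00:00'),
    ('2012-12-23 23:00:00'),
    ('2012-12-24 00:00:00');

-- Left join so that hours without any rows still appear in the result.
SELECT
    c.hour_start     AS `date`,
    AVG(a.`one`)     AS `one_avg`,
    AVG(a.`two`)     AS `two_avg`,
    SUM(a.`three`)   AS `three_sum`
FROM hour_calendar AS c
LEFT JOIN `table` AS a
       ON a.`timestamp_local` >= c.hour_start
      AND a.`timestamp_local` <  c.hour_start + INTERVAL 1 HOUR
GROUP BY c.hour_start
ORDER BY c.hour_start DESC
LIMIT 24;
```

For an hour with no matching rows, the aggregates come back as NULL rather than the row being dropped, which is the point of joining from the calendar side.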

If you were using MSSQL, you could've used a recursive CTE, though this would probably have been very slow. Look at this or google "mysql cte" for MySQL alternatives. The way to do this with recursion is to (left) join on the same table repeatedly for HOUR = HOUR+1 until you get a non-NULL value, then stop. For each of these you will calculate the sum backwards.
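
For MySQL itself, version 8.0 and later support `WITH RECURSIVE`. The sketch below does not implement the repeated self-join described above; it only shows, assuming such a version, how a recursive CTE could generate the hourly rows on the fly instead of reading them from a pre-filled table. The CTE name `hours`, the column `hour_start`, and the date range are illustrative.

```sql
-- Assumes MySQL 8.0+; names and dates are hypothetical.
WITH RECURSIVE hours (hour_start) AS (
    -- Anchor: first hour of the range.
    SELECT CAST('2012-12-23 00:00:00' AS DATETIME)
    UNION ALL
    -- Recursive step: add one hour until the last hour is reached.
    SELECT hour_start + INTERVAL 1 HOUR
    FROM hours
    WHERE hour_start < '2012-12-23 23:00:00'
)
SELECT
    h.hour_start     AS `date`,
    SUM(a.`three`)   AS `three_sum`
FROM hours AS h
LEFT JOIN `table` AS a
       ON a.`timestamp_local` >= h.hour_start
      AND a.`timestamp_local` <  h.hour_start + INTERVAL 1 HOUR
GROUP BY h.hour_start
ORDER BY h.hour_start DESC;
```

Longer ranges may run into the `cte_max_recursion_depth` limit (1000 by default), in which case the pre-filled table is likely the simpler option.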

Source: https://stackoverflow.com/questions/14014674/sql-retrieving-total-sum-as-subselect-very-slow

Tags

mysql

performance

sum

subquery