I have a table that looks something like this:
DataTable
+------------+------------+------------+
| Date | DailyData1 | DailyData2 |
+------------+------------+------------+
| 2012-01-23 | 146.30 | 212.45 |
| 2012-01-20 | 554.62 | 539.11 |
| 2012-01-19 | 710.69 | 536.35 |
+------------+------------+------------+
I'm trying to create a view (call it AggregateView
) that will, for each date and for each data column, show a few different aggregates. For example, select * from AggregateView where Date = '2012-01-23'
might give:
+------------+--------------+----------------+--------------+----------------+
| Date | Data1_MTDAvg | Data1_20DayAvg | Data2_MTDAvg | Data2_20DayAvg |
+------------+--------------+----------------+--------------+----------------+
| 2012-01-23 | 697.71 | 566.34 | 601.37 | 192.13 |
+------------+--------------+----------------+--------------+----------------+
where Data1_MTDAvg
shows avg(DailyData1)
for each date in January prior to Jan 23, and Data1_20DayAvg
shows the same but for the prior 20 dates in the table. I'm no SQL ninja, but I was thinking that the best way to do this would be via subqueries. The MTD average is easy:
select t1.Date, (select avg(t2.DailyData1)
from DataTable t2
where t2.Date <= t1.Date
and month(t2.Date) = month(t1.Date)
and year(t2.Date) = year(t1.Date)) Data1_MTDAvg
from DataTable t1;
But I'm getting hung up on the 20-day average due to the need to limit the number of results returned. Note that the dates in the table are irregular, so I can't use a date interval; I need the last twenty records in the table, rather than just all records over the last twenty days. The only solution I've found is to use a nested subquery to first limit the records selected, and then take the average.
Alone, the subquery works for individual hardcoded dates:
select avg(t2.DailyData1) Data1_20DayAvg
from (select DailyData1
from DataTable
where Date <= '2012-01-23'
order by Date desc
limit 0,20) t2;
But trying to embed this as part of the greater query blows up:
select t1.Date, (select avg(t2.DailyData1) Data1_20DayAvg
from (select DailyData1
from DataTable
where Date <= t1.Date
order by Date desc
limit 0,20) t2)
from DataTable t1;
ERROR 1054 (42S22): Unknown column 't1.Date' in 'where clause'
From searching around I get the impression that you can't use correlated subqueries as part of a from
clause, which I think is where the problem is here. The other issue is that I'm not sure if MySQL will accept a view definition containing a from
clause in a subquery. Is there a way to limit the data in my aggregate selection without resorting to subqueries, in order to work around these two issues?
No, you can't use correalted subqueries in the FROM
clause. But you can use them in the ON
conditions:
SELECT AVG(d.DailyData1) Data1_20DayAvg
--- other aggregate stuff on d (Datatable)
FROM
( SELECT '2012-01-23' AS DateChecked
) AS dd
JOIN
DataTable AS d
ON
d.Date <= dd.DateChecked
AND
d.Date >= COALESCE(
( SELECT DailyData1
FROM DataTable AS last20
WHERE Date <= dd.DateChecked
AND (other conditions for last20)
ORDER BY Date DESC
LIMIT 1 OFFSET 19
), '1001-01-01' )
WHERE (other conditions for d Datatable)
Similar, for many dates:
SELECT dd.DateChecked
, AVG(d.DailyData1) Data1_20DayAvg
--- other aggregate stuff on d (Datatable)
FROM
( SELECT DISTINCT Date AS DateChecked
FROM DataTable
) AS dd
JOIN
DataTable AS d
ON
d.Date <= dd.DateChecked
AND
d.Date >= COALESCE(
( SELECT DailyData1
FROM DataTable AS last20
WHERE Date <= dd.DateChecked
AND (other conditions for last20)
ORDER BY Date DESC
LIMIT 1 OFFSET 19
), '1001-01-01' )
WHERE (other conditions for d Datatable)
GROUP BY
dd.DateChecked
Both queries assume that Datatable.Date
has a UNIQUE
constraint.
来源:https://stackoverflow.com/questions/9068777/mysql-alternatives-to-nested-subqueries-when-limiting-aggregate-data-in-a-corr