Question
Can anyone explain this query, in which the subquery references the parent query? How does SQL evaluate it?
Second-highest employee salary:
select max(e1.sal),e1.deptno
from s_emp e1
where sal < (select max(sal) from s_emp e2 where e2.deptno = e1.deptno)
group by e1.deptno;
I tested it and it works.
Answer 1:
First remove the group by and the aggregation, and consider this query:
select e1.sal, e1.deptno
from s_emp e1
where e1.sal < (select max(sal) from s_emp e2 where e2.deptno = e1.deptno)
It returns all the rows of the table except the ones with the maximum sal in their deptno.
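To see that filter in action, here is a minimal sketch that runs the query above against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for the asker's database here, and the table columns and sample rows are hypothetical, chosen so that department 20 has a tie at the top.

```python
import sqlite3

# Hypothetical sample data: (empno, sal, deptno). Department 20 has a tie at the top.
ROWS = [
    (1, 5000, 10), (2, 3000, 10), (3, 1300, 10),
    (4, 3000, 20), (5, 3000, 20), (6, 2975, 20), (7, 1100, 20),
    (8, 2850, 30), (9, 1600, 30),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table s_emp (empno integer, sal integer, deptno integer)")
conn.executemany("insert into s_emp values (?, ?, ?)", ROWS)

# Same query as above; order by is added only to make the output deterministic.
below_max = conn.execute(
    """
    select e1.sal, e1.deptno
    from s_emp e1
    where e1.sal < (select max(e2.sal) from s_emp e2 where e2.deptno = e1.deptno)
    order by e1.deptno, e1.sal desc
    """
).fetchall()

print(below_max)
```

Every row holding its department's maximum is dropped, including both tied 3000 rows in department 20.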
Why?
Because each row's sal is compared to the deptno's max salary and must be less.
The subquery in the WHERE clause is executed once for every row of the table:
select max(e2.sal) from s_emp e2 where e2.deptno = e1.deptno
and for every row it returns the maximum sal for the current row's deptno.
So the result is all sals that are less than this max sal of the current row's deptno.
Now if you add group by deptno and aggregation, you get for each deptno the max sal of the returned rows, which is the 2nd highest sal for each deptno, since all the top ones are already excluded.
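The full query from the question can be checked the same way. This is a sketch under the same assumptions as before: SQLite via Python's sqlite3 in place of the asker's database, and hypothetical sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table s_emp (sal integer, deptno integer)")
# Hypothetical sample rows: (sal, deptno).
conn.executemany(
    "insert into s_emp values (?, ?)",
    [
        (5000, 10), (3000, 10), (1300, 10),
        (3000, 20), (3000, 20), (2975, 20), (1100, 20),
        (2850, 30), (1600, 30),
    ],
)

# The question's query; order by is added only for stable output.
second_highest = conn.execute(
    """
    select max(e1.sal), e1.deptno
    from s_emp e1
    where e1.sal < (select max(e2.sal) from s_emp e2 where e2.deptno = e1.deptno)
    group by e1.deptno
    order by e1.deptno
    """
).fetchall()

print(second_highest)
```

Department 20 yields 2975 rather than 3000, because both rows tied at the maximum were filtered out before the grouping.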
Answer 2:
This is called a correlated subquery, because the result of the subquery is potentially different for every row of the outer query.
When MySQL runs a query, you can think of it like a foreach loop over a collection. For example in PHP syntax:
foreach ($s_emp as $e1) {
    // ...
}
It must run the subquery for each row of the outer query, before it can evaluate the < comparison. This will become quite expensive as the number of rows increases. If the table has N rows, it will run the subquery N times, even if there are only a few distinct values for deptno! MySQL is not smart enough to remember the result after having run the subquery for the same deptno value.
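The loop picture can be made concrete with a small simulation. This is only a mental model written in plain Python, not a description of what the MySQL executor actually does; the sample rows and the names max_sal and subquery_runs are hypothetical.

```python
from collections import Counter

# Hypothetical sample rows: (sal, deptno).
S_EMP = [
    (5000, 10), (3000, 10), (1300, 10),
    (3000, 20), (3000, 20), (2975, 20), (1100, 20),
    (2850, 30), (1600, 30),
]

subquery_runs = Counter()  # how many times the "subquery" ran, per deptno


def max_sal(deptno):
    """Stand-in for: select max(sal) from s_emp e2 where e2.deptno = <deptno>."""
    subquery_runs[deptno] += 1
    return max(sal for sal, d in S_EMP if d == deptno)


# Outer query: one pass over s_emp, re-running the subquery for every row.
kept = [(sal, deptno) for sal, deptno in S_EMP if sal < max_sal(deptno)]

# group by deptno + max(sal), applied to the rows that survived the WHERE filter.
second_highest = {}
for sal, deptno in kept:
    second_highest[deptno] = max(sal, second_highest.get(deptno, sal))

print(sum(subquery_runs.values()), "subquery runs for", len(subquery_runs), "distinct deptnos")
print(second_highest)
```

In this model the subquery runs once per row (9 times) even though only 3 distinct deptno values exist, which is the cost described above.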
Instead, you can get the result this way, which calculates max(sal) once for each deptno and keeps those results in a temporary table:
select max(e1.sal), e1.deptno
from s_emp e1
join (select deptno, max(sal) as max_sal from s_emp group by deptno) as e2
on e1.deptno = e2.deptno
where e1.sal < e2.max_sal
group by e1.deptno
The purpose of this query appears to be to return the second highest salary per department, right?
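As a sanity check, the sketch below runs the correlated version and the join version side by side and compares their results. It assumes SQLite (through Python's sqlite3) as a stand-in for MySQL and uses hypothetical sample rows, so it says nothing about relative speed on a real MySQL server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table s_emp (sal integer, deptno integer)")
# Hypothetical sample rows: (sal, deptno).
conn.executemany(
    "insert into s_emp values (?, ?)",
    [
        (5000, 10), (3000, 10), (1300, 10),
        (3000, 20), (3000, 20), (2975, 20), (1100, 20),
        (2850, 30), (1600, 30),
    ],
)

# Correlated subquery version from the question.
correlated = """
select max(e1.sal), e1.deptno
from s_emp e1
where e1.sal < (select max(e2.sal) from s_emp e2 where e2.deptno = e1.deptno)
group by e1.deptno
order by e1.deptno
"""

# Join against a derived table that holds one max(sal) per deptno.
joined = """
select max(e1.sal), e1.deptno
from s_emp e1
join (select deptno, max(sal) as max_sal from s_emp group by deptno) as e2
  on e1.deptno = e2.deptno
where e1.sal < e2.max_sal
group by e1.deptno
order by e1.deptno
"""

correlated_rows = conn.execute(correlated).fetchall()
joined_rows = conn.execute(joined).fetchall()
print(correlated_rows == joined_rows, joined_rows)
```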
Here's another solution using window functions in MySQL 8.0:
select deptno, sal
from (
  select deptno, sal, dense_rank() over (partition by deptno order by sal desc) as dr
  from s_emp
) as e1
where dr = 2
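The window-function query can be tried without a MySQL 8.0 server. The sketch below assumes Python's sqlite3 module is linked against SQLite 3.25 or newer (the first release with window functions) and uses hypothetical sample rows in which department 30 has two employees tied at the second-highest salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table s_emp (sal integer, deptno integer)")
# Hypothetical sample rows: (sal, deptno). Department 30 has a tie at second place.
conn.executemany(
    "insert into s_emp values (?, ?)",
    [
        (5000, 10), (3000, 10), (1300, 10),
        (3000, 20), (3000, 20), (2975, 20), (1100, 20),
        (2850, 30), (1600, 30), (1600, 30),
    ],
)

ranked = """
select {columns}
from (
  select deptno, sal, dense_rank() over (partition by deptno order by sal desc) as dr
  from s_emp
) as e1
where dr = 2
order by deptno
"""

# As written above: one output row per employee holding the second-highest salary.
per_employee = conn.execute(ranked.format(columns="deptno, sal")).fetchall()
# With distinct: one output row per department, like the max()/group by versions.
per_department = conn.execute(ranked.format(columns="distinct deptno, sal")).fetchall()

print(per_employee)
print(per_department)
```

When several employees share the second-highest salary, the query returns one row for each of them, whereas the max() and group by versions return a single row per department; adding distinct is one way to make the two agree.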
Source: https://stackoverflow.com/questions/59865236/referencing-parent-query-inside-child-query