SUBQUERY total performance vs case sum performance

情到浓时终转凉″ posted on 2019-12-06 00:48:43

Expanding on Martin's answer: it depends on what indexes you have and how densely populated each column is (that is, how many of its values are NULL). Consider this example.

create table tbl (id int identity primary key, a int ,b int,c int, d int)
insert tbl values(1,2,3,null)
insert tbl values(2,null,3,1)
insert tbl values(1,null,1,4)
insert tbl values(1,null,3,5)
insert tbl values(1,null,3,6)
insert tbl select a,b,c,d from tbl --10
insert tbl select a,b,c,d from tbl --20
insert tbl select a,b,c,d from tbl --40
insert tbl select a,b,c,d from tbl --80
insert tbl select a,b,c,d from tbl --160
insert tbl select a,b,c,d from tbl --320
insert tbl select a,b,c,d from tbl --640
insert tbl select a,b,c,d from tbl --1280
insert tbl select a,b,c,d from tbl --2560
insert tbl select a,b,c,d from tbl --5120
insert tbl select a,b,c,d from tbl --10240

Column b is nullable and only 20% of its values are non-null. Now run your queries against the table (with no indexes other than the clustered primary key). Before you run them, press Ctrl-M in SSMS to include the actual execution plan. Run both queries in the same batch, i.e. highlight the text of both queries and execute them together.

SELECT (SELECT SUM(a) from tbl where a=1) AS a ,          
       (SELECT SUM(b) from tbl where b=2) AS b ,         
       (SELECT SUM(c) from tbl where c=3) AS c

select sum((case  when a=1 then a  else null end)),
       sum((case  when b=2 then b  else null end)),
       sum((case  when c=3 then c  else null end))
from tbl

I won't bore you with images here, but look at the plan: it shows a cost of about 75% for the top query and 25% for the bottom one. That's expected: 75%:25% = 3:1, because the first query passes through the table exactly 3 times. Now create these three indexes:

create index ix_tbl_a on tbl(a)
create index ix_tbl_b on tbl(b)
create index ix_tbl_c on tbl(c)

Then rerun the query batch (both queries together). This time you'll see a cost split of about 51% to 49%, which is quite close. The reason is that column b, being sparsely populated, is very cheap to SUM from the index pages alone. The other two columns benefit as well, because an index page on a single column holds more rows than a data page of the clustered index (which contains all columns).
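The percentages in the plan are relative cost estimates, so it can help to look at the actual I/O as well. A minimal sketch, assuming the tbl table from above and a session in SSMS: SET STATISTICS IO reports scan counts and logical reads per table in the Messages tab, which lets you compare the two forms before and after creating the indexes. The exact figures depend on your server and are not shown here.

```sql
-- Report per-table scan counts and logical reads in the Messages tab.
set statistics io on

-- Form 1: three independent subqueries, each accessing tbl separately.
select (select sum(a) from tbl where a=1) as a,
       (select sum(b) from tbl where b=2) as b,
       (select sum(c) from tbl where c=3) as c

-- Form 2: a single pass over tbl using conditional aggregation.
select sum(case when a=1 then a end) as a,
       sum(case when b=2 then b end) as b,
       sum(case when c=3 then c end) as c
from tbl

-- Turn the reporting off again for the rest of the session.
set statistics io off
```

Comparing the logical reads of the two statements gives a more direct view of the work done than the cost percentages alone.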

When you expand this to 27 columns, the first form could run faster if each column is sparsely populated and if you have an index on each of the 27 columns. A big ask, and even then, it will probably only be very marginally faster.

The second option makes a single pass of the table; the first one makes multiple passes. Performance-wise, the second option should be superior in most cases.

It depends what indexes you have.

If a, b, and c are all indexed then the original version could be significantly faster, particularly if a large proportion of the table doesn't meet any of the criteria.

If you have no useful indexes at all then the choice is three scans vs one scan so the CASE version should be faster.
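When most rows match none of the criteria, one variation worth testing is to keep the single-pass CASE form but add a WHERE clause that discards rows no SUM would count. This is a sketch that is not part of the original answers; it assumes the tbl table from the example above, and whether the optimizer actually uses the indexes for the OR predicate depends on your data and statistics, so check the plan.

```sql
-- Rows failing all three predicates contribute NULL to every SUM,
-- so filtering them out does not change the results.
select sum(case when a=1 then a end) as a,
       sum(case when b=2 then b end) as b,
       sum(case when c=3 then c end) as c
from tbl
where a=1 or b=2 or c=3
```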
