Question
I need to sum several columns, each filtered by its own WHERE condition. To make this easier to follow, here is a small table variable:
declare @tbl table(a int ,b int,c int)
insert into @tbl values(1,2,3)
insert into @tbl values(2,2,3)
insert into @tbl values(1,3,1)
insert into @tbl values(1,2,3)
insert into @tbl values(1,2,3)
To find the sums of a, b, and c based on the values of a, b, and c, I am using the following query:
SELECT (SELECT SUM(a) from @tbl where a=1) AS a ,
(SELECT SUM(b) from @tbl where b=2) AS b ,
(SELECT SUM(c) from @tbl where c=3) AS c
I asked a friend to write a single query for this, and he suggested the following:
select sum((case when a=1 then a else null end)),
sum((case when b=2 then b else null end)),
sum((case when c=3 then c else null end))
from @tbl
Now I am wondering about performance: which one will run faster if I have 27 columns and millions of records?
Or is there another way to achieve this that would perform much better than either of these two?
Answer 1:
Expanding on Martin's answer - it depends on what indexes you have and how populated the column is (nullable or not). Consider this example.
create table tbl (id int identity primary key, a int ,b int,c int, d int)
insert tbl values(1,2,3,null)
insert tbl values(2,null,3,1)
insert tbl values(1,null,1,4)
insert tbl values(1,null,3,5)
insert tbl values(1,null,3,6)
insert tbl select a,b,c,d from tbl --10
insert tbl select a,b,c,d from tbl --20
insert tbl select a,b,c,d from tbl --40
insert tbl select a,b,c,d from tbl --80
insert tbl select a,b,c,d from tbl --160
insert tbl select a,b,c,d from tbl --320
insert tbl select a,b,c,d from tbl --640
insert tbl select a,b,c,d from tbl --1280
insert tbl select a,b,c,d from tbl --2560
insert tbl select a,b,c,d from tbl --5120
insert tbl select a,b,c,d from tbl --10240
Column b is created nullable and is only 20% non-null. Now, run your queries against the table (with no indexes). Before you run it, make sure to press Ctrl-M (show actual execution plan). Run both queries in the same batch, i.e. highlight the text of both queries and execute.
SELECT (SELECT SUM(a) from tbl where a=1) AS a ,
(SELECT SUM(b) from tbl where b=2) AS b ,
(SELECT SUM(c) from tbl where c=3) AS c
select sum((case when a=1 then a else null end)),
sum((case when b=2 then b else null end)),
sum((case when c=3 then c else null end))
from tbl
I won't bore you with images here but look at the plan which will show a cost of about 75% against the top query and 25% against the bottom. That's expected, 75%:25% = 3:1 which is due to the first query passing through the table 3 times exactly. Now create these three indexes:
create index ix_tbl_a on tbl(a)
create index ix_tbl_b on tbl(b)
create index ix_tbl_c on tbl(c)
Then, rerun the query batch (both together). This time, you'll see a cost of about 51% to 49%. Quite close. The reason is that column b, being sparsely populated, is very easy to SUM from the index pages alone. Even the other two columns are helped, because an index page holds more rows than a data page of the clustered index (which contains all columns).
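One way to check the page-count explanation above is to compare logical reads rather than plan percentages, since the percentages shown in the plan are based on the optimizer's cost estimates. A minimal sketch, assuming the tbl table and the three indexes created above; the read counts appear on the Messages tab and will vary by system:

```sql
-- Report pages read and CPU/elapsed time for each statement in the batch.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Form 1: one subquery per column (each can be answered from its own index).
SELECT (SELECT SUM(a) FROM tbl WHERE a = 1) AS a,
       (SELECT SUM(b) FROM tbl WHERE b = 2) AS b,
       (SELECT SUM(c) FROM tbl WHERE c = 3) AS c;

-- Form 2: a single pass with CASE expressions.
SELECT SUM(CASE WHEN a = 1 THEN a ELSE NULL END) AS a,
       SUM(CASE WHEN b = 2 THEN b ELSE NULL END) AS b,
       SUM(CASE WHEN c = 3 THEN c ELSE NULL END) AS c
FROM tbl;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Running the same batch before and after creating the indexes should show how the logical reads of the first form change while the second form still reads the whole table.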
When you expand this to 27 columns, the first form could run faster if each column is sparsely populated and if you have an index on each of the 27 columns. A big ask, and even then, it will probably only be very marginally faster.
Answer 2:
The second option makes a single pass of the table; the first one makes multiple passes. Performance-wise, the second option should be superior in most cases.
Answer 3:
It depends what indexes you have.
If a, b, and c are all indexed then the original version could be significantly faster, particularly if a large proportion of the table doesn't meet any of the criteria.
If you have no useful indexes at all then the choice is three scans vs one scan, so the CASE version should be faster.
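When many rows match none of the criteria, the CASE version can also be given a WHERE clause that skips those rows. This is only a sketch, using the tbl table from the first answer; whether the optimizer actually uses the indexes to satisfy the OR condition is an assumption you would need to confirm in the execution plan:

```sql
-- Same single-pass aggregation, restricted to rows that can contribute to a sum.
-- Rows failing all three predicates would only add NULLs, so the totals are unchanged.
SELECT SUM(CASE WHEN a = 1 THEN a END) AS a,   -- a CASE with no ELSE yields NULL
       SUM(CASE WHEN b = 2 THEN b END) AS b,
       SUM(CASE WHEN c = 3 THEN c END) AS c
FROM tbl
WHERE a = 1 OR b = 2 OR c = 3;
```

With 27 columns the WHERE clause grows to 27 OR'd predicates, so the benefit depends heavily on how selective the conditions are taken together.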
Source: https://stackoverflow.com/questions/12875368/subquery-total-performance-vs-case-sum-performance