Ok, so I have a query:
select distinct(a)
from mytable
where
b in (0,3)
What is going to be faster, the above or
select di
As far as I know, IN converts to OR. So the performance is the same. Just a shorter way of writing it.
Hopefully in this simple example it won't make any difference which version you use (as the query optimiser should turn them into equivalent queries under the hood); however, there's a fair chance it will depend on the indexes you have on mytable. I would suggest that you run both queries in SQL Server Management Studio after having turned on "Include Actual Execution Plan", and compare the results to determine which query has the lowest "cost" in your scenario.
To do this:
The bottom "results" half of the window will now have a third tab, "Execution Plan", which should contain two "flowcharts", one for the first query and another for the second. If the two are identical, then SQL Server has treated the two queries as equivalent, so you should choose whichever form you and/or your colleagues prefer.
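As a complement to comparing the graphical plans, here is a minimal T-SQL sketch that runs both versions with I/O and timing statistics switched on. The OR form of the second query is an assumption, since the question's second query is cut off; the table and column names are the ones from the question.

```sql
-- Sketch for SQL Server. Run as one batch in a query window.
SET STATISTICS IO ON;    -- report logical/physical reads per table
SET STATISTICS TIME ON;  -- report CPU and elapsed time per statement

-- Version 1: IN list
SELECT DISTINCT a
FROM mytable
WHERE b IN (0, 3);

-- Version 2: explicit OR (assumed form of the truncated second query)
SELECT DISTINCT a
FROM mytable
WHERE b = 0 OR b = 3;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The figures appear on the "Messages" tab. If both statements report the same logical reads and show the same plan, the choice between them is down to readability.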
Both IN and OR will do a query for b = 0 followed by one for b = 3, and then do a merge join on the two result sets, and finally filter out any duplicates.
With IN, duplicates don't really make sense, because b can't be both 0 and 3, but the fact is that IN will be converted to b = 0 OR b = 3. With OR, duplicates do make sense, because you could have b = 0 OR a = 3, and if you were to join the two separate result sets, you could end up with duplicates for each record that matched both criteria.
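To make the duplicate case concrete, here is a small sketch using a throwaway table variable (the name @t and its three rows are made up for illustration). The row (3, 0) satisfies both b = 0 and a = 3, so combining the two result sets without de-duplication returns it twice.

```sql
-- Hypothetical sample data: (3, 0) matches both predicates.
DECLARE @t TABLE (a int, b int);
INSERT INTO @t (a, b) VALUES (3, 0), (1, 0), (3, 5);

-- UNION ALL keeps the row (3, 0) from both branches: 4 rows
SELECT a, b FROM @t WHERE b = 0
UNION ALL
SELECT a, b FROM @t WHERE a = 3;

-- UNION removes the repeat: 3 rows, which for this data matches
-- what WHERE b = 0 OR a = 3 returns
SELECT a, b FROM @t WHERE b = 0
UNION
SELECT a, b FROM @t WHERE a = 3;
```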
So a duplicate filtering will always be done, regardless of whether you're using IN or OR. However, if you know from the outset that you will not have any duplicates - which is usually the case when you're using IN - then you can gain some performance by using UNION ALL which doesn't filter out duplicates:
select distinct(a)
from mytable
where
b = 0
UNION ALL
select distinct(a)
from mytable
where
b = 3
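One caveat: each branch applies DISTINCT separately, so a value of a that occurs both in rows with b = 0 and in rows with b = 3 comes back twice, whereas the IN/OR query returns it once. If that can happen in your data, a possible variant (a sketch, with the alias u chosen only for illustration) keeps the UNION ALL branches and de-duplicates once at the end:

```sql
-- Combine the two branches without per-branch de-duplication,
-- then apply DISTINCT once over the combined set.
SELECT DISTINCT a
FROM (
    SELECT a FROM mytable WHERE b = 0
    UNION ALL
    SELECT a FROM mytable WHERE b = 3
) AS u;
```

Whether this is any faster than the plain IN version depends on the indexes and the plan the optimiser picks, so it is worth measuring rather than assuming.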