问题
Hi I have below query in an SP
@CrmContactId is a parameter to the SP.
Select distinct A.PolicyBusinessId, A.PolicyDetailId
from TPolicyBusiness A
inner join TPolicyOwner B on a.PolicyDetailId=b.PolicyDetailId
Left Join TAdditionalOwner C on c.PolicyBusinessId=A.PolicyBusinessId
where (b.CRMContactId = @CRMContactId)
we made a new change and introduced an OR condition
Select distinct A.PolicyBusinessId, A.PolicyDetailId
from TPolicyBusiness A
inner join TPolicyOwner B on a.PolicyDetailId=b.PolicyDetailId
Left Join TAdditionalOwner C on c.PolicyBusinessId=A.PolicyBusinessId
where (b.CRMContactId = @CRMContactId OR C.CRMContactId = @CRMContactId)
Execution plan:
Attachment : Execution Plan
but this change caused a huge performance issue in live server. TPolicyBusiness and TPolicyOwner are heavy tables having millions of records. TAdditionalOwner table is a light table with only few records.
To fix this issue Union all was instead of OR condition.
Select distinct A.PolicyBusinessId, A.PolicyDetailId
From
(
Select A.PolicyBusinessId, A.PolicyDetailId
from TPolicyBusiness A
inner join TPolicyOwner B on a.PolicyDetailId=b.PolicyDetailId
where b.CRMContactId = @CRMContactId
union all
Select A.PolicyBusinessId, A.PolicyDetailId
from TPolicyBusiness A
Join TAdditionalOwner C on c.PolicyBusinessId=A.PolicyBusinessId
where C.CRMContactId = @CRMContactId
) as A
Execution Plan:
Attachement Execution Plan
Could someone please explain why the introduction of OR caused an issue and why using Union is better than OR in this case?
回答1:
Using UNION ALL
to replace OR
is actually one of the well known optimization tricks. The best reference and explanation is in this article: Index Union.
The gist of it is that OR
predicates that could be be satisfied by two index seeks cannot be reliably detected by the query optimizer (the reason being impossibility to predict the disjoint sets from the two sides of the OR). So when expressing the same condition as an UNION ALL then the optimizer has no problem creating a plan that does two short seeks and unions the results. The important thing is to realize that a=1 or b=2
can be different from a=1 union all b=2
because the first query returns rows that satisfy both conditions once, while the later returns them twice. When you write the query as UNION ALL you are telling the compiler that you understand that and you have no problem with it.
For further reference see How to analyse SQL Server performance.
回答2:
Query1, the left join and the where clause combine to mean that table C is effectively ignored (not referenced in select list either), so you basically have a 2 table inner join query.
Query 2, the left join in now almost an inner join because that table is referenced in the where clause without also permitting NULLs from that table - but due the OR, all conditions are being considered for the result. Hence a 3 table inner join query more or less.
Query 3, you have simplified the inner UNION ALL query to 2 simple inner joins
But I suspect there will be more to it because and A.IndigoClientId=@TenantId
indicates you have not revealed the full queries anyway.
回答3:
The JOIN to TAdditionalOwner is using
TPolicyBusiness.PolicyBusinessId = TAdditionalOwner.PolicyBusinessId
where the JOIN to TPolicyOwner is using the
TPolicyBusiness.PolicyDetailId = TPolicyOwner.PolicyDetailId
Check that there is a corresponding index for the PolicyBusinessId.
In the 2-way JOIN, that is part of the UNION, the smaller TAdditionalOwner table will be optimised if there is not an index for it to refer to in TPolicyBusiness, due to the small size. The server will still do a table scan, but use the values from the smaller table and see if they are in the big table somewhere. If there is no index, this optimisation will disappear quite quickly as the small table grows.
Given that you are not referring to either B or C in the SELECT you can simply to this
SELECT DISTINCT A.PolicyBusinessId, A.PolicyDetailId
FROM TPolicyBusiness A
LEFT JOIN TPolicyOwner B ON a.PolicyDetailId = b.PolicyDetailId AND b.CRMContactId = @CRMContactId
LEFT JOIN TAdditionalOwner C on c.PolicyBusinessId = A.PolicyBusinessId AND C.CRMContactId = @CRMContactId
This way it will JOIN to either table, same as per your UNION, but without the OUTER select.
Either way ensure that the fields used are indexed.
回答4:
Union operator, it will take lot of time if the data is huge.try to avoid union operator.
please try below try to filter the data while joining.it will be much better.
Select distinct A.PolicyBusinessId, A.PolicyDetailId
from TPolicyBusiness A
inner join TPolicyOwner B on a.PolicyDetailId=b.PolicyDetailId
and b.CRMContactId = @CRMContactId
Left Join TAdditionalOwner C on c.PolicyBusinessId=A.PolicyBusinessId
where (1=1 OR C.CRMContactId = @CRMContactId)
来源:https://stackoverflow.com/questions/25520758/why-using-or-condition-instead-of-union-caused-a-performance-issue