Why using OR condition instead of Union caused a performance Issue

喜夏-厌秋 提交于 2019-12-23 05:45:10

问题


Hi I have below query in an SP

@CrmContactId is a parameter to the SP.

Select distinct A.PolicyBusinessId, A.PolicyDetailId 
from TPolicyBusiness A
inner join TPolicyOwner B on a.PolicyDetailId=b.PolicyDetailId
Left Join TAdditionalOwner C on c.PolicyBusinessId=A.PolicyBusinessId
where (b.CRMContactId = @CRMContactId)  

we made a new change and introduced an OR condition

Select distinct A.PolicyBusinessId, A.PolicyDetailId 
from TPolicyBusiness A
inner join TPolicyOwner B on a.PolicyDetailId=b.PolicyDetailId
Left Join TAdditionalOwner C on c.PolicyBusinessId=A.PolicyBusinessId
where (b.CRMContactId = @CRMContactId OR C.CRMContactId = @CRMContactId) 

Execution plan:

Attachment : Execution Plan

but this change caused a huge performance issue in live server. TPolicyBusiness and TPolicyOwner are heavy tables having millions of records. TAdditionalOwner table is a light table with only few records.

To fix this issue Union all was instead of OR condition.

Select distinct A.PolicyBusinessId, A.PolicyDetailId 
        From
        (
            Select A.PolicyBusinessId, A.PolicyDetailId 
            from TPolicyBusiness A
            inner join TPolicyOwner B on a.PolicyDetailId=b.PolicyDetailId
            where b.CRMContactId = @CRMContactId
            union all
            Select A.PolicyBusinessId, A.PolicyDetailId 
            from TPolicyBusiness A
            Join TAdditionalOwner C on c.PolicyBusinessId=A.PolicyBusinessId
            where C.CRMContactId = @CRMContactId
        ) as A

Execution Plan:

Attachement Execution Plan

Could someone please explain why the introduction of OR caused an issue and why using Union is better than OR in this case?


回答1:


Using UNION ALL to replace OR is actually one of the well known optimization tricks. The best reference and explanation is in this article: Index Union.

The gist of it is that OR predicates that could be be satisfied by two index seeks cannot be reliably detected by the query optimizer (the reason being impossibility to predict the disjoint sets from the two sides of the OR). So when expressing the same condition as an UNION ALL then the optimizer has no problem creating a plan that does two short seeks and unions the results. The important thing is to realize that a=1 or b=2 can be different from a=1 union all b=2 because the first query returns rows that satisfy both conditions once, while the later returns them twice. When you write the query as UNION ALL you are telling the compiler that you understand that and you have no problem with it.

For further reference see How to analyse SQL Server performance.




回答2:


Query1, the left join and the where clause combine to mean that table C is effectively ignored (not referenced in select list either), so you basically have a 2 table inner join query.

Query 2, the left join in now almost an inner join because that table is referenced in the where clause without also permitting NULLs from that table - but due the OR, all conditions are being considered for the result. Hence a 3 table inner join query more or less.

Query 3, you have simplified the inner UNION ALL query to 2 simple inner joins

But I suspect there will be more to it because and A.IndigoClientId=@TenantId indicates you have not revealed the full queries anyway.




回答3:


The JOIN to TAdditionalOwner is using

TPolicyBusiness.PolicyBusinessId = TAdditionalOwner.PolicyBusinessId 

where the JOIN to TPolicyOwner is using the

TPolicyBusiness.PolicyDetailId = TPolicyOwner.PolicyDetailId

Check that there is a corresponding index for the PolicyBusinessId.

In the 2-way JOIN, that is part of the UNION, the smaller TAdditionalOwner table will be optimised if there is not an index for it to refer to in TPolicyBusiness, due to the small size. The server will still do a table scan, but use the values from the smaller table and see if they are in the big table somewhere. If there is no index, this optimisation will disappear quite quickly as the small table grows.

Given that you are not referring to either B or C in the SELECT you can simply to this

SELECT DISTINCT A.PolicyBusinessId, A.PolicyDetailId 
FROM TPolicyBusiness A
LEFT JOIN TPolicyOwner B ON a.PolicyDetailId = b.PolicyDetailId AND b.CRMContactId = @CRMContactId 
LEFT JOIN TAdditionalOwner C on c.PolicyBusinessId = A.PolicyBusinessId AND C.CRMContactId = @CRMContactId

This way it will JOIN to either table, same as per your UNION, but without the OUTER select.

Either way ensure that the fields used are indexed.




回答4:


Union operator, it will take lot of time if the data is huge.try to avoid union operator.

please try below try to filter the data while joining.it will be much better.

Select distinct A.PolicyBusinessId, A.PolicyDetailId 
from TPolicyBusiness A
inner join TPolicyOwner B on a.PolicyDetailId=b.PolicyDetailId
and b.CRMContactId = @CRMContactId 
Left Join TAdditionalOwner C on c.PolicyBusinessId=A.PolicyBusinessId
where  (1=1 OR C.CRMContactId = @CRMContactId) 


来源:https://stackoverflow.com/questions/25520758/why-using-or-condition-instead-of-union-caused-a-performance-issue

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!