NOT IN subquery fails when there are NULL-valued results

旧城冷巷雨未停 提交于 2019-12-06 01:11:47

It is better to use NOT EXISTS anyway:

WHERE NOT EXISTS(
    SELECT 1 FROM protocol_application_log_devl pal
    WHERE pal.person_id = person_id
     AND  pal.set_id = @set_id
)

Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?

A pattern I see quite a bit, and wish that I didn't, is NOT IN. When I see this pattern, I cringe. But not for performance reasons – after all, it creates a decent enough plan in this case:

The main problem is that the results can be surprising if the target column is NULLable (SQL Server processes this as a left anti semi join, but can't reliably tell you if a NULL on the right side is equal to – or not equal to – the reference on the left side). Also, optimization can behave differently if the column is NULLable, even if it doesn't actually contain any NULL values

Instead of NOT IN, use a correlated NOT EXISTS for this query pattern. Always. Other methods may rival it in terms of performance, when all other variables are the same, but all of the other methods introduce either performance problems or other challenges.

While I support Tim's answer as being correct-in-practice (NOT IN is not appropriate here), this is an interesting case noted in the IN / NOT IN documentation:

Caution: Any null values returned by subquery or expression that are compared to test_expression using IN or NOT IN return UNKNOWN. Using null values in together with IN or NOT IN can produce unexpected results1.

This is why the isnull "fixes" the problem - it masks any such NULL values and avoids the unexpected behavior. With that in mind, the following approach would also work (but please heed the advice about not using NOT IN to begin with):

person_id not in (
  SELECT distinct person_id
  FROM protocol_application_log_devl pal
  WHERE pal.set_id = @set_id
    AND person_id NOT NULL    -- guard here
)

However, a NULL person_id is suspicious and might indicate other issues ..

1 Here is the Proof pudding:

select case when 1 not in (2)       then 1 else 0 end as r1,
       case when 1 not in (2, NULL) then 1 else 0 end as r2
-- r1: 1, r2: 0
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!