joins versus correlated exists subqueries

Submitted by 烂漫一生 on 2019-12-11 04:09:40

Question


In Derby and/or MySQL, I am wondering about the performance of:

select c0.documentid from contentblob as c0 where c0.documentid > ? 
  and c0.blobid = ? 
  and exists (select * from contentblob c1 where c1.documentid = c0.documentid 
              and c1.blobid = ?)
  order by c0.documentid

Note that different values will be supplied for the two blobid comparisons. The query returns the document IDs that occur in the table (at least) twice: once with the first blobid value and once with the second.

compared to the equivalent self-join, assuming, of course, indices on the documentid and blobid columns of the contentblob table.

In case it isn't clear, the relevant schema here is a table with two varchar columns, documentid and blobid. Together they form the composite primary key.
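To make the schema and the query's result concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for Derby or MySQL purely for illustration, and the sample rows, the column sizes, and the helper names `make_sample_db` and `docs_with_both_blobs` are assumptions made up for this example rather than anything from the original post.

```python
import sqlite3


def make_sample_db():
    """Build an in-memory table shaped like the one described above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE contentblob (
            documentid VARCHAR(64) NOT NULL,
            blobid     VARCHAR(64) NOT NULL,
            PRIMARY KEY (documentid, blobid)
        )""")
    # Hypothetical sample data: only doc1 and doc3 carry both 'a' and 'b'.
    conn.executemany("INSERT INTO contentblob VALUES (?, ?)", [
        ("doc1", "a"), ("doc1", "b"),
        ("doc2", "a"),
        ("doc3", "a"), ("doc3", "b"), ("doc3", "c"),
        ("doc4", "b"),
    ])
    return conn


def docs_with_both_blobs(conn, min_docid, blob1, blob2):
    """Run the correlated EXISTS query from the question."""
    sql = """
        SELECT c0.documentid FROM contentblob AS c0
         WHERE c0.documentid > ?
           AND c0.blobid = ?
           AND EXISTS (SELECT * FROM contentblob c1
                        WHERE c1.documentid = c0.documentid
                          AND c1.blobid = ?)
         ORDER BY c0.documentid
    """
    return [row[0] for row in conn.execute(sql, (min_docid, blob1, blob2))]


if __name__ == "__main__":
    conn = make_sample_db()
    print(docs_with_both_blobs(conn, "doc0", "a", "b"))  # prints ['doc1', 'doc3']
```

Swapping the two blob values gives the same document IDs, since the query only asks that both rows exist for a document.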

If there's nothing for it but to inquire of the query analyzer in the db, I'll go do that.


Answer 1:


Generally, correlated sub-queries are more expensive than joins, even outer self-joins, though in this case all you need is a regular inner self-join:

SELECT c0.documentid
  FROM contentblob AS c0
  JOIN contentblob AS c1 ON c1.documentid = c0.documentid
 WHERE c0.documentid > ? 
   AND c0.blobid = ?
   AND c1.blobid = ?
 ORDER BY c0.documentid

Because (documentid, blobid) is the primary key and each alias is pinned to a single blob ID value, every qualifying document ID appears exactly once, so no extra de-duplicating condition is needed. In particular, do not add something like c0.blobid < c1.blobid: it would return no rows at all whenever the first supplied value sorts after the second.

You might want to move the c1.blobid condition into the ON clause, but the optimizer should do that for you anyway. Also check that your optimizer is using an appropriate index. It isn't entirely clear what the best index would be, though I think one with blobid as the leading column would probably perform best. If the table is large and the optimizer chooses a sequential scan, you have a problem.
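One way to check which index the optimizer picks is to ask it for the plan. The sketch below does this with SQLite's `EXPLAIN QUERY PLAN` through Python's sqlite3 module, only because it runs anywhere; MySQL has its own `EXPLAIN` statement, and the plans Derby or MySQL choose may well differ. The index name `contentblob_blob_doc`, the synthetic data, and the helper names are assumptions for illustration.

```python
import sqlite3

EXISTS_SQL = """
    SELECT c0.documentid FROM contentblob AS c0
     WHERE c0.documentid > :docid
       AND c0.blobid = :blob1
       AND EXISTS (SELECT * FROM contentblob c1
                    WHERE c1.documentid = c0.documentid
                      AND c1.blobid = :blob2)
     ORDER BY c0.documentid
"""

# Self-join with the c1.blobid condition written in the ON clause.
JOIN_SQL = """
    SELECT c0.documentid
      FROM contentblob AS c0
      JOIN contentblob AS c1
        ON c1.documentid = c0.documentid AND c1.blobid = :blob2
     WHERE c0.documentid > :docid
       AND c0.blobid = :blob1
     ORDER BY c0.documentid
"""


def build_db(n_docs=200):
    """Synthetic table: every document has blob 'a', every third also has 'b'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE contentblob (
            documentid VARCHAR(64) NOT NULL,
            blobid     VARCHAR(64) NOT NULL,
            PRIMARY KEY (documentid, blobid)
        )""")
    # Hypothetical secondary index with blobid as the leading column.
    conn.execute(
        "CREATE INDEX contentblob_blob_doc ON contentblob (blobid, documentid)")
    rows = []
    for i in range(n_docs):
        doc = f"doc{i:04d}"
        rows.append((doc, "a"))
        if i % 3 == 0:
            rows.append((doc, "b"))
    conn.executemany("INSERT INTO contentblob VALUES (?, ?)", rows)
    return conn


def query_plan(conn, sql, params):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


def run(conn, sql, params):
    """Execute a query and return the document IDs as a list."""
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = build_db()
    params = {"docid": "doc0100", "blob1": "a", "blob2": "b"}
    for name, sql in (("exists", EXISTS_SQL), ("join", JOIN_SQL)):
        print(name, len(run(conn, sql, params)), "rows")
        for step in query_plan(conn, sql, params):
            print("   ", step)
```

In the printed plan, a step that scans the whole table where an index search was expected is the warning sign described above.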




Answer 2:


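I think a left join would be clearer, provided the unmatched rows are filtered out:

  -- the join must precede WHERE, so the placeholders now bind in the
  -- order c1.blobid, c0.documentid, c0.blobid
  select c0.documentid from contentblob as c0 
  left join contentblob c1 on c0.documentid = c1.documentid 
       and c1.blobid = ? 
  where c0.documentid > ? 
  and c0.blobid = ? 
  and c1.documentid is not null 
  order by c0.documentid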

Mileage will vary across database engines, and with table sizes, index cardinality, etc., but my gut feeling is that the join would perform better.
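Since the outcome depends on the engine and the data, measuring is more reliable than a gut feeling. The following is a rough timing sketch, again using Python's sqlite3 module as a stand-in, so the numbers say nothing about Derby or MySQL. The data generator and the helper names are assumptions for illustration. Note that the left-join form needs a `c1.documentid IS NOT NULL` filter to return the same rows as the EXISTS form.

```python
import sqlite3
import time

QUERIES = {
    "exists": """
        SELECT c0.documentid FROM contentblob AS c0
         WHERE c0.documentid > :docid
           AND c0.blobid = :blob1
           AND EXISTS (SELECT * FROM contentblob c1
                        WHERE c1.documentid = c0.documentid
                          AND c1.blobid = :blob2)
         ORDER BY c0.documentid
    """,
    "left_join": """
        SELECT c0.documentid
          FROM contentblob AS c0
          LEFT JOIN contentblob AS c1
            ON c1.documentid = c0.documentid AND c1.blobid = :blob2
         WHERE c0.documentid > :docid
           AND c0.blobid = :blob1
           AND c1.documentid IS NOT NULL
         ORDER BY c0.documentid
    """,
}


def build_timing_db(n_docs):
    """Synthetic table: every document has blob 'a', every fifth also has 'b'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE contentblob (
            documentid VARCHAR(64) NOT NULL,
            blobid     VARCHAR(64) NOT NULL,
            PRIMARY KEY (documentid, blobid)
        )""")
    rows = []
    for i in range(n_docs):
        doc = f"doc{i:06d}"
        rows.append((doc, "a"))
        if i % 5 == 0:
            rows.append((doc, "b"))
    conn.executemany("INSERT INTO contentblob VALUES (?, ?)", rows)
    return conn


def time_query(conn, sql, params, repeats=5):
    """Return (best wall-clock seconds over `repeats` runs, result IDs)."""
    best = float("inf")
    ids = []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql, params).fetchall()
        best = min(best, time.perf_counter() - start)
        ids = [row[0] for row in rows]
    return best, ids


if __name__ == "__main__":
    conn = build_timing_db(5000)
    params = {"docid": "doc000100", "blob1": "a", "blob2": "b"}
    for name, sql in QUERIES.items():
        seconds, ids = time_query(conn, sql, params)
        print(f"{name:10s} {len(ids):6d} rows  {seconds * 1000:.3f} ms")
```

Taking the best of several runs reduces noise from caching and scheduling; on a real server, run the comparison against production-sized data and the actual indexes.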



Source: https://stackoverflow.com/questions/4457132/joins-versus-correlated-exists-subqueries
