Subqueries vs Inner joins - Which one executes faster?

问题

I hope it's ok to make a posting like this.

I have been using SQL for quite some time and people at work have been using 2 different ways to return the same number or rows in a database.

For example:

SELECT Name
FROM
    Employees
WHERE
    DepartmentID IN (SELECT DepartmentID
                         FROM
                             Departments
                         WHERE
                             Department LIKE '%Engineering')

SELECT Employees.Name
FROM
    Departments
    INNER JOIN Employees
        ON Departments.DepartmentID = Employees.DepartmentID
WHERE
    Departments.Department LIKE '%Engineering'

Both return the same data. People have been telling me that using subqueries is the best way to do it.

My question is this: Which of these 2 will execute faster? My guess would be the one with the inner join but I may be wrong.

Thanks.

回答1:

In this case, both are equivalent. However, for the one with the INNER JOIN, if there is more than 1 department selected with the same DepartmentID, employees with that DepartmentID will be returned multiple times.

SQL is a declarative language, which means that the language is not supposed to say how the query should be performed, only what result should be found. It is up to the DMBS to work out how to perform it.

A decent SQL database will probably optimize them to both do the same or similar things.

To check if they are doing the same thing, run EXPLAIN on the query.

If they have the same steps, obviously they will take the same amount of time to run. Otherwise, you will see what ways the database is treating the differently. How the DBMS optimizes it is implementation dependent. So the best way is to use EXPLAIN.

Note: EXPLAIN is an SQL command, just like SELECT etc. See http://www.sql.org/sql-database/postgresql/manual/sql-explain.html.

Assuming it executes with the same steps, the way you write it will be the way you think appears clearest what the intention is.

回答2:

These two queries are not equivalent. The equivalent to the "IN" query is the following:

SELECT Name
FROM Employees e join
     (SELECT distinct DepartmentID
      FROM Departments
      WHERE Department LIKE '%Engineering'
     ) d
     on e.DepartmentID = d.DepartmentId

In other words, the "IN" version can never return duplicate rows. The join version can return duplicate rows. In this case, I don't think it would because departmentid is probably unique in the table.

The next question is about optimization. SQL engines really should optimize the two versions equivalently. That means that the "IN" version is easier to read.

However, there is one big exception to this. MySQL is definitely suboptimal when it comes to IN. When using that database engine, it is much better to use the join version.

By the way, there are other ways to do this, such as:

select name
from employees e
where exists (select 1 from Departments d
              where e.Departmentid = d.DepartmentId and
                    d.Department LIKE '%Engineering'
             )

Which works best . . . you need to measure on whatever database you are using.

回答3:

The query optimizers in most modern RDBMS will probably internally work with them the same way, so there won't be a performance difference.

Older RDBMS might probably show differences, but this depends on the RDBMS itself.

来源：https://stackoverflow.com/questions/12063008/subqueries-vs-inner-joins-which-one-executes-faster

标签

sql

subquery