问题
let's say that there're 2 tables (in oracle SQL) like this:
user(user_id, company_id):
123 | company_id_1 |
123 | company_id_2 |
company(id, version_id):
company_id_1 | (null) |
company_id_2 | version_id1 |
the following query returns 2 rows
company_id_1
company_id_2
SELECT distinct(company_id)
FROM user
WHERE user.user_id = 123
AND user.company_id IS NOT NULL
AND EXISTS
(SELECT 1
FROM company
INNER JOIN user ON company.id = user.company_id AND company.version_id IS NOT NULL);
I would expect there's only 1 result, which is company_id_2, but it returns 2 results (company_id_1, company_id_2)
A couple of other notes:
- the following query does return 1 result for me
SELECT distinct(company_id)
FROM user
WHERE user.user_id = 123
AND user.company_id IS NOT NULL
AND EXISTS
(SELECT 1
FROM company
WHERE company.id = user.company_id AND company.version_id IS NOT NULL);
- what's odd to me is the following statement (running the inner join individually) does return 1 result:
SELECT *
FROM company
INNER JOIN user ON company.id = user.company_id AND company.version_id IS NOT NULL
WHERE company.id IN (company_id_1, company_id_2)
- So why does query with inner join inside exists returns 2 results? even though by running the inner join individually it only returns 1 result, and exists condition should only be evaluated to true for only company_id_2, which has the not-null version_id
- Can you elaborate more on what's the difference between the inner join inside the exists vs the regular where clause inside exists, they both looks the same to me?
回答1:
The difference is that when you use exists
- the query inside usually depends on the main query (mean uses columns from it and so can't be executed separately) and, so, for each row of the main query it checks if some data retreived by the subquery exists or not.
The problem of your first query is that the subquery inside exists block doesn't anyhow depend on the main query columns, it's a separate query which always return a row with 1
, so, for any row of the main query the result of exists
will be always true
. So, your first query is just equivalent to
SELECT distinct(company_id)
FROM user
WHERE user.user_id = 123
AND user.company_id IS NOT NULL
See also fiddle
回答2:
Without the join, your filter using user.company_id
is correlated from outside the subquery. This means that for each row in your outer query, the subquery could return different results.
Joins in exists subqueries are nothing special but you uncorrelated the subquery with your join. It now can run completely independently of the outer query. The exists filter works exactly the same way but because there is no correlation with the outer query, it will either always be true or always be false.
回答3:
Query - 1 gives you two records because the query inside the EXISTS
is not a correlated sub-query and it becomes true for both the records of the user.user_id = 123
. Please note that the table inside the EXISTS
and the table outside (in the main query) are evaluated separately.
Your EXISTS
condition has no sense here as it will be true for any record as it individually returns one record.
SELECT distinct(company_id)
FROM user
WHERE user.user_id = 123
AND user.company_id IS NOT NULL
-- following will behave as an individual query
-- and has no relation will main query's user table
AND EXISTS
(SELECT 1
FROM company
INNER JOIN user ON company.id = user.company_id AND company.version_id IS NOT NULL);
Now, Comming to your second query. It is a correlated sub-query and EXISTS
becomes false for the user.company_id = 'company_id_2'
so it returns only one record
SELECT distinct(company_id)
FROM user
WHERE user.user_id = 123
AND user.company_id IS NOT NULL
-- in EXISTS condition user table is used which refers to the main query's user table
-- it is called the correlated sub-query
AND EXISTS
(SELECT 1
FROM company
WHERE company.id = user.company_id AND company.version_id IS NOT NULL);
回答4:
The WHERE
clause looks at one row at a time:
FROM user
WHERE ... EXISTS ( ... )
looks at a user row and checks whether there exists some data.
In your second query you check whether there exists a company version for the user:
SELECT 1
FROM company
WHERE company.id = user.company_id -- the company for the user we are looking at in the main query
AND company.version_id IS NOT NULL
This is how to use EXISTS
; the subquery looks for data related to the main query's row.
In your first query, however, your exists subquery is this:
SELECT 1
FROM company
INNER JOIN user ON company.id = user.company_id
AND company.version_id IS NOT NULL
You can run this stand-alone; it does hence not relate to the main query. What you ask here is: Does a user with a company version exists. The answer: Yes there exists such a user. This is true hence. True regardless on what you are dealing with in your main query. This is not how to use EXISTS
. It is extremely rare that we use a stand-alone EXISTS
clause, not related to the main query.
The third query merely joins the two tables and finds all rows matching all criteria. The join gives you this:
+--------------+-----------------+--------------------+--------------------+ | user.user_id | user.company_id | company.company_id | company.version_id | +--------------+-----------------+--------------------+--------------------+ | 123 | company_id_1 | company_id_1 | (null) | | 123 | company_id_2 | company_id_2 | version_id1 | +--------------+-----------------+--------------------+--------------------+
where only the second row matches your WHERE
clause. So, only the second row gets returned.
Another thing I'd like to mention is that when you are forced to use DISTINCT
, then ask yourself: what makes this necessary? How come there are duplicates you must remove? A normalized database usually doesn't contain duplicate data, so it's probably a weakness in the query that builds a too large intermediate result you must then reduce.
If looking for companies, select from companies:
select *
from company
where version_id is not null
and company_id in (select company_id from user where user_id = 123);
or with EXISTS
instead of IN
:
select *
from company
where version_id is not null
and exists
(
select null
from user
where user.company_id = company.company_id
and user.user_id = 123
);
来源:https://stackoverflow.com/questions/65033995/behavior-of-inner-join-inside-exists-sql