问题
I am not able to understand how Cartesian product works. Consider the simple schema:
mysql> select * from account;
+----------------+-------------+---------+
| account_number | branch_name | balance |
+----------------+-------------+---------+
| A101 | Downtown | 500 |
| A102 | Perryridge | 400 |
| A201 | Brighton | 900 |
| A215 | Mianus | 700 |
| A217 | Brighton | 750 |
| A222 | Redwood | 700 |
| A305 | Round Hill | 350 |
+----------------+-------------+---------+
7 rows in set (0.00 sec)
Now, when I pose the query
select a.balance from account a, account b where a.balance<b.balance;
I get a series of values except the maximum value 900. Then using the not in
operator I determine the maximum value. Before that in the above query, when the join takes place based on the condition a.balance<b.balance
, the first tuple in the relation must be 500
. Theoretically, the first 5 values must be:
500
500
500
500
400
But I get :
+---------+
| balance |
+---------+
| 400 |
| 350 |
| 350 |
| 500 |
| 400 |
How is it working? I am using MySQL database.
回答1:
A Cartesian join joins every record in the first table with every record in the second table, so since your table has 7 rows and it's joined with itself, it should return 49 records had you not had a where
clause. Your where clause only allows records where a
's balance is smaller than b
's balance. Since 900
is, as you said, the maximal balance in the table, it will never be smaller than any other balance, and therefore it will never be returned.
With regard to the first five rows, the normal rules of SQL apply to joins too. Since SQL tables have no intrinsic order, it's completely up to the database to decide how to return them, unless you explicitly state an order in the order by
clause. The values you listed are perfectly valid values you'd expect the query to return.
回答2:
The Cartesian Product generates all possible combinations of records from two given sets of data.
In your case, to generate a Cartesian Product, you'd have to either use CROSS JOIN:
SELECT
a.branch_name AS first_branch,
b.branch_name AS second_branch,
a.balance + b.balance AS total_balance
FROM account a
CROSS JOIN account b
or, use the SQL:89 theta-style join:
SELECT
a.branch_name AS first_branch,
b.branch_name AS second_branch,
a.balance + b.balance AS total_balance
FROM account a, account b
Anyway, the goal of the Cartesian product would be to associate all rows of two sets.
The moment you apply some filtering criteria to the Cartesian product generated by a CROSS JOIN, the result will no longer be a Cartesian product, but a subset of it, that matches to the given filtering conditions.
So, in your case, this query:
SELECT
a.balance
FROM account a, account b
WHERE a.balance < b.balance
does not generate a Cartesian Product.
In fact, a much better alternative to your query is this one:
SELECT
a.balance
FROM account a
WHERE a.balance < (
SELECT MAX(balance) FROM account
)
If you want to get all rows that have a lower balance than the maximum one.
Anyway, the use of the self CROSS JOIN here looks suspicious. That's why you are probably better off using a subquery instead.
来源:https://stackoverflow.com/questions/30955338/understanding-cartesian-product-in-sql