How do I represent a subquery in relational algebra? Do I put the new select under the previous select condition?
SELECT number
FROM collection
WHERE number =
The answer depends on which operators your algebra comprises. A semi-join operator would be most useful here.
If the common attribute were named number in both relations, then it would be a semi-join followed by a projection of number. Assuming a semi-join operator named MATCHING, as per Tutorial D:
( collection MATCHING anotherStack ) { number }
As posted, the attribute needs to be renamed first:
( collection MATCHING ( anotherStack RENAME { anotherNumber AS number } ) ) { number }
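To make the rename, semi-join, project sequence concrete, here is a minimal sketch in Python that models relations as lists of dicts. The helper names (rename, project, matching), the sample rows and the extra title attribute are all made up for illustration; this is not Tutorial D, only an imitation of what the expression above computes.

```python
# Relations are modelled as lists of dicts; every name here is illustrative.

def rename(relation, old, new):
    """Return the relation with attribute `old` renamed to `new`."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in relation]

def project(relation, attrs):
    """Keep only `attrs` and drop duplicate tuples (set semantics)."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attrs, key)))
    return result

def matching(left, right):
    """Semi-join: rows of `left` that agree with at least one row of
    `right` on every attribute the two relations have in common."""
    if not left or not right:
        return []
    common = sorted(set(left[0]) & set(right[0]))
    right_keys = {tuple(r[a] for a in common) for r in right}
    return [row for row in left if tuple(row[a] for a in common) in right_keys]

# Hypothetical sample data; `title` is an invented extra attribute.
collection = [
    {"number": 1, "title": "a"},
    {"number": 2, "title": "b"},
    {"number": 3, "title": "c"},
]
another_stack = [{"anotherNumber": 2}, {"anotherNumber": 3}, {"anotherNumber": 4}]

result = project(
    matching(collection, rename(another_stack, "anotherNumber", "number")),
    ["number"],
)
print(result)  # [{'number': 2}, {'number': 3}]
```

The semi-join keeps only attributes of collection, which is why a plain projection of number is enough afterwards.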
If Standard SQL's (SQL-92) JOIN can be considered, loosely speaking, a relational operator, then it is true that SQL has no semi-join. However, it has several comparison predicates that may be used to write a semi-join operator, e.g. MATCH:
SELECT number
FROM collection
WHERE number MATCH (
SELECT anotherNumber
FROM anotherStack
);
However, MATCH is not widely supported in real-life SQL products, which is why a semi-join is commonly written using IN (subquery) or EXISTS (subquery) (and I suspect that's why you name-checked "subquery" in your question, i.e. the term semi-join is not well known among SQL practitioners).
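As a rough illustration of the IN and EXISTS forms, the sketch below runs both against an in-memory SQLite database through Python's sqlite3 module. The table contents are invented, and SQLite is used only because it is easy to run; other products may differ in details.

```python
import sqlite3

# In-memory database with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE collection (number INTEGER);
    CREATE TABLE anotherStack (anotherNumber INTEGER);
    INSERT INTO collection VALUES (1), (2), (3);
    INSERT INTO anotherStack VALUES (2), (3), (3), (4);
""")

# Semi-join written with IN (subquery).
in_rows = conn.execute("""
    SELECT number
    FROM collection
    WHERE number IN (SELECT anotherNumber FROM anotherStack)
    ORDER BY number
""").fetchall()

# The same semi-join written with a correlated EXISTS (subquery).
exists_rows = conn.execute("""
    SELECT number
    FROM collection
    WHERE EXISTS (SELECT *
                  FROM anotherStack
                  WHERE anotherStack.anotherNumber = collection.number)
    ORDER BY number
""").fetchall()

print(in_rows)      # [(2,), (3,)]
print(exists_rows)  # [(2,), (3,)]
```

Both forms return each collection row at most once, even though 3 appears twice in anotherStack, which is the behaviour expected of a semi-join.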
Another approach would be to use an intersect operator if available.
Something like (pseudocode):
( collection project number )
intersect
( ( anotherStack rename anotherNumber as number ) project number )
In SQL:
SELECT number
FROM collection
INTERSECT
SELECT anotherNumber
FROM anotherStack;
This is quite well supported in real life (SQL Server, Oracle, PostgreSQL, etc.; MySQL only added INTERSECT in version 8.0.31).
You would just rewrite that as a join.
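A minimal sketch of the join rewrite in SQL, run through Python's sqlite3 module with invented sample rows. One caveat that this sketch assumes matters for your data: relational algebra works on sets, while an SQL join keeps duplicates, so DISTINCT is needed when anotherStack can repeat a value.

```python
import sqlite3

# In-memory database with made-up sample rows; 3 is deliberately duplicated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE collection (number INTEGER);
    CREATE TABLE anotherStack (anotherNumber INTEGER);
    INSERT INTO collection VALUES (1), (2), (3);
    INSERT INTO anotherStack VALUES (2), (3), (3), (4);
""")

# Plain join: one output row per matching pair, so 3 shows up twice.
plain_join = conn.execute("""
    SELECT collection.number
    FROM collection
    JOIN anotherStack ON anotherStack.anotherNumber = collection.number
    ORDER BY collection.number
""").fetchall()

# DISTINCT restores the set semantics of the algebra's final projection.
distinct_join = conn.execute("""
    SELECT DISTINCT collection.number
    FROM collection
    JOIN anotherStack ON anotherStack.anotherNumber = collection.number
    ORDER BY collection.number
""").fetchall()

print(plain_join)     # [(2,), (3,), (3,)]
print(distinct_join)  # [(2,), (3,)]
```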
I'm not sure how widely used the syntax I learned for relational algebra is, so in words:
1. Take a projection of anotherNumber from anotherStack
2. Rename anotherNumber from the result of step 1 as number
3. Natural join the result of step 2 onto collection
4. Take a final projection of number from the result of step 3
According to this pdf, you can easily convert a sub-query to a relational algebraic expression.
Firstly, you have to convert the whole query from the form
SELECT Select-list FROM R1 T1, R2 T2, ...
WHERE
some-column = (
SELECT some-column-from-sub-query from r1 t1, r2 t2, ...
WHERE extra-where-clause-if-needed)
to
SELECT Select-list FROM R1 T1, R2 T2, ...
WHERE
EXISTS (
SELECT some-column-from-sub-query from r1 t1, r2 t2, ...
WHERE extra-where-clause-if-needed and some-column = some-column-from-sub-query)
Then you have to convert the sub-query first into relational algebra. To do this for the sub-query given above:
PI[some-column-from-sub-query](
SIGMA[extra-where-clause-if-needed
^ some-column = some-column-from-sub-query
](RO[T1](R1) x RO[T2](R2) x ... x RO[t1](r1) x RO[t2](r2) x ...)
)
Here R1, R2... are the contextual relations, and r1, r2... are the sub-query relations.
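To see what the PI / SIGMA / cross-product expression evaluates to, here is a small Python sketch for the query from the question, with collection as the only contextual relation and anotherStack as the only sub-query relation. The sample rows are invented, there is no extra where-clause, and the sketch covers only the sub-query expression written above, not the remaining steps described in the pdf.

```python
from itertools import product

# Made-up sample data: one contextual relation, one sub-query relation.
collection = [{"number": 1}, {"number": 2}, {"number": 3}]
another_stack = [{"anotherNumber": 2}, {"anotherNumber": 3}, {"anotherNumber": 4}]

# Cross product: every pairing of a contextual row with a sub-query row.
crossed = [{**c, **s} for c, s in product(collection, another_stack)]

# SIGMA[number = anotherNumber]: keep the pairs satisfying the condition.
selected = [row for row in crossed if row["number"] == row["anotherNumber"]]

# PI[anotherNumber]: project onto the sub-query column (a set removes duplicates).
projected = sorted({row["anotherNumber"] for row in selected})

print(len(crossed))  # 9
print(projected)     # [2, 3]
```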
As this notation is hard to format on Stack Overflow, please head over to that pdf for a broad overview of how to convert a sub-query to relational algebra.