How would you go about proving that two queries are functionally equivalent, e.g. that they will always both return the same result set?
As I had a specific query in
This is pretty easy to do.
Let's assume your queries are named a and b.
a minus b
should give you an empty set. If it does not, then the queries return different sets, and the result set shows you the rows returned by a but not by b.
Then do
b minus a
That should also give you an empty set. If both are empty, then the queries return the same sets (on the data you ran them against). If it is not empty, then the queries differ in some respect, and the result set shows you the rows returned by b but not by a.
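A minimal sketch of the two checks, using Python's built-in sqlite3 module as a stand-in DBMS. SQLite spells Oracle's MINUS as EXCEPT; the emp table, its rows and the queries a, b and c are made up for illustration. Keep in mind that this only shows the queries agree on the data currently in the table, not on every possible database.

```python
import sqlite3

def one_way_diff(conn, q1, q2):
    """Rows returned by q1 but not by q2 (SQLite's EXCEPT plays the role of MINUS)."""
    sql = f"SELECT * FROM ({q1}) EXCEPT SELECT * FROM ({q2})"
    return sorted(conn.execute(sql).fetchall())

def compare(conn, a, b):
    """Run 'a minus b' and 'b minus a'; both empty means same rows on this data."""
    return one_way_diff(conn, a, b), one_way_diff(conn, b, a)

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER);
    INSERT INTO emp (dept, salary) VALUES
        ('sales', 100), ('sales', 250), ('it', 300), ('it', 120);
""")

a = "SELECT id FROM emp WHERE salary > 200"
b = "SELECT id FROM emp WHERE NOT (salary <= 200)"
c = "SELECT id FROM emp WHERE salary >= 250 AND dept = 'it'"

print(compare(conn, a, b))  # ([], [])
print(compare(conn, a, c))  # ([(2,)], [])
```

Here a and b return the same rows, so both differences are empty, while a and c differ: row 2 is returned by a but not by c.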
1) Real equivalence proof with Cosette:
Cosette checks (with a formal proof) whether 2 SQL queries are equivalent, and gives counterexamples when they are not. It's the only way to be absolutely sure, well almost ;) You can even paste 2 queries into their website and check (formal) equivalence right away.
Link to Cosette: http://cosette.cs.washington.edu/
Link to article that gives a good explanation of how Cosette works: https://medium.com/@uwdb/introducing-cosette-527898504bd6
2) Or if you're just looking for a quick practical fix:
Try this Stack Overflow answer: [sql - check if two select's are equal]
Which comes down to:
(select * from query1 MINUS select * from query2)
UNION ALL
(select * from query2 MINUS select * from query1)
This query gives you all rows that are returned by only one of the queries.
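The combined query can be tried the same way. Again, SQLite (EXCEPT instead of MINUS) and the emp table are assumptions for illustration. Because SQLite does not accept a parenthesized compound SELECT as an operand of UNION ALL, each EXCEPT is wrapped in a subquery here instead.

```python
import sqlite3

def symmetric_difference(conn, q1, q2):
    """Rows returned by exactly one of q1 and q2 (set semantics)."""
    sql = (
        f"SELECT * FROM (SELECT * FROM ({q1}) EXCEPT SELECT * FROM ({q2})) "
        "UNION ALL "
        f"SELECT * FROM (SELECT * FROM ({q2}) EXCEPT SELECT * FROM ({q1}))"
    )
    return sorted(conn.execute(sql).fetchall())

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER);
    INSERT INTO emp (dept, salary) VALUES
        ('sales', 100), ('sales', 250), ('it', 300), ('it', 120);
""")

q1 = "SELECT dept, salary FROM emp WHERE salary > 200"
q2 = "SELECT dept, salary FROM emp WHERE dept = 'it'"
print(symmetric_difference(conn, q1, q2))  # [('it', 120), ('sales', 250)]

# EXCEPT (like MINUS) removes duplicates, so these two queries look
# identical here even though the first returns 4 rows and the second 2.
print(symmetric_difference(conn, "SELECT dept FROM emp",
                           "SELECT DISTINCT dept FROM emp"))  # []
```

The last call shows the blind spot of this approach: since MINUS/EXCEPT compare sets, queries that return the same rows with different duplicate counts are reported as equal.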
The DBMS vendors have been working on this for a very, very long time. As Rik said, it's an intractable problem: equivalence of arbitrary SQL queries is undecidable in general, and even for the restricted class of conjunctive queries (under set semantics) it is NP-complete.
However, your best bet is to leverage your DBMS as much as possible. Every DBMS translates SQL into some sort of query plan. You can use this query plan, which is an abstracted version of the query, as a good starting point (the DBMS will do LOTS of optimization, flattening queries into more workable models).
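As a rough illustration of inspecting plans, here is a sketch that uses SQLite (via Python's sqlite3 module) as a stand-in; SQLite shows its plan with EXPLAIN QUERY PLAN, whereas Oracle uses EXPLAIN PLAN FOR. The table and index names are hypothetical. It also shows a limitation: the two dept queries get the same plan text even though they return different rows, because literal values do not appear in the plan, so identical plans alone do not prove equivalence.

```python
import sqlite3

def query_plan(conn, query):
    """Return the plan SQLite chooses for `query` as a list of detail strings."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    # The human-readable description is the last column of each plan row.
    return [row[-1] for row in rows]

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emp (id INTEGER PRIMARY KEY, dept TEXT, salary INTEGER);
    CREATE INDEX idx_emp_dept ON emp (dept);
""")

# Print the plans side by side as a starting point for manual comparison.
for q in ("SELECT * FROM emp WHERE dept = 'it'",
          "SELECT * FROM emp WHERE dept = 'sales'",
          "SELECT * FROM emp"):
    print(q, "->", query_plan(conn, q))
```

The exact wording of the plan text differs between SQLite versions, so only compare plans produced by the same installation.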
NOTE: modern DBMSs use a cost-based optimizer whose choices depend on table statistics, so after a statistics update the query planner may pick a different plan for an identical query.
In Oracle (depending on your version), you can switch from the cost-based optimizer to the deterministic rule-based optimizer (this will simplify plan analysis) with a SQL hint, e.g.
SELECT /*+ RULE */ * FROM yourtable
The rule-based optimizer has been deprecated since 8i, but it still hangs around even through 10g (I don't know about 11). However, the rule-based optimizer is much less sophisticated, so its error rate is potentially much higher.
For further reading of a more generic nature, IBM has been fairly prolific with their query-optimization patents. This one here on a method for converting SQL to an "abstract plan" is a good starting point: http://www.patentstorm.us/patents/7333981.html