How would you go about proving that two queries are functionally equivalent, i.e. that they will always both return the same result set?
As I had a specific query in
This sounds to me like an NP-complete problem. I'm not sure there is a surefire way to prove this kind of thing.
Perhaps you could draw out (by hand) your queries and their results using Venn diagrams, and see if they produce the same diagram. Venn diagrams are good for representing sets of data, and SQL queries work on sets of data. Drawing a Venn diagram might help you visualize whether two queries are functionally equivalent.
You don't.
If you need a high level of confidence that a performance change, for example, hasn't changed the output of a query, then test the hell out of it.
If you need a really high level of confidence .. then errrm, test it even more.
Massive levels of testing aren't that hard to cobble together for a SQL query. Write a proc which will iterate over a large (or complete) set of possible parameters, call each query with each set of params, and write the outputs to respective tables. Compare the two tables and there you have it.
It's not exactly scientific, which I guess was the OP's question, but I'm not aware of a formal method to prove equivalency.
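A minimal sketch of that brute-force comparison, under some loud assumptions: it uses Python's built-in sqlite3 module as a stand-in for the real database, a Python loop instead of a stored proc, and a made-up `orders` table with two made-up queries. It compares the outputs as multisets in memory rather than writing them to tables, so row order is ignored but duplicate counts are not.

```python
import sqlite3
from collections import Counter


def compare_queries(conn, query_a, query_b, param_sets):
    """Return the parameter sets for which the two queries disagree."""
    mismatches = []
    for params in param_sets:
        # Counter compares the results as multisets: order is ignored,
        # but the number of times each row appears must match.
        rows_a = Counter(conn.execute(query_a, params).fetchall())
        rows_b = Counter(conn.execute(query_b, params).fetchall())
        if rows_a != rows_b:
            mismatches.append(params)
    return mismatches


# Hypothetical schema and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount INTEGER);
    INSERT INTO orders VALUES (1, 10, 50), (2, 10, 75), (3, 20, 20), (4, 30, NULL);
""")

# Two hypothetical queries that are meant to be equivalent.
QUERY_A = "SELECT id FROM orders WHERE customer_id = ? AND amount >= 50"
QUERY_B = "SELECT id FROM orders WHERE customer_id = ? AND NOT amount < 50"

param_sets = [(c,) for c in (10, 20, 30, 40)]
mismatches = compare_queries(conn, QUERY_A, QUERY_B, param_sets)
print(mismatches)
```

An empty list only means no difference was found for these parameters against this data; it says nothing about data you did not test.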
The best you can do is compare the 2 query outputs based on a given set of inputs looking for any differences. To say that they will always return the same results for all inputs really depends on the data.
For Oracle, one of the better (if not the best) approaches, and a very efficient one, is described here (search the page for "Comparing the Contents of Two Tables"):
http://www.oracle.com/technetwork/issue-archive/2005/05-jan/o15asktom-084959.html
Which boils down to:
select c1, c2, c3,
       count(src1) CNT1,
       count(src2) CNT2
  from (select a.*,
               1 src1,
               to_number(null) src2
          from a
        union all
        select b.*,
               to_number(null) src1,
               2 src2
          from b
       )
 group by c1, c2, c3
having count(src1) <> count(src2);
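To see what that query reports, here is a small runnable sketch. It assumes SQLite (through Python's sqlite3 module) in place of Oracle, so `to_number(null)` becomes a plain `NULL`, and the tables `a` and `b` hold made-up rows. Because it counts occurrences per row, it also catches rows that appear a different number of times in each table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (c1 INTEGER, c2 TEXT, c3 INTEGER);
    CREATE TABLE b (c1 INTEGER, c2 TEXT, c3 INTEGER);
    -- (2, 'y', 20) appears twice in a but once in b; (3, 'z', 30) is only in b.
    INSERT INTO a VALUES (1, 'x', 10), (2, 'y', 20), (2, 'y', 20);
    INSERT INTO b VALUES (1, 'x', 10), (2, 'y', 20), (3, 'z', 30);
""")

# Same shape as the Oracle query: tag each row with its source, then
# keep only the groups whose per-source counts differ.
DIFF_SQL = """
    SELECT c1, c2, c3,
           COUNT(src1) AS cnt1,
           COUNT(src2) AS cnt2
      FROM (SELECT a.*, 1 AS src1, NULL AS src2 FROM a
            UNION ALL
            SELECT b.*, NULL AS src1, 2 AS src2 FROM b)
     GROUP BY c1, c2, c3
    HAVING COUNT(src1) <> COUNT(src2)
     ORDER BY c1
"""

diffs = conn.execute(DIFF_SQL).fetchall()
for row in diffs:
    print(row)
```

Each returned row is a value combination whose counts differ, with `cnt1` and `cnt2` showing how many times it occurs on each side.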
CAREFUL! Functional "equivalence" is often based on the data, and you may "prove" equivalence of 2 queries by comparing results for many cases and still be wrong once the data changes in a certain way.
For example:
SQL> create table test_tabA
(
col1 number
)
Table created.
SQL> create table test_tabB
(
col1 number
)
Table created.
SQL> -- insert 1 row
SQL> insert into test_tabA values (1)
1 row created.
SQL> commit
Commit complete.
SQL> -- Not exists query:
SQL> select * from test_tabA a
where not exists
(select 'x' from test_tabB b
where b.col1 = a.col1)
COL1
----------
1
1 row selected.
SQL> -- Not IN query:
SQL> select * from test_tabA a
where col1 not in
(select col1
from test_tabB b)
COL1
----------
1
1 row selected.
-- THEY MUST BE THE SAME!!! (or maybe not...)
SQL> -- insert a NULL to test_tabB
SQL> insert into test_tabB values (null)
1 row created.
SQL> commit
Commit complete.
SQL> -- Not exists query:
SQL> select * from test_tabA a
where not exists
(select 'x' from test_tabB b
where b.col1 = a.col1)
COL1
----------
1
1 row selected.
SQL> -- Not IN query:
SQL> select * from test_tabA a
where col1 not in
(select col1
from test_tabB b)
**no rows selected.**
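The reason is SQL's three-valued logic: with a NULL in the subquery, `col1 NOT IN (...)` evaluates to unknown rather than true, so the row is filtered out, while `NOT EXISTS` only asks whether a matching row was found. A minimal sketch of the same experiment, assuming SQLite (via Python's sqlite3 module) rather than Oracle and `INTEGER` in place of `number`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_tabA (col1 INTEGER);
    CREATE TABLE test_tabB (col1 INTEGER);
    INSERT INTO test_tabA VALUES (1);
    INSERT INTO test_tabB VALUES (NULL);
""")

# The value of the NOT IN predicate itself: NULL (unknown), not true.
predicate = conn.execute(
    "SELECT a.col1 NOT IN (SELECT col1 FROM test_tabB) FROM test_tabA a"
).fetchone()[0]

not_in_rows = conn.execute(
    "SELECT * FROM test_tabA a "
    "WHERE col1 NOT IN (SELECT col1 FROM test_tabB b)"
).fetchall()

not_exists_rows = conn.execute(
    "SELECT * FROM test_tabA a "
    "WHERE NOT EXISTS (SELECT 'x' FROM test_tabB b WHERE b.col1 = a.col1)"
).fetchall()

print(predicate, not_in_rows, not_exists_rows)
```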
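This will do the trick. If this query returns zero rows, the two queries are returning the same set of rows. Note that MINUS works on distinct rows, so it will not notice if a row appears a different number of times in each result. As a bonus, it runs as a single query, so you don't have to worry about setting the isolation level so that the data doesn't change between two queries.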
select * from ((<query 1> MINUS <query 2>) UNION ALL (<query 2> MINUS <query 1>))
Here's a handy shell script to do this:
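#!/bin/sh
# First argument is the sqlplus connect string.
CONNSTR=$1
echo "query 1, no semicolon, eof to end:"; Q1=$(cat)
echo "query 2, no semicolon, eof to end:"; Q2=$(cat)
# Symmetric difference of the two result sets; a count of 0 means no differing rows.
T="(($Q1 MINUS $Q2) UNION ALL ($Q2 MINUS $Q1));"
# Quote the expansion so a * inside either query is not glob-expanded by the shell.
echo "select count(*) from $T" | sqlplus -S -L "$CONNSTR"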
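For anyone who wants to try the MINUS / UNION ALL check without an Oracle instance, here is a rough sketch under stated assumptions: it uses SQLite through Python's sqlite3 module, where the operator is spelled `EXCEPT` instead of `MINUS`, and the table `t`, the helper `symmetric_difference`, and the three queries are all made up for illustration. It also shows the blind spot with duplicates.

```python
import sqlite3


def symmetric_difference(conn, q1, q2):
    """Rows returned by exactly one of the two queries (as distinct rows)."""
    # EXCEPT is SQLite's spelling of Oracle's MINUS; both work on distinct rows.
    sql = (
        f"SELECT * FROM ({q1} EXCEPT {q2}) "
        f"UNION ALL "
        f"SELECT * FROM ({q2} EXCEPT {q1})"
    )
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (n INTEGER);
    INSERT INTO t VALUES (1), (2), (2);
""")

ALL_ROWS = "SELECT n FROM t"
DISTINCT_ROWS = "SELECT DISTINCT n FROM t"
FILTERED_ROWS = "SELECT n FROM t WHERE n > 1"

# A genuinely different query is detected.
print(symmetric_difference(conn, ALL_ROWS, FILTERED_ROWS))

# Differing duplicate counts are NOT detected: 3 rows vs 2 rows, empty diff.
print(symmetric_difference(conn, ALL_ROWS, DISTINCT_ROWS))
```

If duplicate counts matter, a count-per-row comparison (grouping on all columns and comparing counts) is the safer check.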