Proving SQL query equivalency

一整个雨季 2020-12-15 05:33

How would you go about proving that two queries are functionally equivalent, i.e. that they will always both return the same result set?


As I had a specific query in

9 answers
  • 2020-12-15 05:57

    This sounds to me like an NP-complete problem. I'm not sure there is a surefire way to prove this kind of thing.

  • 2020-12-15 05:58

    Perhaps you could draw out (by hand) your queries and their results as Venn diagrams and see whether they produce the same diagram. Venn diagrams are good for representing sets of data, and SQL queries operate on sets of data, so drawing them out might help you visualize whether the two queries are functionally equivalent.

  • 2020-12-15 06:00

    You don't.

    If you need a high level of confidence that a performance change, for example, hasn't changed the output of a query, then test the hell out of it.

    If you need a really high level of confidence ... then errrm, test it even more.

    Massive levels of testing aren't that hard to cobble together for a SQL query. Write a proc that iterates over a large (or complete) set of possible parameters, calls each query with each set of params, and writes the outputs to respective tables. Compare the two tables and there you have it.

    It's not exactly scientific, which I guess was the OP's question, but I'm not aware of a formal method to prove equivalency.
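
    As a rough illustration of that brute-force approach, here is a minimal sketch in Python with an in-memory SQLite database standing in for the real one. The `orders` table, both queries, and the `first_mismatch` helper are hypothetical, invented only for this example. Results are compared as multisets, so row order is ignored but duplicate counts are not.

```python
import sqlite3
from collections import Counter
from itertools import product

def first_mismatch(conn, query_a, query_b, param_sets):
    """Return the first parameter tuple for which the two queries disagree, else None."""
    for params in param_sets:
        # Counter treats each result as a multiset: order-insensitive, duplicate-sensitive.
        rows_a = Counter(conn.execute(query_a, params).fetchall())
        rows_b = Counter(conn.execute(query_b, params).fetchall())
        if rows_a != rows_b:
            return params
    return None

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer INTEGER, amount INTEGER);
    INSERT INTO orders VALUES (1, 10, 50), (2, 10, 75), (3, 20, 20), (4, 30, NULL);
""")

# Hypothetical "before" and "after" versions of the same query.
original = "SELECT id FROM orders WHERE customer = ? AND amount >= ?"
rewritten = "SELECT id FROM orders WHERE customer = ? AND NOT (amount < ?)"

# Every combination of a few customer ids and amount thresholds.
param_sets = list(product([10, 20, 30, 40], [0, 20, 60, 100]))
print(first_mismatch(conn, original, rewritten, param_sets))
```

    A `None` result only says that no difference was found for this data and these parameters; it is evidence, not a proof.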

  • 2020-12-15 06:01

    The best you can do is compare the two query outputs for a given set of inputs, looking for any differences. Whether they will always return the same results for all inputs really depends on the data.

    For Oracle, one of the better (if not the best) approaches, and a very efficient one, is described here (Ctrl+F for "Comparing the Contents of Two Tables"):
    http://www.oracle.com/technetwork/issue-archive/2005/05-jan/o15asktom-084959.html

    Which boils down to:

    select c1,c2,c3, 
           count(src1) CNT1, 
           count(src2) CNT2
      from (select a.*, 
                   1 src1, 
                   to_number(null) src2 
              from a
            union all
            select b.*, 
                   to_number(null) src1, 
                   2 src2 
              from b
           )
    group by c1,c2,c3
    having count(src1) <> count(src2);
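
    To see what that query reports, here is a small sketch that runs the same pattern against an in-memory SQLite database from Python. This is an adaptation, not the Oracle original: SQLite has no `to_number`, so a plain `NULL` is used instead, and the sample rows in `a` and `b` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (c1, c2, c3);
    CREATE TABLE b (c1, c2, c3);
    -- Illustrative data: a has a duplicated row, b has one extra row.
    INSERT INTO a VALUES (1, 'x', 10), (2, 'y', 20), (2, 'y', 20);
    INSERT INTO b VALUES (1, 'x', 10), (2, 'y', 20), (3, 'z', 30);
""")

# COUNT(src1) counts occurrences in a, COUNT(src2) counts occurrences in b,
# because COUNT ignores the NULL marker contributed by the other table.
diff_sql = """
    SELECT c1, c2, c3, COUNT(src1) AS cnt1, COUNT(src2) AS cnt2
      FROM (SELECT a.*, 1 AS src1, NULL AS src2 FROM a
            UNION ALL
            SELECT b.*, NULL AS src1, 2 AS src2 FROM b)
     GROUP BY c1, c2, c3
    HAVING COUNT(src1) <> COUNT(src2)
     ORDER BY c1
"""
differences = conn.execute(diff_sql).fetchall()
for row in differences:
    print(row)
# (2, 'y', 20, 2, 1)
# (3, 'z', 30, 0, 1)
```

    Because it compares counts per distinct row, this pattern also flags rows that appear in both tables but a different number of times.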
    
  • 2020-12-15 06:01

    CAREFUL! Functional "equivalence" is often based on the data, and you may "prove" equivalence of 2 queries by comparing results for many cases and still be wrong once the data changes in a certain way.

    For example:

    SQL> create table test_tabA
    (
    col1 number
    )
    
    Table created.
    
    SQL> create table test_tabB
    (
    col1 number
    )
    
    Table created.
    
    SQL> -- insert 1 row
    
    SQL> insert into test_tabA values (1)
    
    1 row created.
    
    SQL> commit
    
    Commit complete.
    
    SQL> -- Not exists query:
    
    SQL> select * from test_tabA a
    where not exists
    (select 'x' from test_tabB b
    where b.col1 = a.col1)
    
          COL1
    ----------
             1
    
    1 row selected.
    
    SQL> -- Not IN query:
    
    SQL> select * from test_tabA a
    where col1 not in
    (select col1
    from test_tabB b)
    
          COL1
    ----------
             1
    
    1 row selected.
    
    
    -- THEY MUST BE THE SAME!!! (or maybe not...)
    
    
    SQL> -- insert a NULL to test_tabB
    
    SQL> insert into test_tabB values (null)
    
    1 row created.
    
    SQL> commit
    
    Commit complete.
    
    SQL> -- Not exists query:
    
    SQL> select * from test_tabA a
    where not exists
    (select 'x' from test_tabB b
    where b.col1 = a.col1)
    
    
          COL1
    ----------
             1
    
    1 row selected.
    
    SQL> -- Not IN query:
    
    SQL> select * from test_tabA a
    where col1 not in
    (select col1
    from test_tabB b)
    
    no rows selected.
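
    The same effect can be reproduced outside Oracle. The sketch below replays the session in an in-memory SQLite database from Python, on the assumption that SQLite applies the same three-valued logic to `NOT IN` here; the `run_both` helper is just a convenience for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_tabA (col1 INTEGER);
    CREATE TABLE test_tabB (col1 INTEGER);
    INSERT INTO test_tabA VALUES (1);
""")

NOT_EXISTS = """SELECT * FROM test_tabA a
                 WHERE NOT EXISTS (SELECT 'x' FROM test_tabB b WHERE b.col1 = a.col1)"""
NOT_IN = """SELECT * FROM test_tabA a
             WHERE col1 NOT IN (SELECT col1 FROM test_tabB b)"""

def run_both():
    """Run both formulations and return their result lists as a pair."""
    return conn.execute(NOT_EXISTS).fetchall(), conn.execute(NOT_IN).fetchall()

before = run_both()   # test_tabB is empty: both queries return the row
conn.execute("INSERT INTO test_tabB VALUES (NULL)")
after = run_both()    # with a NULL in test_tabB, NOT IN evaluates to NULL, not true

print(before)  # ([(1,)], [(1,)])
print(after)   # ([(1,)], [])
```

    Any test data set without a NULL in `test_tabB.col1` would have "confirmed" that the two queries are equivalent.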
    
  • 2020-12-15 06:04

    This will do the trick. If this query returns zero rows, the two queries are returning the same set of rows. (MINUS compares distinct rows, so a difference only in how many times a row is duplicated will not show up.) As a bonus, it runs as a single query, so you don't have to worry about setting the isolation level so that the data doesn't change between two queries.

    select * from ((<query 1> MINUS <query 2>) UNION ALL (<query 2> MINUS <query 1>))
    

    Here's a handy shell script to do this:

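    #!/bin/sh
    # Usage: pass the sqlplus connect string as the first argument.
    CONNSTR=$1

    # Read each query from stdin; finish each one with EOF, no trailing semicolon.
    echo "query 1, no semicolon, eof to end:"; Q1=$(cat)
    echo "query 2, no semicolon, eof to end:"; Q2=$(cat)

    # Symmetric difference of the two result sets.
    T="(($Q1 MINUS $Q2) UNION ALL ($Q2 MINUS $Q1));"

    # Quote the statement so the shell does not glob-expand any '*' in the queries.
    echo "select count(*) from $T" | sqlplus -S -L "$CONNSTR"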
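
    For experimenting without an Oracle instance, the symmetric-difference check can be sketched in Python against SQLite. This is an adaptation under a few assumptions: SQLite spells MINUS as `EXCEPT` and does not accept parenthesized operands in a compound select, so each side is wrapped in a subquery. The `symmetric_difference` helper and the table `t` are hypothetical.

```python
import sqlite3

def symmetric_difference(conn, query_1, query_2):
    """Rows returned by exactly one of the two queries (compared as sets)."""
    # Queries are interpolated as text, so only pass trusted SQL here.
    sql = f"""
        SELECT * FROM (SELECT * FROM ({query_1}) EXCEPT SELECT * FROM ({query_2}))
        UNION ALL
        SELECT * FROM (SELECT * FROM ({query_2}) EXCEPT SELECT * FROM ({query_1}))
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (n INTEGER);
    INSERT INTO t VALUES (1), (2), (2), (3);
""")

# Same rows: empty difference.
print(symmetric_difference(conn, "SELECT n FROM t WHERE n > 1", "SELECT n FROM t WHERE n >= 2"))
# Differ only in duplicate counts: the set-based check still reports nothing.
print(symmetric_difference(conn, "SELECT n FROM t", "SELECT DISTINCT n FROM t"))
# Genuinely different: the extra row shows up.
print(symmetric_difference(conn, "SELECT n FROM t", "SELECT n FROM t WHERE n <> 3"))
```

    The second call is the case to keep in mind: an empty result means the two queries agree as sets, not necessarily row for row.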
    