How can I find out which rows have been deleted?

问题

Informix-SQL 7.32 with SE engine:

I have a customer who deleted several rows from an SE table. (I'm not using transaction logging or audit). The table has a serial column. I would like to create an Ace report to print the missing serial columns. I tried the following quick and dirty report, but it didn't work!.. can you suggest a better way?

define
variable next_id integer
end

  select tbl_id 
    from tbl
order by tbl_id {I'm ordering tbl_id because all the rows are periodically re-clustered}
     end        {by an fk_id in order to group all rows belonging to the same customer}

format
on every row
let next_id = tbl_id + 1  

after group of tbl_id
if tbl_id + 1 <> next_id then
print column 1, tbl_id + 1 using "######"

end

or maybe create a temporary table with an INT column containing sequential numbers from 1 to 5000 and do a select statement like:

   SELECT tbl_id 
     FROM tbl
    WHERE tbl_id NOT IN
                 (SELECT tmp_int
                    FROM tmp);

or a select statement with HAVING, OUTER, etc.

回答1:

Since this is SE, we have to use the old-fashioned notation, not the SQL-92 JOIN notations.

The four queries that follow are a common foundation for the two possible answers:

SELECT t1.tbl_id AS tbl_id, t2.tbl_id AS ind
  FROM tbl AS t1, OUTER tbl AS t2
 WHERE t1.tbl_id + 1 = t2.tbl_id
  INTO TEMP x1;

SELECT t1.tbl_id AS tbl_id, t2.tbl_id AS ind
  FROM tbl AS t1, OUTER tbl AS t2
 WHERE t1.tbl_id - 1 = t2.tbl_id
  INTO TEMP x2;

SELECT tbl_id AS hi_range
  FROM x1
 WHERE ind IS NULL
  INTO TEMP x3;

SELECT tbl_id AS lo_range
  FROM x2
 WHERE ind IS NULL
  INTO TEMP x4;

The tables x3 and x4 now contain (respectively) the values for tbl_id that have no immediate successor and no immediate predecessor. Each value is the start or end of a contiguous ranges of tbl_id values. In IDS instead of SE, you could use the standard SQL OUTER JOIN notation and filter the results of the join in two queries instead of four; you do not have that luxury in SE.

Non-solution with quadratic (or worse) behaviour

Now you just have to work out how to combine the two tables:

SELECT t1.lo_range, t2.hi_range
  FROM x4 AS t1, x3 AS t2
 WHERE t1.lo_range <= t2.hi_range
   AND NOT EXISTS
       (SELECT t3.lo_range, t4.hi_range
          FROM x4 AS t3, x3 AS t4
         WHERE t3.lo_range <= t4.hi_range
           AND t1.lo_range =  t3.lo_range
           AND t2.hi_range >  t4.hi_range
       );

The main part of this query occurs twice and generates all pairs of rows where the start of the range is less than or equal to the end of the range (equal allows for 'ranges' consisting of one value on its own, with deleted rows on either side). The NOT EXISTS clause ensures that there is no other pair with the same start value and a smaller end value.

The queries on the temp tables may not be very fast if there are many gaps in the data; if there are very few gaps, then they should be OK.

The last query exhibits quadratic behaviour in terms of the number of ranges. When I had just a dozen ranges, it was fine (sub-second response time); when I had 1,200 ranges, it was not OK - did not complete in a reasonable time.

Avoiding quadratic behaviour

Since quadratic behaviour is not good, how can we rephrase the query...

For each low end of the range, find the minimum high end of a range that is greater than or equal to the low end, or in SQL:

SELECT t1.lo_range, MIN(t2.hi_range) AS hi_range
  FROM x4 AS t1, x3 AS t2
 WHERE t2.hi_range >= t1.lo_range
 GROUP BY t1.lo_range;

Note that this can easily be incorporated into an ACE report. It gives you the ranges of number present - not those which are absent. You can work out how to generate the other.

Timing

That performed pretty well on a table with 22100 rows containing 1200 gaps in the data. Using (my) SQLCMD program in its benchmark mode (-B), and sending SELECT output to /dev/null, and using IDS 11.70.FC1 run on MacOS X 10.6.7 (MacBook Pro, Intel Core 2 Duo at 3 GHz and 4 GB RAM), the results were:

$ sqlcmd -d stores -B -f gaps.sql
+ CLOCK START;
2011-03-31 18:44:39
+ BEGIN;
Time: 0.000588
2011-03-31 18:44:39
+ SELECT t1.tbl_id AS tbl_id, t2.tbl_id AS ind
  FROM tbl AS t1, OUTER tbl AS t2
 WHERE t1.tbl_id + 1 = t2.tbl_id
  INTO TEMP x1;
Time: 0.437521
2011-03-31 18:44:39
+ SELECT t1.tbl_id AS tbl_id, t2.tbl_id AS ind
   FROM tbl AS t1, OUTER tbl AS t2
  WHERE t1.tbl_id - 1 = t2.tbl_id
   INTO TEMP x2;
Time: 0.315050
2011-03-31 18:44:39
+ SELECT tbl_id AS hi_range
  FROM x1
 WHERE ind IS NULL
  INTO TEMP x3;
Time: 0.012510
2011-03-31 18:44:39
+ SELECT tbl_id AS lo_range
  FROM x2
 WHERE ind IS NULL
  INTO TEMP x4;
Time: 0.008754
+ output "/dev/null";
2011-03-31 18:44:39
+ SELECT t1.lo_range, MIN(t2.hi_range) AS hi_range
  FROM x4 AS t1, x3 AS t2
 WHERE t2.hi_range >= t1.lo_range
 GROUP BY t1.lo_range;
Time: 0.561935
+ output "/dev/stdout";
2011-03-31 18:44:40
+ SELECT COUNT(*) FROM x1;
22100
Time: 0.001171
2011-03-31 18:44:40
+ SELECT COUNT(*) FROM x2;
22100
Time: 0.000685
2011-03-31 18:44:40
+ SELECT COUNT(*) FROM x3;
1200
Time: 0.000590
2011-03-31 18:44:40
+ SELECT COUNT(*) FROM x4;
1200
Time: 0.000768
2011-03-31 18:44:40
+ SELECT t1.lo_range, MIN(t2.hi_range) AS hi_range
  FROM x4 AS t1, x3 AS t2
 WHERE t2.hi_range >= t1.lo_range
 GROUP BY t1.lo_range
 INTO TEMP x5;
Time: 0.529420
2011-03-31 18:44:40
+ SELECT COUNT(*) FROM x5;
1200
Time: 0.001155
2011-03-31 18:44:40
+ ROLLBACK;
Time: 0.329379
+ CLOCK STOP;
Time: 2.202523
$

It will do; less than a couple of seconds processing time.

回答2:

see: Is there an SQL function which generates a given range of sequential numbers? for a simpler and more engine-efficient solution.

来源：https://stackoverflow.com/questions/5482528/how-can-i-find-out-which-rows-have-been-deleted

标签

sql

informix

auto-increment