Oracle: Fastest way to compare tables containing CLOB and get diff

Posted by 徘徊边缘 on 2019-12-23 18:20:05

Question


Suppose I have two tables with columns Col1, Col2 and Col3, which are of type VARCHAR2, CLOB and NUMBER respectively.

How can I get the diff of these tables (i.e., the list of records that exist in Table B but not in Table A)?

Table A:
╔═══════╦═════════════════╦══════╗
║ Col1  ║      Col2       ║ Col3 ║
╠═══════╬═════════════════╬══════╣
║ P1111 ║ some_long_text1 ║ 1234 ║
║ P1111 ║ some_long_text1 ║ 1233 ║
║ P1111 ║ some_long_text2 ║ 1233 ║
╚═══════╩═════════════════╩══════╝

Table B:
╔═══════╦═════════════════╦══════╗
║ Col1  ║      Col2       ║ Col3 ║
╠═══════╬═════════════════╬══════╣
║ P1111 ║ some_long_text1 ║ 1234 ║
║ P1111 ║ some_long_text1 ║ 1235 ║
║ P1112 ║ some_long_text2 ║ 1233 ║
╚═══════╩═════════════════╩══════╝

Expected results:
╔═══════╦═════════════════╦══════╗
║ Col1  ║      Col2       ║ Col3 ║
╠═══════╬═════════════════╬══════╣
║ P1111 ║ some_long_text1 ║ 1235 ║
║ P1112 ║ some_long_text2 ║ 1233 ║
╚═══════╩═════════════════╩══════╝

Answer 1:


To compare LOB types you can use the DBMS_LOB.COMPARE function, which returns 0 when the two LOBs are identical.

SELECT table_b.* 
  FROM table_b
  LEFT JOIN table_a
    ON table_b.col1 = table_a.col1
   AND DBMS_LOB.COMPARE(table_b.col2, table_a.col2) = 0
   AND table_b.col3 = table_a.col3
 WHERE table_a.col1 IS NULL;
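
One case the join above does not cover is a NULL CLOB: as far as I know DBMS_LOB.COMPARE does not return 0 when either argument is NULL, so a row whose col2 is NULL in both tables would still be reported as a difference. A hedged sketch of a NULL-tolerant variant, written with NOT EXISTS and assuming col1 and col3 are never NULL (an assumption, the question does not say so):

```sql
-- Rows of table_b that have no identical row in table_a.
-- Assumption: col1 and col3 are NOT NULL; col2 may be NULL.
SELECT b.*
  FROM table_b b
 WHERE NOT EXISTS (
         SELECT 1
           FROM table_a a
          WHERE a.col1 = b.col1
            AND a.col3 = b.col3
            AND (   DBMS_LOB.COMPARE(a.col2, b.col2) = 0        -- same CLOB content
                 OR (a.col2 IS NULL AND b.col2 IS NULL))        -- both CLOBs missing
       );
```

Whether two NULL CLOBs should count as "equal" is a design decision; drop the second branch of the OR if they should not.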



Answer 2:


You can use the dbms_lob.substr() function, as shown below, so that the MINUS operator can be applied. To get the records that exist in b but not in a, select from b first:

select col1,dbms_lob.substr(col2),col3 from b
minus
select col1,dbms_lob.substr(col2),col3 from a;
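
Note that dbms_lob.substr() returns a VARCHAR2, so this approach only compares the leading part of each CLOB. A minimal sketch that makes the length explicit, taking the rows of b that are not in a as the question asks; the value 2000 and the alias col2_prefix are illustrative choices, not part of the original answer:

```sql
-- Compare only the first 2000 characters of col2 (amount = 2000, offset = 1).
-- Two CLOBs that differ only after that point are treated as equal.
SELECT col1, DBMS_LOB.SUBSTR(col2, 2000, 1) AS col2_prefix, col3 FROM b
MINUS
SELECT col1, DBMS_LOB.SUBSTR(col2, 2000, 1) AS col2_prefix, col3 FROM a;
```

This is only safe when the CLOB values are known to be short or to differ early; otherwise a full comparison or a hash of the whole value is needed.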



Answer 3:


In Oracle specifically I think you can do this

SELECT * FROM TableB WHERE (Col1, Col2, Col3) NOT IN (SELECT Col1, Col2, Col3 from TABLEA)

In other databases you'd use a join:

SELECT b.* FROM TableB b left outer join TableA a
on (a.Col1=b.Col1 and a.Col2=b.Col2 and a.Col3=b.Col3)
WHERE a.col1 is null

You'd probably need to compute a checksum/hash of the CLOB column, though, if it's to be included in the comparison, because Oracle does not accept a CLOB directly in an = or IN comparison.
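
A rough sketch of the checksum idea, using DBMS_CRYPTO.HASH on the CLOB. It assumes the session has EXECUTE on DBMS_CRYPTO, that the literal 3 corresponds to DBMS_CRYPTO.HASH_SH1, and that Col2 is never NULL or empty (how the function behaves for those values should be checked on your version). The subquery names a_hashed and b_hashed are made up for the example:

```sql
-- Hash each CLOB once, then anti-join on the hashes instead of the CLOBs.
WITH a_hashed AS (
       SELECT Col1, Col3, DBMS_CRYPTO.HASH(Col2, 3) AS col2_hash   -- 3 = SHA-1 (assumed)
         FROM TableA
     ),
     b_hashed AS (
       SELECT Col1, Col2, Col3, DBMS_CRYPTO.HASH(Col2, 3) AS col2_hash
         FROM TableB
     )
SELECT b.Col1, b.Col2, b.Col3
  FROM b_hashed b
  LEFT OUTER JOIN a_hashed a
    ON a.Col1 = b.Col1
   AND a.Col3 = b.Col3
   AND a.col2_hash = b.col2_hash
 WHERE a.Col1 IS NULL;
```

Equal hashes only make equal content very likely, not certain; if that matters, the matched pairs can be confirmed afterwards with DBMS_LOB.COMPARE.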




Answer 4:


Based on your expected results, it seems that a left join will work well.

Something to the effect of this:

Select B.Col1
      ,B.Col2
      ,B.Col3
  FROM TableB B
  LEFT OUTER JOIN TableA A
    ON B.Col1 = A.Col1
   AND DBMS_LOB.COMPARE(B.Col2, A.Col2) = 0
   AND B.Col3 = A.Col3
 WHERE A.Col1 IS NULL

The ON clause decides which TableA row, if any, is matched to each TableB row; TableB rows without a match are kept, with NULLs in the TableA columns. By including only A.Col1 IS NULL in the WHERE clause, you're sure to see only the rows from TableB for which there is no corresponding TableA row.

Joins are expensive, and left joins are even more so. Joining on only one key value might look cheaper, but the remaining comparisons cannot simply be moved into the WHERE clause: a WHERE predicate on a TableA column rejects the NULL-extended rows, so together with A.Col1 IS NULL the query would return nothing. The comparisons on Col2 and Col3 therefore stay in the ON clause, and the CLOB column is compared with DBMS_LOB.COMPARE because = is not allowed on CLOBs.

Now - as far as matching on your CLOB - there may or may not be any benefit to hashing those values. This depends on the size of the data, and the hashing algorithm used. The Oracle Optimizer may automatically choose to hash the column for comparison or you can force it by using a function.

It is my opinion that the engine should be allowed to make this choice - and I am sure that there will be others who disagree with me and they will all have valid reasons. My contention is this: why force an extra step if it's not necessary, when the optimizer can make this decision on its own?

If this is going to be a heavily used query (in a stored procedure, for example) that will be called often, then there may be a benefit to creating a column that stores a pre-computed hash of the CLOB for easy comparison. This change would almost certainly eliminate the overhead of hashing every CLOB during execution, which can be a very CPU-intensive operation. I personally wouldn't recommend indexing the hashed column, as I expect that each CLOB entry is likely a unique value. If that's the case, then the PK of the table should be sufficient for matching based on row uniqueness.
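
The pre-computed hash column could look roughly like the sketch below. The column name col2_hash is hypothetical, SHA-1 via DBMS_CRYPTO.HASH (literal 3, assumed to be DBMS_CRYPTO.HASH_SH1) is just one possible choice, and the sketch assumes EXECUTE on DBMS_CRYPTO:

```sql
-- One-time setup: a 20-byte column for the SHA-1 digest of Col2.
ALTER TABLE TableA ADD (col2_hash RAW(20));
ALTER TABLE TableB ADD (col2_hash RAW(20));

-- Back-fill the existing rows; rows with a NULL Col2 keep a NULL hash.
UPDATE TableA SET col2_hash = DBMS_CRYPTO.HASH(Col2, 3) WHERE Col2 IS NOT NULL;
UPDATE TableB SET col2_hash = DBMS_CRYPTO.HASH(Col2, 3) WHERE Col2 IS NOT NULL;
COMMIT;

-- The diff then compares the stored hashes instead of the CLOBs.
SELECT b.Col1, b.Col2, b.Col3
  FROM TableB b
  LEFT OUTER JOIN TableA a
    ON a.Col1 = b.Col1
   AND a.Col3 = b.Col3
   AND a.col2_hash = b.col2_hash
 WHERE a.Col1 IS NULL;
```

The stored hash has to be refreshed whenever Col2 changes (for example from the code that writes the CLOB, or a trigger), and rows with a NULL hash never match each other in this query.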




Answer 5:


1) You have to create a UDT (user-defined type) that wraps the LOB.

create or replace type lob_wrapper is object
( x clob,
  hash varchar2(100),
  constructor function lob_wrapper(p_x clob) return self as result,
  map member function get_hash return varchar2
);
/

create or replace type body lob_wrapper
as
  constructor function lob_wrapper(p_x clob)
    return self as result
  as
    -- only the first 1000 characters are hashed; a direct assignment of a longer clob would fail
    temp_ varchar2(1000 char) := dbms_lob.substr(p_x, 1000, 1);
  begin
    self.x := p_x;
    -- add here better implementation of hashing clob.
    select ora_hash(temp_) into self.hash from dual;
    return;
  end;

  -- map function: two objects are compared by the value returned here
  map member function get_hash return varchar2 is
  begin
    return hash;
  end;
end;
/

The object constructor calculates a hash for the CLOB. In this example I'm using ora_hash, but you should choose a better solution (e.g. dbms_crypto.hash).

The map function get_hash in the object is invoked when the database compares two objects.
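
Following the suggestion to use dbms_crypto.hash, the type body could be sketched as below. This is only an illustration: it assumes the lob_wrapper type specification from this answer is unchanged, that the type owner has a direct EXECUTE grant on DBMS_CRYPTO, and SHA-1 is an arbitrary choice of algorithm.

```sql
CREATE OR REPLACE TYPE BODY lob_wrapper AS
  CONSTRUCTOR FUNCTION lob_wrapper(p_x CLOB) RETURN SELF AS RESULT AS
  BEGIN
    self.x := p_x;
    -- Hash the whole CLOB. SHA-1 gives 20 bytes, i.e. 40 hex characters,
    -- which fits the VARCHAR2(100) hash attribute of the type.
    -- A NULL CLOB is skipped, leaving the hash attribute NULL.
    IF p_x IS NOT NULL THEN
      self.hash := RAWTOHEX(DBMS_CRYPTO.HASH(p_x, DBMS_CRYPTO.HASH_SH1));
    END IF;
    RETURN;
  END;

  -- Objects are ordered and compared by their hash value.
  MAP MEMBER FUNCTION get_hash RETURN VARCHAR2 IS
  BEGIN
    RETURN hash;
  END;
END;
/
```

Because the comparison is made on the hash alone, two different CLOBs with the same hash would be treated as equal; a stronger algorithm lowers that risk at some CPU cost.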

select col1,lob_wrapper(col2) col2 ,col3 from test_clob_b
minus 
select col1,lob_wrapper(col2) col2 ,col3 from test_clob_a;

To obtain the original value from the object, add another select.

select col1,t.col2.x original_value,col3, t.col2.hash hash_value from (
select col1,lob_wrapper(col2) col2 ,col3 from test_clob_b
minus 
select col1,lob_wrapper(col2) col2 ,col3 from test_clob_a
) t;


Source: https://stackoverflow.com/questions/43611392/oracle-fastest-way-to-compare-tables-containing-clob-and-get-diff
