Question
Suppose I have two tables, each with columns Col1, Col2 and Col3 of types VARCHAR2, CLOB and NUMBER respectively.
How can I get the diff of these tables, i.e. the list of records that exist in Table B but not in Table A?
Table A:
╔═══════╦═════════════════╦══════╗
║ Col1 ║ Col2 ║ Col3 ║
╠═══════╬═════════════════╬══════╣
║ P1111 ║ some_long_text1 ║ 1234 ║
║ P1111 ║ some_long_text1 ║ 1233 ║
║ P1111 ║ some_long_text2 ║ 1233 ║
╚═══════╩═════════════════╩══════╝
Table B:
╔═══════╦═════════════════╦══════╗
║ Col1 ║ Col2 ║ Col3 ║
╠═══════╬═════════════════╬══════╣
║ P1111 ║ some_long_text1 ║ 1234 ║
║ P1111 ║ some_long_text1 ║ 1235 ║
║ P1112 ║ some_long_text2 ║ 1233 ║
╚═══════╩═════════════════╩══════╝
Expected results:
╔═══════╦═════════════════╦══════╗
║ Col1 ║ Col2 ║ Col3 ║
╠═══════╬═════════════════╬══════╣
║ P1111 ║ some_long_text1 ║ 1235 ║
║ P1112 ║ some_long_text2 ║ 1233 ║
╚═══════╩═════════════════╩══════╝
Answer 1:
To compare LOB types you can use the DBMS_LOB.COMPARE function, which returns 0 when the two LOBs match exactly.
SELECT table_b.*
FROM table_b
LEFT JOIN table_a
ON table_b.col1 = table_a.col1
AND DBMS_LOB.COMPARE(table_b.col2, table_a.col2) = 0
AND table_b.col3 = table_a.col3
WHERE table_a.col1 IS NULL;
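As a quick sanity check of the anti-join logic, the sketch below runs the same LEFT JOIN ... IS NULL query against the sample rows from the question. It uses Python's built-in sqlite3 module rather than Oracle, which forces one assumption: SQLite has no CLOB type or DBMS_LOB package, so Col2 is stored as TEXT and compared with plain =. That comparison stands in for DBMS_LOB.COMPARE(...) = 0. The table names follow the answer; the helper functions are hypothetical.

```python
import sqlite3

# Sample data from the question. SQLite has no CLOB type, so TEXT stands in
# for the CLOB column col2 (an assumption of this sketch).
ROWS_A = [
    ("P1111", "some_long_text1", 1234),
    ("P1111", "some_long_text1", 1233),
    ("P1111", "some_long_text2", 1233),
]
ROWS_B = [
    ("P1111", "some_long_text1", 1234),
    ("P1111", "some_long_text1", 1235),
    ("P1112", "some_long_text2", 1233),
]

# Anti-join: keep table_b rows that found no matching table_a row.
# In Oracle the col2 predicate would be DBMS_LOB.COMPARE(b.col2, a.col2) = 0.
ANTI_JOIN = """
SELECT b.col1, b.col2, b.col3
FROM table_b b
LEFT JOIN table_a a
  ON  b.col1 = a.col1
  AND b.col2 = a.col2
  AND b.col3 = a.col3
WHERE a.col1 IS NULL
ORDER BY b.col1, b.col3
"""


def diff_b_minus_a() -> list[tuple]:
    """Load the sample tables into an in-memory database and run the anti-join."""
    conn = sqlite3.connect(":memory:")
    for name, rows in (("table_a", ROWS_A), ("table_b", ROWS_B)):
        conn.execute(f"CREATE TABLE {name} (col1 TEXT, col2 TEXT, col3 INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
    result = conn.execute(ANTI_JOIN).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in diff_b_minus_a():
        print(row)
```

It returns the two rows listed under "Expected results" in the question.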
Answer 2:
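You can use the dbms_lob.substr() function, which returns a VARCHAR2, so that the minus operator can be used. MINUS does not accept CLOB columns directly. Only the returned prefix is compared, so two CLOBs that differ only beyond it are treated as equal.
-- b minus a: rows in Table B that are not in Table A.
-- The amount (4000) is in characters; with a multibyte character set a smaller
-- value may be needed so the result fits in a SQL VARCHAR2.
select col1, dbms_lob.substr(col2, 4000, 1) col2, col3 from b
minus
select col1, dbms_lob.substr(col2, 4000, 1) col2, col3 from a;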
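Because MINUS compares only what dbms_lob.substr returns, two CLOBs that share that prefix but differ later count as the same row. The following sketch shows the effect using Python's sqlite3 module. EXCEPT plays the role of MINUS, and SQLite's substr() stands in for dbms_lob.substr(). This is an illustration, not Oracle code. The PREFIX_LEN of 10 is a hypothetical value chosen only so the effect shows up on short strings.

```python
import sqlite3

# Hypothetical prefix length. In Oracle, DBMS_LOB.SUBSTR(col2, 4000, 1) would
# play this role; a tiny value is used so the effect shows on short strings.
PREFIX_LEN = 10


def diff_by_prefix(rows_a, rows_b, prefix_len=PREFIX_LEN):
    """Rows of b not in a when only the first prefix_len characters of col2 count.

    SQLite spells Oracle's MINUS as EXCEPT; both also remove duplicate rows.
    """
    conn = sqlite3.connect(":memory:")
    for name, rows in (("a", rows_a), ("b", rows_b)):
        conn.execute(f"CREATE TABLE {name} (col1 TEXT, col2 TEXT, col3 INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
    query = """
        SELECT col1, substr(col2, 1, ?), col3 FROM b
        EXCEPT
        SELECT col1, substr(col2, 1, ?), col3 FROM a
        ORDER BY 1, 3
    """
    result = conn.execute(query, (prefix_len, prefix_len)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    a = [("P1", "same_prefix_then_AAA", 1)]
    b = [("P1", "same_prefix_then_BBB", 1)]
    print(diff_by_prefix(a, b))                   # the tail difference is missed
    print(diff_by_prefix(a, b, prefix_len=100))   # now the row is reported
```

The returned col2 is also the truncated prefix rather than the full CLOB. If the complete value is needed, you would have to join back to Table B.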
Answer 3:
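In Oracle specifically, you can use a multi-column NOT IN:
SELECT * FROM TableB WHERE (Col1, Col2, Col3) NOT IN (SELECT Col1, Col2, Col3 FROM TableA)
In other databases you'd use a join:
SELECT b.* FROM TableB b LEFT OUTER JOIN TableA a
ON (a.Col1 = b.Col1 AND a.Col2 = b.Col2 AND a.Col3 = b.Col3)
WHERE a.Col1 IS NULL
As written, though, neither query works with the CLOB column in Oracle, because CLOB operands are rejected in = and IN comparisons (ORA-00932). If Col2 is to be included in the comparison, you'd need to compare a checksum/hash of it instead, or use DBMS_LOB.COMPARE in the join.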
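The checksum/hash idea can be sketched in the same way. Instead of comparing the text itself, you compare a fixed-size digest of the whole text. A difference anywhere in the CLOB then changes the digest, barring a hash collision. This is a Python/sqlite3 illustration, not Oracle code: SQLite's TEXT stands in for CLOB. clob_hash is a hypothetical function built on hashlib's SHA-256 and registered with sqlite3's create_function. On the Oracle side, dbms_crypto.hash (mentioned in another answer) would be one candidate for the hash.

```python
import hashlib
import sqlite3


def clob_hash(text):
    """SHA-256 hex digest of the whole text; NULL in, NULL out."""
    if text is None:
        return None
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_by_hash(rows_a, rows_b):
    """Rows of b with no row in a having the same col1, hash(col2) and col3."""
    conn = sqlite3.connect(":memory:")
    # Make the Python function callable from SQL as clob_hash(...).
    conn.create_function("clob_hash", 1, clob_hash, deterministic=True)
    for name, rows in (("a", rows_a), ("b", rows_b)):
        conn.execute(f"CREATE TABLE {name} (col1 TEXT, col2 TEXT, col3 INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
    query = """
        SELECT b.col1, b.col2, b.col3
        FROM b
        LEFT OUTER JOIN a
          ON  a.col1 = b.col1
          AND clob_hash(a.col2) = clob_hash(b.col2)
          AND a.col3 = b.col3
        WHERE a.col1 IS NULL
        ORDER BY b.col1, b.col3
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    long_a = "x" * 5000 + "A"   # differs from long_b only in the last character
    long_b = "x" * 5000 + "B"
    print(diff_by_hash([("P1", long_a, 1)], [("P1", long_b, 1)]))
```

Unlike the prefix approach, the last-character difference is detected, because the digest covers the entire text.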
Answer 4:
Based on your expected results, it seems that a LEFT JOIN will work well.
Something to the effect of this:
SELECT B.Col1
,B.Col2
,B.Col3
FROM TableB B
LEFT OUTER JOIN TableA A
ON B.Col1 = A.Col1
AND DBMS_LOB.COMPARE(B.Col2, A.Col2) = 0 -- CLOBs cannot be compared with =
AND B.Col3 = A.Col3
WHERE A.Col1 IS NULL
Only the A.Col1 IS NULL test belongs in the WHERE clause; the matching predicates have to stay in the ON clause. The LEFT JOIN keeps every TableB row and fills the TableA columns with NULL when nothing matches. The WHERE filter then keeps only those unmatched rows, so you're sure to see only the values from TableB for which there is no corresponding TableA value. Suppose conditions such as B.Col3 = A.Col3 were moved into the WHERE clause instead. For exactly those NULL-filled rows they would evaluate to UNKNOWN, and together with A.Col1 IS NULL the query would return nothing.
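The placement of the matching predicates is easy to check on the sample data. The sketch below uses Python's sqlite3 module rather than Oracle. TEXT stands in for CLOB, so a plain = replaces DBMS_LOB.COMPARE. The same LEFT OUTER JOIN runs twice: once with the Col2/Col3 predicates in the WHERE clause and once with them in the ON clause. The run helper is hypothetical.

```python
import sqlite3

ROWS_A = [
    ("P1111", "some_long_text1", 1234),
    ("P1111", "some_long_text1", 1233),
    ("P1111", "some_long_text2", 1233),
]
ROWS_B = [
    ("P1111", "some_long_text1", 1234),
    ("P1111", "some_long_text1", 1235),
    ("P1112", "some_long_text2", 1233),
]

# Col2/Col3 predicates in WHERE: they are UNKNOWN for unmatched rows.
PREDICATES_IN_WHERE = """
SELECT B.Col1, B.Col2, B.Col3
FROM TableB B
LEFT OUTER JOIN TableA A ON B.Col1 = A.Col1
WHERE B.Col2 = A.Col2 AND B.Col3 = A.Col3 AND A.Col1 IS NULL
"""

# All predicates in ON: unmatched TableB rows survive with NULL A columns.
PREDICATES_IN_ON = """
SELECT B.Col1, B.Col2, B.Col3
FROM TableB B
LEFT OUTER JOIN TableA A
  ON B.Col1 = A.Col1 AND B.Col2 = A.Col2 AND B.Col3 = A.Col3
WHERE A.Col1 IS NULL
ORDER BY B.Col1, B.Col3
"""


def run(query):
    """Load the sample tables into a fresh in-memory database and run query."""
    conn = sqlite3.connect(":memory:")
    for name, rows in (("TableA", ROWS_A), ("TableB", ROWS_B)):
        conn.execute(f"CREATE TABLE {name} (Col1 TEXT, Col2 TEXT, Col3 INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(run(PREDICATES_IN_WHERE))
    print(run(PREDICATES_IN_ON))
```

Only the second query returns the expected rows. The first returns an empty list: every P1111 row found a partner on Col1 alone, and the P1112 row has NULL in every A column, so B.Col2 = A.Col2 cannot be true for it.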
Now, as far as matching on your CLOB goes, there may or may not be any benefit to hashing those values. This depends on the size of the data and the hashing algorithm used. Oracle does not allow CLOBs to be compared with = (it raises ORA-00932), so the comparison has to go through a function either way: DBMS_LOB.COMPARE, as above, or an explicit hash.
It is my opinion that the direct comparison should be the default. I am sure that there will be others who disagree with me, and they will all have valid reasons. My contention is this: why add an extra hashing step if it's not necessary?
If this is going to be a heavily used query (called often from a stored procedure, for example), there may be a benefit to creating a column that stores a pre-computed hash of the CLOB for easy comparison. This would almost certainly eliminate the overhead of hashing every CLOB during execution, which can be very CPU-intensive. I personally wouldn't recommend indexing the hashed column, as I expect each CLOB entry is likely a unique value. If that's the case, then the PK of the table should be sufficient for matching based on row uniqueness.
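A pre-computed hash column could look roughly like the following sketch. It uses Python's sqlite3 module instead of Oracle. The col2_hash column, the insert_row helper and the choice of SHA-256 are all hypothetical. In a real schema the column would have to be kept in sync on every insert and update, for example by the application code.

```python
import hashlib
import sqlite3


def clob_hash(text):
    """SHA-256 hex digest used as the stored hash (NULL in, NULL out)."""
    if text is None:
        return None
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def insert_row(conn, table, col1, col2, col3):
    """Hypothetical write path: the hash is computed once, when the row is stored."""
    conn.execute(
        f"INSERT INTO {table} (col1, col2, col3, col2_hash) VALUES (?, ?, ?, ?)",
        (col1, col2, col3, clob_hash(col2)),
    )


# The diff query compares the stored hashes, so no CLOB is hashed at query time.
DIFF_QUERY = """
SELECT b.col1, b.col2, b.col3
FROM table_b b
LEFT JOIN table_a a
  ON  a.col1 = b.col1
  AND a.col2_hash = b.col2_hash
  AND a.col3 = b.col3
WHERE a.col1 IS NULL
ORDER BY b.col1, b.col3
"""


def diff_with_stored_hash(rows_a, rows_b):
    conn = sqlite3.connect(":memory:")
    for table, rows in (("table_a", rows_a), ("table_b", rows_b)):
        # col2_hash is the extra, pre-computed hash column (hypothetical name).
        conn.execute(
            f"CREATE TABLE {table} (col1 TEXT, col2 TEXT, col3 INTEGER, col2_hash TEXT)"
        )
        for row in rows:
            insert_row(conn, table, *row)
    result = conn.execute(DIFF_QUERY).fetchall()
    conn.close()
    return result
```

Since the diff compares only hashes, two different CLOBs with the same hash would be treated as equal. You could guard against that by also comparing the texts for hash-matched pairs (DBMS_LOB.COMPARE in Oracle). The cost is reading those CLOBs again.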
Answer 5:
1) Create a user-defined object type (UDT) that wraps the LOB.
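create or replace type lob_wrapper is object
( x clob,
hash varchar2(100),
constructor function lob_wrapper(p_x clob) return self as result,
MAP MEMBER FUNCTION get_hash RETURN varchar2
);
/
create or replace type body lob_wrapper
as
constructor function lob_wrapper(p_x clob)
return self as result
as
-- Assigning the whole CLOB to a short VARCHAR2 raises ORA-06502 once the
-- CLOB is longer than the variable, so take an explicit prefix instead.
-- Only these first 1000 characters are hashed.
temp_ varchar2(4000) := dbms_lob.substr(p_x, 1000, 1);
begin
self.x := p_x;
-- add here a better implementation of hashing the clob.
select ora_hash(temp_) into self.hash from dual;
return;
end;
-- Invoked whenever two lob_wrapper objects are compared (e.g. by MINUS).
MAP MEMBER FUNCTION get_hash RETURN varchar2 is
begin
return hash;
end;
end;
/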
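The object constructor calculates a hash for the CLOB. In this example I'm using ora_hash, but you should choose a better solution (e.g. dbms_crypto.hash). Because objects are compared only by their hash, two different CLOBs that happen to produce the same hash are treated as equal. That is another reason to use a stronger hash.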
The map function get_hash is invoked whenever the database compares two lob_wrapper objects.
select col1,lob_wrapper(col2) col2 ,col3 from test_clob_b
minus
select col1,lob_wrapper(col2) col2 ,col3 from test_clob_a
To obtain the original value from the object, wrap the query in another select:
select col1,t.col2.x original_value,col3, t.col2.hash hash_value from (
select col1,lob_wrapper(col2) col2 ,col3 from test_clob_b
minus
select col1,lob_wrapper(col2) col2 ,col3 from test_clob_a
) t;
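The MAP-function idea can be mimicked outside the database to see what it does. This Python sketch is an analogy, not Oracle code. LobWrapper keeps the original text in x and derives a digest from it. Equality and hashing look only at that digest, much as get_hash makes Oracle compare lob_wrapper objects. The SHA-256 digest stands in for the hash the constructor computes. The minus helper is hypothetical; it mirrors the b MINUS a query, including duplicate removal and recovering the original value.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LobWrapper:
    """Python analogue of the lob_wrapper object type (not Oracle code).

    x keeps the original text; digest plays the role of the hash attribute.
    Equality and hashing look only at digest, much as the MAP member function
    get_hash makes Oracle compare lob_wrapper objects by their hash.
    """

    x: str = field(compare=False)
    digest: str = field(init=False)

    def __post_init__(self):
        # Frozen dataclasses need object.__setattr__ to set a derived field.
        object.__setattr__(
            self, "digest", hashlib.sha256(self.x.encode("utf-8")).hexdigest()
        )


def minus(rows_b, rows_a):
    """Rows of b not in a, with col2 compared through LobWrapper (like MINUS)."""
    wrapped_a = {(c1, LobWrapper(c2), c3) for c1, c2, c3 in rows_a}
    seen = set()
    result = []
    for c1, c2, c3 in rows_b:
        key = (c1, LobWrapper(c2), c3)
        if key in wrapped_a or key in seen:  # MINUS also drops duplicates
            continue
        seen.add(key)
        # Like t.col2.x in the outer select: recover the original text.
        result.append((c1, key[1].x, c3))
    return result
```

Because digest is the only field that takes part in comparisons, equal digests mean equal wrappers. A hash collision would therefore hide a real difference, just as it would with the database type.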
Source: https://stackoverflow.com/questions/43611392/oracle-fastest-way-to-compare-tables-containing-clob-and-get-diff