Question
I have two tables: one with 212,000 records (deprecated records) and the other with 10,500,000 records.
I would like to join the two tables on the id and version_number fields, since both tables have these fields. I was hoping that the matched records from the join could then be deleted, i.e. all of the 212,000 records get deleted from the 10,500,000.
I was wondering what the best approach for this would be in Oracle SQL. I have seen examples where an inner join on a single field was used together with a delete statement to delete table1's rows from table2, but I have not seen one that uses two fields in the join.
Would it make sense to use an outer join before deleting the records? I was thinking it might help me track what has been deleted, if possible.
Answer 1:
You do not need an OUTER JOIN, except to check how many rows will and will not be deleted.
An example of such a query is shown below (it uses the generated test data provided at the end of this answer):
with del as (
select delta.id, delta.version,
decode(big.id,null,0,1) is_deleted
from delta
left outer join big
on delta.id = big.id and delta.version = big.version
)
select is_deleted, count(*) cnt, max(id||'.'||version) eg_id_vers
from del
group by is_deleted;
IS_DELETED CNT EG_ID_VERS
---------- ---------- ----------
1 20000 99995.0
0 20 100100.0
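If, as the question asks, you also want a record of what gets deleted, one option is to copy the matching BIG rows into a separate table just before running the DELETE. This is a minimal sketch, not part of the original answer: the archive table name BIG_DELETED is hypothetical, and the ID and VERSION columns follow the sample data at the end of this answer.

```sql
-- BIG_DELETED is a hypothetical archive table name. EXISTS (rather than a
-- plain join) keeps each BIG row at most once, even if DELTA had duplicates.
create table big_deleted as
select big.*
from big
where exists (select null
              from delta
              where delta.id = big.id and delta.version = big.version);

-- Should match the IS_DELETED = 1 count from the check query above.
select count(*) as archived_rows from big_deleted;
```

After the DELETE has run, BIG_DELETED holds exactly the rows that were removed.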
With your data size you should use a HASH JOIN with full table scans on both tables to get acceptable performance.
There are basically two ways to do the DELETE:
Updatable Join View
Note that in this case your small table must have a unique index (or a primary key) on ID, VERSION:
create unique index delta_idx on delta(id,version);
By contrast, the BIG table should not have such a constraint. This is important, because it makes BIG the only key-preserved table in the join view.
Simply put, a join to the small table can't duplicate rows from the big table, thanks to the unique constraint.
See the Oracle documentation on Updating a Join View for more information.
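To check which table Oracle treats as key-preserved, you can define the same join as a named view and query the USER_UPDATABLE_COLUMNS data dictionary view. A sketch, assuming the sample tables from the end of this answer; the view name BIG_DELTA_V is hypothetical:

```sql
-- Hypothetical view over the same join used in the DELETE below.
create or replace view big_delta_v as
select big.id, big.version
from big
join delta
  on delta.id = big.id and delta.version = big.version;

-- With the unique index on DELTA in place, the BIG columns should be
-- reported as DELETABLE = 'YES', i.e. BIG is key-preserved.
select column_name, updatable, insertable, deletable
from user_updatable_columns
where table_name = 'BIG_DELTA_V';
```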
delete from
(
select delta.id, delta.version, big.id big_id, big.version big_version
from big
join delta
on delta.id = big.id and delta.version = big.version
);
The DELETE above removes rows from the BIG table because it is the only key-preserved table (see the discussion above).
This DML leads to a HASH JOIN.
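To see how many rows the join-view DELETE actually removed (useful for the tracking the question asks about), you can run it in an anonymous PL/SQL block and report SQL%ROWCOUNT. A minimal sketch against the sample tables; SET SERVEROUTPUT ON is a SQL*Plus/SQLcl command, and the block deliberately does not commit:

```sql
set serveroutput on

begin
  -- Same key-preserved join-view delete as above; rows are removed from BIG.
  delete from (
    select big.id, big.version
    from big
    join delta
      on delta.id = big.id and delta.version = big.version
  );
  -- SQL%ROWCOUNT holds the number of rows affected by the last DML statement.
  dbms_output.put_line('Deleted ' || sql%rowcount || ' rows from BIG');
  -- No COMMIT here: review the count, then COMMIT or ROLLBACK yourself.
end;
/
```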
Delete with EXISTS
If your small table has no primary key (i.e. it can contain duplicate rows with the same ID and VERSION), you must fall back to the solution proposed in the other answer:
DELETE FROM big
WHERE EXISTS (SELECT null
              FROM delta
              WHERE delta.id = big.id and delta.version = big.version
             );
No indexes are required, and you should expect an execution plan with HASH JOIN RIGHT SEMI, which means the two approaches are not really different.
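To confirm on your own data that both statements get a hash join, you can explain each one under its own STATEMENT_ID and display the plans with DBMS_XPLAN.DISPLAY. A sketch against the sample tables; the statement IDs are arbitrary labels, and it assumes the standard PLAN_TABLE is available:

```sql
-- Explain the updatable join view variant.
explain plan set statement_id = 'DEL_JOIN_VIEW' for
delete from (
  select big.id, big.version
  from big
  join delta
    on delta.id = big.id and delta.version = big.version
);

-- Explain the EXISTS variant.
explain plan set statement_id = 'DEL_EXISTS' for
delete from big
where exists (select null
              from delta
              where delta.id = big.id and delta.version = big.version);

-- Show each plan; look for a HASH JOIN (or HASH JOIN RIGHT SEMI) operation.
select * from table(dbms_xplan.display('PLAN_TABLE', 'DEL_JOIN_VIEW'));
select * from table(dbms_xplan.display('PLAN_TABLE', 'DEL_EXISTS'));
```

EXPLAIN PLAN only records the plan; neither DELETE is executed.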
Sample Data for Test
create table big as
select
trunc(rownum/10) id, mod(rownum,10) version,
lpad('x',10,'Y') pad
from dual connect by level <= 1000000;
/* The DELTA table has 50 times fewer rows; some of its rows are deliberately
   out of range of the BIG table, and those rows will not be deleted */
drop table delta;
create table delta as
select
trunc(rownum*50/10) id, mod(rownum*50,10) version
from dual connect by level <= 1001000/50;
create unique index delta_idx on delta(id,version);
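With this sample data you can verify a DELETE end to end: BIG starts with 1,000,000 rows and, per the check query at the top, 20,000 of them match DELTA, so 980,000 should remain. A sketch that runs the EXISTS variant and then rolls back so the test data can be reused for the other variant:

```sql
-- Expect 1000000 rows before the delete.
select count(*) as rows_before from big;

delete from big
where exists (select null
              from delta
              where delta.id = big.id and delta.version = big.version);

-- Expect 980000 rows after the delete (20000 matched rows removed).
select count(*) as rows_after from big;

-- Undo the delete so the sample data stays intact.
rollback;
```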
Answer 2:
A simple approach just uses IN or EXISTS:
DELETE FROM bigtable bt
WHERE EXISTS (SELECT 1
              FROM littletable lt
              WHERE bt.id = lt.id AND bt.version_number = lt.version_number
             );
You want an index on littletable on the keys used in the correlation clause.
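Because the question joins on two columns, the IN form uses Oracle's multi-column IN list, and the recommended index becomes a composite one on both keys. A self-contained sketch using the question's ID and VERSION_NUMBER columns; the tiny tables, their rows, and the index name are all hypothetical:

```sql
-- Tiny hypothetical tables so the sketch runs on its own.
create table bigtable (id number, version_number number, payload varchar2(10));
create table littletable (id number, version_number number);

insert into bigtable values (1, 1, 'keep');
insert into bigtable values (1, 2, 'drop');
insert into bigtable values (2, 1, 'drop');
insert into littletable values (1, 2);
insert into littletable values (2, 1);
-- Duplicate pair: IN still deletes each matching BIGTABLE row only once.
insert into littletable values (2, 1);

-- Composite index on the correlation keys (hypothetical name).
create index littletable_id_ver_idx on littletable (id, version_number);

-- Delete every BIGTABLE row whose (ID, VERSION_NUMBER) pair is in LITTLETABLE.
delete from bigtable bt
where (bt.id, bt.version_number) in (select lt.id, lt.version_number
                                     from littletable lt);
```

Only the (1, 1, 'keep') row should remain in BIGTABLE.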
Source: https://stackoverflow.com/questions/59911501/deleting-records-from-one-table-joined-onto-another-table-sql