问题
I am able to get join elimination to work for simple cases such as one-to-one relations, but not for slightly more complicated scenarios. Ultimately I want to try anchor modelling, but first I need to find a way around this problem. I'm using Oracle 12c Enterprise Edition Release 12.1.0.2.0.
DDL for my test case:
drop view product_5nf;
drop table product_color cascade constraints;
drop table product_price cascade constraints;
drop table product cascade constraints;
create table product(
product_id number not null
,constraint product_pk primary key(product_id)
);
create table product_color(
product_id number not null references product
,color varchar2(10) not null
,constraint product_color_pk primary key(product_id)
);
create table product_price(
product_id number not null references product
,from_date date not null
,price number not null
,constraint product_price_pk primary key(product_id, from_date)
);
Some example data:
insert into product values(1);
insert into product values(2);
insert into product values(3);
insert into product values(4);
insert into product_color values(1, 'Red');
insert into product_color values(2, 'Green');
insert into product_price values(1, date '2016-01-01', 10);
insert into product_price values(1, date '2016-02-01', 8);
insert into product_price values(1, date '2016-05-01', 5);
insert into product_price values(2, date '2016-02-01', 5);
insert into product_price values(4, date '2016-01-01', 10);
commit;
The 5NF view
This first view does not compile - it fails with ORA-01799: a column may not be outer-joined to a subquery. Unfortunately, this is how most of the historized views are defined when I'm looking at the online examples of anchor modelling...
create view product_5nf as
select p.product_id
,pc.color
,pp.price
from product p
left join product_color pc on(
pc.product_id = p.product_id
)
left join product_price pp on(
pp.product_id = p.product_id
and pp.from_date = (select max(pp2.from_date)
from product_price pp2
where pp2.product_id = pp.product_id)
);
Below is my attempt at fixing it. When using this view with a simple select of product_id
, Oracle manages to eliminate product_color but not product_price.
create view product_5nf as
select product_id
,pc.color
,pp.price
from product p
left join product_color pc using(product_id)
left join (select pp1.product_id, pp1.price
from product_price pp1
where pp1.from_date = (select max(pp2.from_date)
from product_price pp2
where pp2.product_id = pp1.product_id)
)pp using(product_id);
select product_id
from product_5nf;
----------------------------------------------------------
| Id | Operation | Name | Rows |
----------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 |
|* 1 | HASH JOIN OUTER | | 4 |
| 2 | INDEX FAST FULL SCAN| PRODUCT_PK | 4 |
| 3 | VIEW | | 3 |
| 4 | NESTED LOOPS | | 3 |
| 5 | VIEW | VW_SQ_1 | 5 |
| 6 | HASH GROUP BY | | 5 |
| 7 | INDEX FULL SCAN | PRODUCT_PRICE_PK | 5 |
|* 8 | INDEX UNIQUE SCAN | PRODUCT_PRICE_PK | 1 |
----------------------------------------------------------
The only solution I have found is to use scalar sub queries instead, like this:
create or replace view product_5nf as
select p.product_id
,pc.color
,(select pp.price
from product_price pp
where pp.product_id = p.product_id
and pp.from_date = (select max(from_date)
from product_price pp2
where pp2.product_id = pp.product_id)) as price
from product p
left join product_color pc on(
pc.product_id = p.product_id
)
select product_id
from product_5nf;
---------------------------------------------------
| Id | Operation | Name | Rows |
---------------------------------------------------
| 0 | SELECT STATEMENT | | 4 |
| 1 | INDEX FAST FULL SCAN| PRODUCT_PK | 4 |
---------------------------------------------------
Now Oracle sucessfully eliminates the product_price table. However, scalar sub queries are implemented differently than joins and they way they are executed simply doesn't allow me to get any acceptable performance in a real world scenario.
TL;DR
How can I rewrite the view product_5nf
so that Oracle sucessfully eliminates both of the dependent tables?
回答1:
I think you have two problems going on here.
First, join elimination only works for certain, specific situations (PK-PK, PK-FK, etc). It is not a general thing where you can LEFT JOIN
to any row set that will return a single row for each join key value and have Oracle eliminate the join.
Second, even if Oracle were advanced enough to do join elimination on ANY LEFT JOIN
where it knew it would get only one row per join key value, Oracle does not yet support join eliminations on LEFT JOINS
that are based on a composite key (Oracle support document 887553.1 says this is coming in R12.2).
One workaround you could consider is materializing a view with the last row for each product_id
. Then LEFT JOIN
to the materialized view. Like this:
create table product(
product_id number not null
,constraint product_pk primary key(product_id)
);
create table product_color(
product_id number not null references product
,color varchar2(10) not null
,constraint product_color_pk primary key(product_id)
);
create table product_price(
product_id number not null references product
,from_date date not null
,price number not null
,constraint product_price_pk primary key (product_id, from_date )
);
-- Add a VIRTUAL column to PRODUCT_PRICE so that we can get all the data for
-- the latest row by taking the MAX() of this column.
alter table product_price add ( sortable_row varchar2(80) generated always as ( lpad(product_id,10,'0') || to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,'0')) virtual not null );
-- Create a MV snapshot so we can materialize a view having only the latest
-- row for each product_id and can refresh that MV fast on commit.
create materialized view log on product_price with sequence, primary key, rowid ( price ) including new values;
-- Create the MV
create materialized view product_price_latest refresh fast on commit enable query rewrite as
SELECT product_id, max( lpad(product_id,10,'0') || to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,'0')) sortable_row
FROM product_price
GROUP BY product_id;
-- Create a primary key on the MV, so we can do join elimination
alter table product_price_latest add constraint ppl_pk primary key ( product_id );
-- Insert the OP's test data
insert into product values(1);
insert into product values(2);
insert into product values(3);
insert into product values(4);
insert into product_color values(1, 'Red');
insert into product_color values(2, 'Green');
insert into product_price ( product_id, from_date, price ) values(1, date '2016-01-01', 10 );
insert into product_price ( product_id, from_date, price) values(1, date '2016-02-01', 8);
insert into product_price ( product_id, from_date, price) values(1, date '2016-05-01', 5);
insert into product_price ( product_id, from_date, price) values(2, date '2016-02-01', 5);
insert into product_price ( product_id, from_date, price) values(4, date '2016-01-01', 10);
commit;
-- Create the 5NF view using the materialized view
create or replace view product_5nf as
select p.product_id
,pc.color
,to_date(substr(ppl.sortable_row,11,14),'YYYYMMDDHH24MISS') from_date
,to_number(substr(ppl.sortable_row,25)) price
from product p
left join product_color pc on pc.product_id = p.product_id
left join product_price_latest ppl on ppl.product_id = p.product_id
;
-- The plan for this should not include any of the unnecessary tables.
select product_id from product_5nf;
-- Check the plan
SELECT *
FROM TABLE (DBMS_XPLAN.display_cursor (null, null,
'ALLSTATS LAST'));
------------------------------------------------
| Id | Operation | Name | E-Rows |
------------------------------------------------
| 0 | SELECT STATEMENT | | |
| 1 | INDEX FULL SCAN | PRODUCT_PK | 1 |
------------------------------------------------
回答2:
I can't make the price join be eliminated, but if you do the following, it can at least reduce the access to a single index for the price check:
CREATE OR REPLACE view product_5nf as
select p.product_id
,pc.color
,pp.price
from product p
left join product_color pc ON p.product_id = pc.product_id
left join (select pp1.product_id, pp1.price
from (SELECT product_id,
price,
from_date,
max(from_date) OVER (PARTITION BY product_id) max_from_date
FROM product_price) pp1
where pp1.from_date = max_from_date) pp ON p.product_id = pp.product_id;
回答3:
Now Oracle sucessfully eliminates the product_price table. However, scalar sub queries are implemented differently than joins and they way they are executed simply doesn't allow me to get any acceptable performance in a real world scenario.
The cost-based optimizer in Oracle 12.1 can perform a query transformation to unnest scalar subqueries. So, the performance could be just as good as the LEFT JOIN
you are after in your question.
The trick is you have to jigger it a bit.
First, make sure the scalar subquery returns max()
with no group by
, so CBO knows there is no chance of getting more than one row. (It won't unnest otherwise).
Second, you need to combine all the fields from product_price
into a single scalar subquery, or else the CBO will unnest and join in product_price
multiple times.
Here is a test case for Oracle 12.1 that illustrates this working.
drop view product_5nf;
drop table product_color cascade constraints;
drop table product_price cascade constraints;
drop table product cascade constraints;
create table product(
product_id number not null
,constraint product_pk primary key(product_id)
);
create table product_color(
product_id number not null references product
,color varchar2(10) not null
,constraint product_color_pk primary key(product_id)
);
create table product_price(
product_id number not null references product
,from_date date not null
,price number not null
,constraint product_price_pk primary key (product_id, from_date )
);
insert into product ( product_id ) SELECT rownum FROM dual connect by rownum <= 100000;
insert into product_color ( product_id, color ) SELECT rownum, dbms_random.string('a',8) color FROM DUAL connect by rownum <= 100000;
--delete from product_price;
insert into product_price ( product_id, from_date, price ) SELECT product_id, trunc(sysdate) + dbms_random.value(-3,3) from_date, floor(dbms_random.value(50,120)/10)*10 price from product cross join lateral ( SELECT rownum x FROM dual connect by rownum <= mod(product_id,5));
commit;
begin dbms_stats.gather_table_stats ( ownname => USER, tabname => 'PRODUCT' ); end;
begin dbms_stats.gather_table_stats ( ownname => USER, tabname => 'PRODUCT_COLOR' ); end;
begin dbms_stats.gather_table_stats ( ownname => USER, tabname => 'PRODUCT_PRICE' ); end;
commit;
alter table product_price add ( composite_column varchar2(80) generated always as ( to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,0)) virtual );
create or replace view product_5nf as
select d.product_id, d.color, to_date(substr(d.product_date_price,1,14),'YYYYMMDDHH24MISS') from_date, to_number(substr(d.product_date_price,-10)) price
from
( select p.product_id
,pc.color
,( SELECT max(composite_column) FROM product_price pp WHERE pp.product_id = p.product_id AND pp.from_date = ( SELECT max(pp2.from_date) FROM product_price pp2 WHERE pp2.product_id = pp.product_id ) ) product_date_price
from product p
left join product_color pc on pc.product_id = p.product_id ) d
;
select product_id from product_5nf;
----------------------------------------------
| Id | Operation | Name | E-Rows |
----------------------------------------------
| 0 | SELECT STATEMENT | | |
| 1 | TABLE ACCESS FULL| PRODUCT | 100K|
----------------------------------------------
select * from product_5nf;
SELECT *
FROM TABLE (DBMS_XPLAN.display_cursor (null, null,
'ALLSTATS LAST'));
--------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows | OMem | 1Mem | Used-Mem |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | | |
|* 1 | HASH JOIN RIGHT OUTER | | 100K| 8387K| 3159K| 8835K (0)|
| 2 | VIEW | VW_SSQ_2 | 2 | | | |
| 3 | HASH GROUP BY | | 2 | 13M| 2332K| 12M (0)|
| 4 | VIEW | VM_NWVW_3 | 2 | | | |
|* 5 | FILTER | | | | | |
| 6 | HASH GROUP BY | | 2 | 23M| 5055K| 20M (0)|
|* 7 | HASH JOIN | | 480K| 12M| 4262K| 17M (0)|
| 8 | TABLE ACCESS FULL| PRODUCT_PRICE | 220K| | | |
| 9 | TABLE ACCESS FULL| PRODUCT_PRICE | 220K| | | |
|* 10 | HASH JOIN OUTER | | 100K| 5918K| 3056K| 5847K (0)|
| 11 | TABLE ACCESS FULL | PRODUCT | 100K| | | |
| 12 | TABLE ACCESS FULL | PRODUCT_COLOR | 100K| | | |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("ITEM_2"="P"."PRODUCT_ID")
5 - filter("PP"."FROM_DATE"=MAX("PP2"."FROM_DATE"))
7 - access("PP2"."PRODUCT_ID"="PP"."PRODUCT_ID")
10 - access("PC"."PRODUCT_ID"="P"."PRODUCT_ID")
回答4:
Ok, I'm answering my own question. The information in this answer is valid for Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production, but likely not for later versions. Don't vote for this answer, as it does not answer the question.
Due to a specific limitation in the current version (as noted by Mathew McPeak) it is simply not possible to get Oracle to completely eliminate the unnecessary joins in the underlaying 5NF view. The limitation is that join elimination is not possible on left joins that are based on a composite key.
Any attempt at working around this limitation seems to introduce either duplication or update anomalies. The accepted answer demonstrates how to overcome this limitation in the optimiser by using materialized views and thereby duplicating the data. This answer shows how to solve the problem with less duplication but with update anomalies.
This workaround is based on the fact that you can use nullable columns in unique indexes. We will put null
for all historical versions and the actual product_id
for the most recent version referencing the product table with a foreign key.
alter table product_price add(
latest_id number
,constraint product_price_uk unique(latest_id)
,constraint product_price_fk2 foreign key(latest_id) references product(product_id)
,constraint product_price_chk check(latest_id = product_id)
);
-- One-time update of existing data
update product_price a
set a.latest_id = a.product_id
where from_date = (select max(from_date)
from product_price b
where a.product_id = b.product_id);
PRODUCT_ID FROM_DATE PRICE LATEST_ID
---------- ---------- ---------- ----------
1 2016-01-01 10 null
1 2016-02-01 8 null
1 2016-05-01 5 1
2 2016-02-01 5 2
4 2016-01-01 10 4
-- New view definition
create or replace view product_5nf as
select p.product_id
,pc.color
,pp.price
from product p
left join product_color pc on(pc.product_id = p.product_id)
left join product_price pp on(pp.latest_id = p.product_id);
Of course, now latest_id
has to be manually maintained... whenever a new record is inserted, the old record must first be updated with null.
There are two benefits with this approach. First, Oracle is able to completely remove unnecessary joins. And second, the joins are not executed as scalar subqueries.
SQL> select count(*) from product_5nf;
---------------------------------------
| Id | Operation | Name |
---------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT AGGREGATE | |
| 2 | INDEX FULL SCAN| PRODUCT_PK |
---------------------------------------
Oracle recognizes that the count can be resolved without even touching the base table. And no unnecessary joins to be seen...
SQL> select product_id, price from product_5nf;
---------------------------------------------------------
| Id | Operation | Name |
---------------------------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | HASH JOIN OUTER | |
| 2 | INDEX FULL SCAN | PRODUCT_PK |
| 3 | TABLE ACCESS BY INDEX ROWID| PRODUCT_PRICE |
|* 4 | INDEX FULL SCAN | PRODUCT_PRICE_UK |
---------------------------------------------------------
Oracle recognizes that we have to join to product_price
in order to get the price column. And product_color
is nowhere to be seen...
SQL> select * from product_5nf;
----------------------------------------------------------
| Id | Operation | Name |
----------------------------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | HASH JOIN OUTER | |
| 2 | NESTED LOOPS OUTER | |
| 3 | INDEX FULL SCAN | PRODUCT_PK |
| 4 | TABLE ACCESS BY INDEX ROWID| PRODUCT_COLOR |
|* 5 | INDEX UNIQUE SCAN | PRODUCT_COLOR_PK |
| 6 | TABLE ACCESS BY INDEX ROWID | PRODUCT_PRICE |
|* 7 | INDEX FULL SCAN | PRODUCT_PRICE_UK |
----------------------------------------------------------
Here Oracle has to materialize all the joins, since all columns are referenced.
回答5:
[I don't know if an ANTI-JOIN counts as a subquery in Oracle], but the not exists
trick is often a way to avoid an aggregating subquery:
CREATE VIEW product_5nfa as
SELECT p.product_id
,pc.color
,pp.price
FROM product p
LEFT JOIN product_color pc
ON pc.product_id = p.product_id
LEFT join product_price pp
ON pp.product_id = p.product_id
AND NOT EXISTS ( SELECT * FROM product_price pp2
WHERE pp2.product_id = pp.product_id
AND pp2.from_date > pp.from_date
)
;
Comment from OP: The view is created, but Oracle is still unable to remove the join. Here is the execution plan.
select count(*) from product_5nfa;
-------------------------------------------------
| Id | Operation | Name |
-------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | SORT AGGREGATE | |
| 2 | NESTED LOOPS OUTER | |
| 3 | INDEX FULL SCAN | PRODUCT_PK |
| 4 | VIEW | |
| 5 | NESTED LOOPS ANTI| |
|* 6 | INDEX RANGE SCAN| PRODUCT_PRICE_PK |
|* 7 | INDEX RANGE SCAN| PRODUCT_PRICE_PK |
-------------------------------------------------
来源:https://stackoverflow.com/questions/40490220/join-elimination-not-working-in-oracle-with-sub-queries