Performance: rank() vs sub-query. Sub query have lower cost?

问题

Inspired by this question I decided to test the rank() function, trying to see if sub query's are less efficient than rank. So I created a table:

create table teste_rank ( codigo number(7), data_mov date, valor number(14,2) );
alter table teste_rank add constraint tst_rnk_pk primary key ( codigo, data_mov );

and inserted some records...

declare
  vdata date;
begin
  dbms_random.initialize(120401);
  vdata := to_date('04011997','DDMMYYYY');
  for reg in 1 .. 465 loop
    vdata := to_date('04011997','DDMMYYYY');
    while vdata <= trunc(sysdate) loop
      insert into teste_rank 
          (codigo, data_mov, valor) 
        values 
          (reg, vdata, dbms_random.value(1,150000));
      vdata := vdata + 2;
    end loop;
    commit;
  end loop;
end;
/

And then tested two querys:

select * 
  from teste_rank r
 where r.data_mov = ( select max(data_mov) 
                        from teste_rank 
                       where data_mov <= trunc(sysdate) 
                         and codigo = 1 )
   and r.codigo = 1;

select *
  from ( select rank() over ( partition by codigo order by data_mov desc ) rn, t.*
           from teste_rank t
          where codigo = 1
            and data_mov <= trunc(sysdate) ) r
 where r.rn = 1;

As you can see, the cost of sub query is lower than rank(). Is this right? Am I missing something there?

PS: Tested also with a full query in the table and still sub query with the low cost.

EDIT

I generated a tkprof of the two query's (traced one, shutdown the database, startup and traced the second).

For subquery

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.02          3          5          0           0
Execute      1      0.00       0.00          0          3          0           0
Fetch        2      0.00       0.00          1          4          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.02          4         12          0           1

For rank()

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.02          3          3          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.00          9         19          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.01       0.03         12         22          0           1

Can I conclude that sub query not will always less efficient than rank? When is indicated rank instead of sub query?

回答1:

I'm not really sure what your question is. Yes, according to these two execution plans, in this case, the subquery method has a lower expected cost. Doesn't seem too surprising, since it can use the index to very quickly locate the exact row you're interested in. Specifically in this case, the subquery only has to do a very quick scan of the PK index. The situation might be different if the subquery involved columns that weren't part of the index.

The query using rank() has to get all the matching rows and rank them. I don't believe that the optimizer has any short-circuit logic to recognize that this is a top-n query and therefore avoid a full sort, even though all you really care about is the top-ranked row.

You might also try this form, which the optimizer should recognize as a top-n query. I would expect in your case that it would require only a single range scan on the index followed by a table access.

select * 
  from (select *
          from teste_rank r
          where data_mov <= trunc(sysdate) 
            and codigo = 1
        order by data_mov desc)
  where rownum=1;

回答2:

Cost is the cost based optimizer's estimate of what it will take to execute a query.

It's possible for the CBO to get it wrong, especially if statistics are out of date.

So, what to do? Try executing each query with 'set autotrace on'. How many buffer gets and physical reads does each query do? In other words, how much actual work does each query do?

Hope that helps.

来源：https://stackoverflow.com/questions/9232470/performance-rank-vs-sub-query-sub-query-have-lower-cost

标签

performance

Oracle

subquery

rank