how to make selecting random rows in oracle faster with table with millions of rows

前端 未结 7 1140
后悔当初
后悔当初 2021-02-13 11:25

Is there a way to make selecting random rows faster in oracle with a table that has million of rows. I tried to use sample(x) and dbms_random.value and its taking a long time to

相关标签:
7条回答
  • 2021-02-13 11:59

    Start with Adam's answer first, but if SAMPLE just isn't fast enough, even with the ROWNUM optimization, you can use block samples:

    ....FROM [table] SAMPLE BLOCK (0.01)
    

    This applies the sampling at the block level instead of for each row. This does mean that it can skip large swathes of data from the table so the sample percent will be very rough. It's not unusual for a SAMPLE BLOCK with a low percentage to return zero rows.

    0 讨论(0)
  • 2021-02-13 12:04

    Here's the same question on AskTom:

    http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522

    If you know how big your table is, use sample block as described above. If you don't, you can modify the routine below to get however many rows you want.

    Copied from: http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522#56174726207861

    create or replace function get_random_rowid
    ( table_name varchar2
    ) return urowid
    as
    sql_v varchar2(100);
    urowid_t dbms_sql.urowid_table;
    cursor_v integer;
    status_v integer;
    rows_v integer;
    begin
      for exp_v in -6..2 loop
        exit when (urowid_t.count > 0);
        if (exp_v < 2) then
          sql_v := 'select rowid from ' || table_name
          || ' sample block (' || power(10, exp_v) || ')';
        else
          sql_v := 'select rowid from ' || table_name;
        end if;
        cursor_v := dbms_sql.open_cursor;
        dbms_sql.parse(cursor_v, sql_v, dbms_sql.native);
        dbms_sql.define_array(cursor_v, 1, urowid_t, 100, 0);
        status_v := dbms_sql.execute(cursor_v);
        loop
          rows_v := dbms_sql.fetch_rows(cursor_v);
          dbms_sql.column_value(cursor_v, 1, urowid_t);
          exit when rows_v != 100;
        end loop;
        dbms_sql.close_cursor(cursor_v);
      end loop;
      if (urowid_t.count > 0) then
        return urowid_t(trunc(dbms_random.value(0, urowid_t.count)));
      end if;
      return null;
    exception when others then
      if (dbms_sql.is_open(cursor_v)) then
        dbms_sql.close_cursor(cursor_v);
      end if;
      raise;
    end;
    /
    show errors
    
    0 讨论(0)
  • 2021-02-13 12:04

    Version with retries when no rows returned:

    WITH gen AS ((SELECT --+ inline leading(e) use_nl(e t) rowid(t)
                         MAX(t.ROWID) KEEP(DENSE_RANK FIRST ORDER BY dbms_random.value) Row_Id
                    FROM (SELECT o.Data_Object_Id,
                                 e.Relative_Fno,
                                 e.Block_Id + TRUNC(Dbms_Random.Value(0, e.Blocks)) AS Block_Id 
                            FROM Dba_Extents e
                            JOIN Dba_Objects o ON o.Owner = e.Owner AND o.Object_Type = e.Segment_Type AND o.Object_Name = e.Segment_Name
                           WHERE e.Segment_Name = 'MY_TABLE'
                             AND(e.Segment_Type, e.Owner, e.Extent_Id) =
                                (SELECT MAX(e.Segment_Type) AS Segment_Type,
                                        MAX(e.Owner)        AS Owner,
                                        MAX(e.Extent_Id) KEEP(DENSE_RANK FIRST ORDER BY Dbms_Random.Value) AS Extent_Id
                                   FROM Dba_Extents e
                                  WHERE e.Segment_Name = 'MY_TABLE'
                                    AND e.Owner = 'MY_USER'
                                    AND e.Segment_Type = 'TABLE')) e
                    JOIN MY_USER.MY_TABLE t ON t.ROWID BETWEEN Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 0)
                                                      AND Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 32767))),
      Retries(Cnt, Row_Id) AS (SELECT 1, gen.Row_Id
                                 FROM Dual
                                 LEFT JOIN gen ON 1=1
                                UNION ALL
                               SELECT Cnt + 1, gen.Row_Id
                                 FROM Retries
                                 LEFT JOIN gen ON 1=1
                                WHERE Retries.Row_Id IS NULL AND Retries.Cnt < 10)
    SELECT *
      FROM MY_USER.MY_TABLE
     WHERE ROWID = (SELECT Row_Id
                      FROM Retries
                     WHERE Row_Id IS NOT NULL)
    
    0 讨论(0)
  • 2021-02-13 12:04

    Can you use pseudorandom rows?

    select * from (
      select * from ... where... order by ora_hash(rowid)
    ) where rownum<100
    
    0 讨论(0)
  • 2021-02-13 12:14

    Someone told sample(x) is the fastest way you can. But for me this method works slightly faster than sample(x) method. It should take fraction of the second (0.2 in my case) no matter what is the size of the table. If it takes longer try to use hints (--+ leading(e) use_nl(e t) rowid(t)) can help

    SELECT *
      FROM My_User.My_Table
     WHERE ROWID = (SELECT MAX(t.ROWID) KEEP(DENSE_RANK FIRST ORDER BY dbms_random.value)
                      FROM (SELECT o.Data_Object_Id,
                                   e.Relative_Fno,
                                   e.Block_Id + TRUNC(Dbms_Random.Value(0, e.Blocks)) AS Block_Id
                              FROM Dba_Extents e
                              JOIN Dba_Objects o ON o.Owner = e.Owner AND o.Object_Type = e.Segment_Type AND o.Object_Name = e.Segment_Name
                             WHERE e.Segment_Name = 'MY_TABLE'
                               AND(e.Segment_Type, e.Owner, e.Extent_Id) =
                                  (SELECT MAX(e.Segment_Type) AS Segment_Type,
                                          MAX(e.Owner)        AS Owner,
                                          MAX(e.Extent_Id) KEEP(DENSE_RANK FIRST ORDER BY Dbms_Random.Value) AS Extent_Id
                                     FROM Dba_Extents e
                                    WHERE e.Segment_Name = 'MY_TABLE'
                                      AND e.Owner = 'MY_USER'
                                      AND e.Segment_Type = 'TABLE')) e
                      JOIN My_User.My_Table t
                        ON t.Rowid BETWEEN Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 0)
                       AND Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 32767))
    
    0 讨论(0)
  • 2021-02-13 12:21

    Using appropriate values of sample(x) is the fastest way you can. It's block-random and row-random within blocks, so if you only want one random row:

    select dbms_rowid.rowid_relative_fno(rowid) as fileno,
           dbms_rowid.rowid_block_number(rowid) as blockno,
           dbms_rowid.rowid_row_number(rowid) as offset
      from (select rowid from [my_big_table] sample (.01))
     where rownum = 1
    

    I'm using a subpartitioned table, and I'm getting pretty good randomness even grabbing multiple rows:

    select dbms_rowid.rowid_relative_fno(rowid) as fileno,
           dbms_rowid.rowid_block_number(rowid) as blockno,
           dbms_rowid.rowid_row_number(rowid) as offset
      from (select rowid from [my_big_table] sample (.01))
     where rownum <= 5
    
        FILENO    BLOCKNO     OFFSET
    ---------- ---------- ----------
           152    2454936         11
           152    2463140         32
           152    2335208          2
           152    2429207         23
           152    2746125         28
    

    I suspect you should probably tune your SAMPLE clause to use an appropriate sample size for what you're fetching.

    0 讨论(0)
提交回复
热议问题