Oracle 11g - Performance effect of indexing on insert

悲哀的现实 2021-01-06 01:50

Objective

Verify whether it is true that inserting records without a PK/index and creating them later is faster than inserting with the PK/index already in place.

Note

3 Answers
  • 2021-01-06 02:21

    It's true that it is faster to modify a table if you do not also have to modify one or more indexes and possibly perform constraint checking as well, but it is also largely irrelevant if you then have to add those indexes. You have to consider the complete change to the system that you wish to effect, not just a single part of it.

    Obviously if you are adding a single row into a table that already contains millions of rows then it would be foolish to drop and rebuild indexes.

    However, even if you have a completely empty table into which you are going to add several million rows, it can still be slower to defer the indexing until afterwards.

    The reason for this is that such an insert is best performed with the direct-path mechanism, and when you use direct-path inserts into a table with indexes on it, temporary segments are built that contain the data required to build the indexes (indexed column values plus rowids). If those temporary segments are much smaller than the table you have just loaded, then they will also be faster to scan and to build the indexes from.

    The alternative, if you have five indexes on the table, is to incur five full table scans after you have loaded it in order to build the indexes.
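    A minimal sketch of the direct-path load described above, assuming the table T with primary key PK_A used in the test further down this page, and a hypothetical staging table T_STAGE that already holds the rows. The APPEND hint asks for direct-path mode; Oracle quietly falls back to a conventional insert when a restriction applies (for example, an enabled trigger on T), so it is worth confirming which mode was actually used.

    ```sql
    -- Assumed: T(A NUMBER) with primary key PK_A, plus a hypothetical
    -- staging table T_STAGE(A NUMBER) that already contains the rows.
    INSERT /*+ APPEND */ INTO t (a)
    SELECT a FROM t_stage;

    -- In direct-path mode the new index keys are gathered during the load
    -- and merged into PK_A when the statement finishes, instead of being
    -- added row by row. A SELECT on T at this point, before COMMIT and in
    -- the same session, fails with ORA-12838 if direct path was really used.
    COMMIT;
    ```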

    Obviously there are huge grey areas involved here, but well done for:

    1. Questioning authority and general rules of thumb, and
    2. Running actual tests to determine the facts in your own case.

    Edit:

    Further considerations: suppose you run a backup while the indexes are dropped. Following an emergency restore, you now need a script that verifies that all indexes are in place, at a time when the business is breathing down your neck to get the system back up.
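    A minimal sketch of such a check, assuming a hypothetical list of expected index names (PK_A from the test further down, plus an illustrative IX_T_B) that in practice would come from the schema's version-controlled DDL. It lists every expected index that is missing from USER_INDEXES or is marked UNUSABLE; an empty result means everything is in place.

    ```sql
    -- Hypothetical list of indexes the application expects to exist.
    WITH expected AS (
      SELECT 'PK_A'   AS index_name FROM dual UNION ALL
      SELECT 'IX_T_B' AS index_name FROM dual
    )
    SELECT e.index_name,
           NVL(i.status, 'MISSING') AS status
    FROM   expected e
    LEFT JOIN user_indexes i
           ON i.index_name = e.index_name
    -- No matching row means the index is gone; UNUSABLE means it exists
    -- but must be rebuilt. Partitioned indexes report N/A here and need
    -- USER_IND_PARTITIONS instead.
    WHERE  i.index_name IS NULL
       OR  i.status = 'UNUSABLE';
    ```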

    Also, if you are absolutely determined not to maintain indexes during a bulk load, do not drop the indexes; disable them instead (for ordinary B-tree indexes, that means marking them UNUSABLE). This preserves the metadata about each index's existence and definition, and allows a simpler rebuild process. Just be careful not to accidentally re-enable the indexes by truncating the table, as truncation makes disabled (unusable) indexes usable again.
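    In Oracle terms, a minimal sketch of that approach for an ordinary non-unique B-tree index might look like this; IX_T_B, the columns A and B, and the staging table T_STAGE are all hypothetical. Note that SKIP_UNUSABLE_INDEXES does not cover unique indexes, so an unusable index that enforces a primary key or unique constraint (such as PK_A in the test further down) still makes the insert fail; for those, ALTER TABLE ... DISABLE CONSTRAINT followed later by ENABLE CONSTRAINT is the usual route.

    ```sql
    -- Hypothetical table T(A NUMBER, B NUMBER) with a non-unique index IX_T_B.
    ALTER INDEX ix_t_b UNUSABLE;

    -- Let DML skip unusable non-unique indexes
    -- (TRUE is already the default from Oracle 10g onwards).
    ALTER SESSION SET skip_unusable_indexes = TRUE;

    INSERT /*+ APPEND */ INTO t (a, b)
    SELECT a, b FROM t_stage;   -- hypothetical staging table
    COMMIT;

    -- The index definition survived, so the rebuild needs no stored DDL.
    ALTER INDEX ix_t_b REBUILD;
    ```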

  • 2021-01-06 02:22

    Oracle has to do more work when inserting data into a table that has an index. In general, inserting without an index is faster than inserting with one.

    Think of it this way:

    • Inserting rows into a regular heap-organized table, which has no particular row order, is simple: find a table block with enough free space and put the rows there.

    • But when there are indexes on the table, there is much more work to do. Adding a new index entry is not that simple. Oracle has to traverse the index blocks to find the specific leaf block where the key belongs, because the new entry cannot be placed in just any block. Once the correct leaf block is found, it checks for enough free space and then makes the new entry. If there is not enough space, it has to split the block and distribute the entries between the old block and a new one. All of this work is overhead and consumes more time overall.
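    One way to see this extra work directly, rather than only through elapsed time, is to snapshot your session's statistics before and after each INSERT and compare the differences. A minimal sketch, assuming your user can query V$MYSTAT and V$STATNAME (that normally needs an explicit grant or SELECT_CATALOG_ROLE):

    ```sql
    -- Run once before and once after the INSERT, then subtract the values.
    SELECT sn.name, ms.value
    FROM   v$mystat   ms
    JOIN   v$statname sn ON sn.statistic# = ms.statistic#
    WHERE  sn.name IN ('redo size',
                       'undo change vector size',
                       'db block gets',
                       'leaf node splits',
                       'leaf node 90-10 splits')
    ORDER  BY sn.name;
    ```

    For the table without an index the two split counters should barely move, while the indexed load should show a steady stream of them. With the ascending keys produced by CONNECT BY LEVEL in the example below, most of those splits should also be counted as 90-10 splits, since every new key lands in the rightmost leaf block.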

    Let's look at a small example.

    Database version:

    SQL> SELECT banner FROM v$version where ROWNUM =1;
    
    BANNER
    --------------------------------------------------------------------------------
    Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
    

    OS: Windows 7, 8 GB RAM

    With Index

    SQL> CREATE TABLE t(A NUMBER, CONSTRAINT PK_a PRIMARY KEY (A));
    
    Table created.
    
    SQL> SET timing ON
    SQL> INSERT INTO t SELECT LEVEL FROM dual CONNECT BY LEVEL <=1000000;
    
    1000000 rows created.
    
    Elapsed: 00:00:02.26
    

    So, it took 00:00:02.26. Index details:

    SQL> column index_name format a10
    SQL> column table_name format a10
    SQL> column uniqueness format a10
    SQL> SELECT index_name, table_name, uniqueness FROM user_indexes WHERE table_name = 'T';
    
    INDEX_NAME TABLE_NAME UNIQUENESS
    ---------- ---------- ----------
    PK_A       T          UNIQUE
    

    Without Index

    SQL> DROP TABLE t PURGE;
    
    Table dropped.
    
    SQL> CREATE TABLE t(A NUMBER);
    
    Table created.
    
    SQL> SET timing ON
    SQL> INSERT INTO t SELECT LEVEL FROM dual CONNECT BY LEVEL <=1000000;
    
    1000000 rows created.
    
    Elapsed: 00:00:00.60
    

    So, it took only 00:00:00.60, compared with 00:00:02.26 with the primary key index in place.
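    The comparison above leaves out the step the question is actually about: building the index after the load. A minimal sketch of that third variant in SQL*Plus, reusing the same table and data generator; the figure to compare against 00:00:02.26 is the elapsed time of the INSERT plus that of the ALTER TABLE. The optional last variant tries the direct-path load from the first answer with the index already in place. Run everything on the same machine so the numbers are comparable.

    ```sql
    -- Third variant: load first, add the primary key afterwards.
    DROP TABLE t PURGE;
    CREATE TABLE t (a NUMBER);

    SET TIMING ON
    INSERT INTO t SELECT LEVEL FROM dual CONNECT BY LEVEL <= 1000000;
    COMMIT;

    -- Adding the constraint builds the unique index PK_A in one pass
    -- over the loaded table; add this elapsed time to the INSERT's.
    ALTER TABLE t ADD CONSTRAINT pk_a PRIMARY KEY (a);

    -- Optional fourth variant: direct-path load with the index in place.
    DROP TABLE t PURGE;
    CREATE TABLE t (a NUMBER, CONSTRAINT pk_a PRIMARY KEY (a));
    INSERT /*+ APPEND */ INTO t SELECT LEVEL FROM dual CONNECT BY LEVEL <= 1000000;
    COMMIT;
    ```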

  • 2021-01-06 02:28

    The current test case is probably good enough for you to overrule the "best practices". There are too many variables involved to make a blanket statement that "it's always best to leave the indexes enabled". But you're probably close enough to say it's true for your environment.

    Below are some considerations for the test case. I've made this a community wiki in the hopes that others will add to the list.

    1. Direct-path inserts. Direct-path writes use different mechanisms and may behave completely differently. Direct-path inserts can often be significantly faster than regular inserts, although they have some complicated restrictions (for example, triggers must be disabled) and disadvantages (the data is not immediately backed up). One particular way this affects the scenario is that NOLOGGING for indexes only applies during index creation. So even if a direct-path insert is used, an enabled index will always generate REDO and UNDO.
    2. Parallelism. Large insert statements often benefit from parallel DML. Usually it's not worth worrying about the performance of bulk loads until they take more than several seconds, which is when parallelism starts to be useful.
    3. Bitmap indexes are not meant for large DML. Inserts or updates to a table with a bitmap index can lock the whole table and lead to disastrous performance. It might be helpful to limit the test case to B-tree indexes.
    4. Add "alter system switch logfile;" before each run? Log file switches can sometimes cause performance issues. The tests would be somewhat more consistent if they all started with empty log files.
    5. Move the data generation logic into a separate step. Hierarchical queries are useful for generating data, but they can have their own performance issues. It might be better to create an intermediate table to hold the results, and then only time the insert from the intermediate table into the final table.
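    Several of the points above can be folded into one test run. A minimal sketch, assuming the table T from the earlier example already exists (with or without its index), a hypothetical intermediate table T_SOURCE, a degree of parallelism of 4 chosen only for illustration, and a user allowed to run ALTER SYSTEM:

    ```sql
    -- Item 5: generate the data once into an intermediate table.
    CREATE TABLE t_source AS
    SELECT LEVEL AS a FROM dual CONNECT BY LEVEL <= 1000000;

    -- Item 4: start each timed run on a fresh online redo log.
    ALTER SYSTEM SWITCH LOGFILE;

    -- Items 1 and 2: a parallel insert, which Oracle performs as a
    -- direct-path insert. Parallel DML must be enabled per session.
    ALTER SESSION ENABLE PARALLEL DML;
    SET TIMING ON
    INSERT /*+ APPEND PARALLEL(t, 4) */ INTO t
    SELECT a FROM t_source;
    COMMIT;
    ```

    Because the data already sits in T_SOURCE, repeated runs time only the load itself, which is what the comparison between the indexed and unindexed cases needs.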