Index Key Column VS Index Included Column

迷失自我 2021-01-30 12:42

Can someone explain the difference between these two: index key columns vs. index included columns?

Currently, I have an index that has 4 index key columns and 0 included columns.

Thanks

5 answers
  •  傲寒 (original poster)
    2021-01-30 13:04

    I would like to add to the other answers with more detail on index key columns, included columns, and the benefits of using included columns. This answer draws on the post "A Close Look at the Index Include Clause" by Markus Winand, published 2019-04-30 here: https://use-the-index-luke.com/blog/2019-04/include-columns-in-btree-indexes

    A brief summary of how index key columns differ from included columns

    To understand the include clause, you must first understand that using an index affects up to three layers of data structures:

    • The B-tree
    • The doubly linked list at the leaf node level of the B-tree
    • The table

    The first two structures together form an index so they could be combined into a single item, i.e. the “B-tree index”. In the general case, the database software starts traversing the B-tree to find the first matching entry at the leaf node level (1). It then follows the doubly linked list until it has found all matching entries (2) and finally it fetches each of those matching entries from the table (3).
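    The three steps above can be sketched with a toy model. This is only an illustration, not how any real database is implemented: the `table` dictionary, the row ids, and the `lookup` function are hypothetical, and a binary search over a sorted list stands in for the B-tree traversal.

```python
from bisect import bisect_left

# Hypothetical heap table: row id -> row. All names and values are illustrative.
table = {
    10: {"subsidiary_id": 1, "eur_value": 5.0},
    11: {"subsidiary_id": 2, "eur_value": 7.5},
    12: {"subsidiary_id": 2, "eur_value": 1.0},
    13: {"subsidiary_id": 3, "eur_value": 9.0},
}

# Leaf level of an index on (subsidiary_id): entries sorted by key,
# each one pointing back to a table row.
leaf = sorted((row["subsidiary_id"], rid) for rid, row in table.items())
keys = [key for key, _ in leaf]


def lookup(subsidiary_id):
    # (1) Tree traversal, modelled here as a binary search for the first match.
    pos = bisect_left(keys, subsidiary_id)
    rows = []
    # (2) Follow the leaf level while the key still matches.
    while pos < len(leaf) and leaf[pos][0] == subsidiary_id:
        # (3) Fetch the matching row from the table.
        rows.append(table[leaf[pos][1]])
        pos += 1
    return rows


result = lookup(2)
```

    In this sketch, step (3) runs once per matching entry, which is why the table access tends to dominate as the number of matching rows grows.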

    When loading a few rows, the B-tree makes the greatest contribution to the overall effort. As soon as you need to fetch just a handful of rows from the table, the table access takes the lead. In either case (few or many rows) the doubly linked list is usually a minor factor because it stores rows with similar values next to each other so that a single read operation can fetch 100 or even more rows.

    The most generic idea about optimization is to do less work to achieve the same goal. When it comes to index access, this means that the database software omits accessing a data structure if it doesn’t need any data from it. The index-only scan does exactly that: it omits the table access if the required data is available in the doubly linked list of the index.

    It is a common misconception that indexes only help the where clause. B-tree indexes can also help the order by, group by, select and other clauses. It is just the B-tree part of an index—not the doubly linked list—that cannot be used by other clauses.

    The include clause allows us to make a distinction between columns we would like to have in the entire index (key columns) and columns we only need in the leaf nodes (include columns). That means it allows us to remove columns from the non-leaf nodes if we don’t need them there.

    How included columns affect multiple aspects of query execution and the benefits of their usage

    The order of the leaf node entries does not take the included columns into account. The index is solely ordered by its key columns. This has two consequences: included columns cannot be used to prevent sorting nor are they considered for uniqueness.
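    A small toy model may make the ordering point concrete. It is a sketch under simplifying assumptions: the rows are made up, and entries with equal keys simply keep their insertion order here, whereas a real database decides that order by its own rules.

```python
# Hypothetical rows; subsidiary_id is the key column, eur_value is included.
rows = [
    {"subsidiary_id": 2, "eur_value": 9.0},
    {"subsidiary_id": 1, "eur_value": 4.0},
    {"subsidiary_id": 2, "eur_value": 3.0},
    {"subsidiary_id": 1, "eur_value": 8.0},
]

entries = [(r["subsidiary_id"], r["eur_value"]) for r in rows]

# The leaf level is ordered by the key column only; the included value
# is carried along but plays no part in the comparison.
leaf = sorted(entries, key=lambda entry: entry[0])

keys = [key for key, _ in leaf]
included = [value for _, value in leaf]

# ORDER BY subsidiary_id, eur_value would still need a separate sort step.
needs_extra_sort = leaf != sorted(leaf)
```

    In this model the keys come out sorted, but the included values within one key do not, so the index alone cannot deliver rows ordered by the included column.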

    The term “covering index” is sometimes used in the context of index-only scans or include clauses. What matters is whether a given index can support a given query by means of an index-only scan. Whether or not that index has an include clause or contains all table columns is not relevant.
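    One way to check whether an index covers a query is to look at the execution plan. The sketch below uses SQLite through Python's `sqlite3` module purely because it is easy to run. Note the assumption: SQLite has no include clause, so `eur_value` is appended as a key column instead, and the table layout is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (subsidiary_id INTEGER, ts TEXT, eur_value REAL, notes TEXT)"
)
# SQLite has no INCLUDE clause, so eur_value is a trailing key column here.
conn.execute("CREATE INDEX idx ON sales (subsidiary_id, eur_value)")


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the plan description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (1,)).fetchall()
    return " ".join(row[-1] for row in rows)


# All selected columns are in the index: no table access is needed.
covered = plan("SELECT eur_value FROM sales WHERE subsidiary_id = ?")
# notes is not in the index: the table must be visited for each match.
not_covered = plan("SELECT notes FROM sales WHERE subsidiary_id = ?")

print(covered)
print(not_covered)
```

    The first plan should report a covering index, while the second uses the same index but still has to read the table.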

    Defining an index with the include clause, rather than appending the extra column to the key, has some advantages:

    The tree might have fewer levels (<~40%)

    As the tree nodes above the doubly linked list do not contain the include columns, the database can store more branches in each block so that the tree might have fewer levels.
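    A rough calculation shows why smaller branch entries can reduce the number of levels. Every number below is a hypothetical assumption (block size, entry sizes, number of leaf blocks), and the model ignores block headers and fill factors, so treat it as a back-of-the-envelope sketch only.

```python
import math


def tree_levels(leaf_blocks, branch_entry_bytes, block_bytes=8192):
    """Estimate B-tree depth, counting the leaf level as level 1."""
    fanout = block_bytes // branch_entry_bytes  # branch entries per block
    levels = 1
    blocks = leaf_blocks
    # Each level above needs one entry per block of the level below.
    while blocks > 1:
        blocks = math.ceil(blocks / fanout)
        levels += 1
    return levels


# Assumed sizes: a 16-byte branch entry when only the key is stored there,
# versus 200 bytes when a wide extra column is also part of the key.
with_include = tree_levels(leaf_blocks=100_000, branch_entry_bytes=16)
without_include = tree_levels(leaf_blocks=100_000, branch_entry_bytes=200)
```

    With these assumed sizes the estimate is 3 levels when the wide column stays out of the branch nodes and 5 levels when it is part of the key. Real differences depend heavily on the actual column widths.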

    The index is slightly smaller (<~3%)

    As the non-leaf nodes of the tree don’t contain include columns, the overall size of that index is slightly less. However, the leaf node level of the index needs the most space anyway so that the potential savings in the remaining nodes is very little.

    It documents its purpose

    This is definitely the most underestimated benefit of the include clause: the reason why the column is in the index is documented in the index definition itself. When extending an existing index, it is very important to know exactly why the index is currently defined the way it happens to be defined. The freedom you have to change the index without breaking any other queries is a direct result of this knowledge.

       CREATE INDEX idx
        ON sales ( subsidiary_id )
         INCLUDE ( eur_value )
    

    As the eur_value column is in the include clause, it is not in the non-leaf nodes and is thus useful neither for navigating the tree nor for ordering. Adding a new column to the end of the key part is relatively safe.

       CREATE INDEX idx
        ON sales ( subsidiary_id, ts )
         INCLUDE ( eur_value )
    

    Even though there is still a small risk of negative impacts for other queries, it is usually worth taking that risk.

    Filtering on included columns

    Until now we have focused on how the include clause can enable index-only scans. Let’s also look at another case where it is beneficial to have an extra column in the index.

       SELECT * FROM sales
        WHERE subsidiary_id = ?
         AND notes LIKE '%search term%'
    

    I’ve made the search term a literal value to show the leading and trailing wildcards—of course you would use a bind parameter in your code. Now, let’s think about the right index for this query. Obviously, the subsidiary_id needs to be in the first position. If we take the previous index from above, it already satisfies this requirement:

       CREATE INDEX idx
        ON sales ( subsidiary_id, ts )
         INCLUDE ( eur_value )
    

    The database software can use that index with the three-step procedure as described at the beginning: (1) it will use the B-tree to find the first index entry for the given subsidiary; (2) it will follow the doubly linked list to find all sales for that subsidiary; (3) it will fetch all related sales from the table, remove those for which the like pattern on the notes column doesn’t match and return the remaining rows.

    The problem is the last step of this procedure: the table access loads rows without knowing if they will make it into the final result. Quite often, the table access is the biggest contributor to the total effort of running a query. Loading data that is not even selected is a huge performance no-no.

    The challenge with this particular query is that it uses an infix like pattern. Normal B-tree indexes don’t support searching for such patterns. However, B-tree indexes still support filtering on such patterns. Note the distinction: searching vs. filtering.

    In other words, if the notes column were present in the doubly linked list, the database software could apply the like pattern before fetching that row from the table (PostgreSQL is an exception; see the linked article). This prevents the table access if the like pattern doesn’t match. If the table has more columns, there is still a table access to fetch those columns for the rows that satisfy the where clause, due to the select *.

       CREATE INDEX idx
        ON sales ( subsidiary_id, ts )
         INCLUDE ( eur_value, notes )
    

    If there are more columns in the table, the index does not enable an index-only scan. Nonetheless, it can bring the performance close to that of an index-only scan if the portion of rows that match the like pattern is very low. In the opposite case—if all rows match the pattern—the performance is a little bit worse due to the increased index size. However, the breakeven is easy to reach: for overall performance improvement, it is often enough that the like filter removes a small percentage of the rows. Your mileage will vary depending on the size of the involved columns.
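    The effect of filtering in the index can be illustrated by counting table fetches in a toy model. The data, the `run` function, and the `include_notes` flag are all hypothetical; the sketch only shows the idea of applying the like filter to the index entry before visiting the table.

```python
# Hypothetical table: row id -> row.
table = {
    1: {"subsidiary_id": 7, "notes": "late delivery", "eur_value": 10.0},
    2: {"subsidiary_id": 7, "notes": "ok", "eur_value": 20.0},
    3: {"subsidiary_id": 7, "notes": "delivery on time", "eur_value": 30.0},
    4: {"subsidiary_id": 8, "notes": "late delivery", "eur_value": 40.0},
}


def build_leaf(include_notes):
    # Leaf entries: (key, row id, included notes or None), ordered by key.
    return sorted(
        (row["subsidiary_id"], rid, row["notes"] if include_notes else None)
        for rid, row in table.items()
    )


def run(subsidiary_id, term, include_notes):
    fetches = 0
    result = []
    for key, rid, notes in build_leaf(include_notes):
        if key != subsidiary_id:
            continue
        if include_notes and term not in notes:
            continue  # filtered on the index entry, no table access
        row = table[rid]  # table access
        fetches += 1
        if term in row["notes"]:
            result.append(row)
    return result, fetches


rows_plain, fetches_plain = run(7, "late", include_notes=False)
rows_incl, fetches_incl = run(7, "late", include_notes=True)
```

    Both variants return the same row, but the model without notes in the index fetches three table rows while the model with notes fetches only one.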

    Unique indexes with the include clause

    An entirely different aspect of the include clause: unique indexes with an include clause only consider the key columns for the uniqueness. That allows us to create unique indexes that have additional columns in the leaf nodes, e.g. for an index-only scan.

       CREATE UNIQUE INDEX …
        ON … ( id )  
         INCLUDE ( payload )
    

    This index protects against duplicate values in the id column, yet it supports an index-only scan for the next query.

       SELECT payload
         FROM …
        WHERE id = ?
    

    Note that the include clause is not strictly required for this behavior: databases that make a proper distinction between unique constraints and unique indexes just need an index with the unique key columns as the leftmost columns—additional columns are fine.
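    The uniqueness rule can be modelled in a few lines. The `UniqueIndex` class below is a hypothetical stand-in, not a database API: it enforces uniqueness on the key only and stores the payload alongside it, so a lookup by key can return the payload without touching any table.

```python
class UniqueIndex:
    """Toy unique index: uniqueness is checked on the key only."""

    def __init__(self):
        self.entries = {}  # key -> included payload

    def insert(self, key, payload):
        # The payload is not part of the uniqueness check.
        if key in self.entries:
            raise ValueError(f"duplicate key: {key!r}")
        self.entries[key] = payload

    def get_payload(self, key):
        # Answered from the index alone, like an index-only scan.
        return self.entries.get(key)


idx = UniqueIndex()
idx.insert(1, "a")
idx.insert(2, "a")  # same payload, different key: allowed

try:
    idx.insert(1, "b")  # same key, different payload: rejected
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

    Two entries may share a payload, but a second entry with an existing key is rejected even if its payload differs.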
