Optimizing queries based on clustered and non-clustered indexes in SQL?

Submitted by こ雲淡風輕ζ on 2019-12-04 14:37:40

For SQL Server

Q1 Extra space is only needed for the clustered index if it is not unique. SQL Server will add a 4-byte uniquifier internally to a non-unique clustered index. This is because it uses the cluster key as the row identifier in non-clustered indexes.

Q2 A non-clustered index can be read in order. That may aid queries where you specify an order. It may also make merge joins attractive. It will also help with range queries (x < col and y > col).
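As a rough illustration of the ordering and range points, here is a minimal T-SQL sketch. The table definition and the index name IX_employee_age are assumptions made up for this example, not something from the question:

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE employee (
    id     INT          NOT NULL PRIMARY KEY CLUSTERED,
    name   VARCHAR(100) NOT NULL,
    age    INT          NOT NULL,
    salary INT          NOT NULL
);

-- Non-clustered index whose entries are kept in age order.
CREATE INDEX IX_employee_age ON employee (age);

-- A range predicate plus a matching ORDER BY: the optimizer can seek to
-- age > 30, read the index forward until age >= 40, and return rows
-- already in the requested order without a separate sort.
SELECT id, age
FROM employee
WHERE age > 30 AND age < 40
ORDER BY age;
```

Whether the optimizer actually picks this plan depends on the statistics for the table, so check the execution plan rather than assuming it.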

Q3 SQL Server does an extra "bookmark lookup" when using a non-clustered index, but only if it needs a column that isn't in the index. Note also that you can include extra columns in the leaf level of indexes. If an index can be used without the additional lookup, it is called a covering index.

If a bookmark lookup is required, the query does not need to touch a high percentage of rows before it becomes quicker just to scan the whole clustered index. The threshold depends on row size, key size, etc., but 5% of rows is a typical cutoff.

Q4 If the most important thing in your application was making both these queries as fast as possible, you could create a covering index for each of them:

create index IX_1 on employee (age) include (name, salary);
create index IX_2 on employee (salary) include (name, age);

Note you don't have to specifically include the cluster key, as the non-clustered index has it as the row pointer.
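For instance, assuming employee has a clustered primary key on a hypothetical id column, a query like the following sketch should be answerable from IX_1 alone:

```sql
-- Assumed schema (hypothetical): employee(id PRIMARY KEY CLUSTERED, name, age, salary).
-- IX_1 is keyed on age, includes name and salary in its leaf level,
-- and carries the cluster key (id) as the row locator.
-- Every column referenced here is therefore available in IX_1,
-- so no bookmark lookup into the clustered index should be needed.
SELECT id, name, salary
FROM employee
WHERE age = 30;
```

If the select list asked for a column that is not in the index, the plan would need a lookup per matching row (or a clustered index scan) instead.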

Q5 This is more important for cluster keys than non-cluster keys due to the uniquifier. The real issue though is whether an index is selective or not for your queries. Imagine an index on a bit value. Unless the distribution of data is very skewed, such an index is unlikely to be used for anything.


More info about the uniquifier. Imagine you had a non-unique clustered index on age, and a non-clustered index on salary. Say you had the following rows:

age | salary | uniquifier
20  | 1000   | 1
20  | 2000   | 2

Then the salary index would locate rows like so

1000 -> 20, 1
2000 -> 20, 2

Say you ran the query select * from employee where salary = 1000, and the optimizer chose to use the salary index. It would then find the pair (20, 1) from the index lookup, then look up this value in the main data.
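The setup described above could be reproduced with something like this T-SQL sketch. The table and index names are invented for the example, and the uniquifier itself is internal, so it never appears as a column you can select:

```sql
-- Hypothetical table with no primary key, for illustration only.
CREATE TABLE employee_by_age (
    age    INT NOT NULL,
    salary INT NOT NULL
);

-- Non-unique clustered index: duplicate age values are allowed, so
-- SQL Server needs the hidden uniquifier to tell those rows apart.
CREATE CLUSTERED INDEX CIX_employee_by_age ON employee_by_age (age);

-- Non-clustered index: each entry locates its row through the
-- cluster key (age plus the hidden uniquifier).
CREATE NONCLUSTERED INDEX IX_employee_by_age_salary ON employee_by_age (salary);

INSERT INTO employee_by_age (age, salary) VALUES (20, 1000), (20, 2000);

-- If the optimizer uses the salary index, it first finds the entry for
-- 1000, then follows the cluster key into the clustered index.
SELECT * FROM employee_by_age WHERE salary = 1000;
```

Declaring the clustered index with CREATE UNIQUE CLUSTERED INDEX on a genuinely unique column would avoid the uniquifier altogether.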

I don't know about internals of Microsoft SQL Server, but I can answer for MySQL, which you tagged for your question. The details could vary for other implementations.

Q1. Right, no extra space is needed for the clustered index.

What happens if you drop the clustered index? MySQL's InnoDB engine always uses the primary key (or the first non-null unique key) as the clustered index. If you define a table without a primary key, or you drop the primary key of an existing table, InnoDB generates an internal artificial key for the clustered index. This internal key has no logical column to reference it.
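A minimal MySQL sketch of the two cases, using made-up table names. You cannot see which index is the clustered one from the table definition itself, so the comments describe InnoDB's documented behavior rather than anything the statements print:

```sql
-- No PRIMARY KEY, but a UNIQUE index whose column is NOT NULL:
-- InnoDB uses uk_email as the clustered index.
CREATE TABLE contact_unique_only (
    email VARCHAR(255) NOT NULL,
    name  VARCHAR(100),
    UNIQUE KEY uk_email (email)
) ENGINE=InnoDB;

-- No PRIMARY KEY and no suitable UNIQUE index:
-- InnoDB clusters the table on a hidden, internally generated row ID
-- that no query can reference.
CREATE TABLE contact_no_keys (
    email VARCHAR(255),
    name  VARCHAR(100)
) ENGINE=InnoDB;
```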

Q2. The order of rows returned by a query that uses a non-clustered index is not guaranteed. In practice, it's the order in which the rows were accessed. If you need rows to be returned in a specific order, you should use ORDER BY in your query. If the optimizer can infer that your desired order is the same as the order in which it will access rows (index order, whether by clustered or non-clustered index), then it can skip the sorting step.
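One way to check whether the sort is being skipped is EXPLAIN. The following is only a sketch, assuming a hypothetical employee table with an index named idx_age:

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE employee (
    id     INT          NOT NULL PRIMARY KEY,
    name   VARCHAR(100) NOT NULL,
    age    INT          NOT NULL,
    salary INT          NOT NULL,
    KEY idx_age (age)
) ENGINE=InnoDB;

-- Ask for the order explicitly instead of relying on access order.
-- If the optimizer reads idx_age in order, the Extra column of the
-- EXPLAIN output will not contain "Using filesort".
EXPLAIN
SELECT id, age
FROM employee
WHERE age BETWEEN 30 AND 40
ORDER BY age;
```

The plan can change with table size and data distribution, so the ORDER BY should stay in the query either way.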

Q3. An InnoDB non-clustered index does not store a pointer to the corresponding row in its leaves; it stores the value of the primary key. So a lookup in a non-clustered index is really two B-tree searches: the first to find the leaf of the non-clustered index, and then a second search in the clustered index.

This is double the cost of a single B-tree search (more or less), so InnoDB has an extra feature called the Adaptive Hash Index. Frequently-searched values get cached in the AHI, and the next time a query searches for a cached value, it can do an O(1) lookup. In the AHI cache, it finds a pointer directly to the leaf of the clustered index, so it eliminates both B-tree searches, part of the time.

How much this improves total performance depends on how frequently you search for the same value(s) that have been searched before. In my experience, it's typical for the ratio of hash searches vs. non-hash searches to be about 1:2.
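If you want to look at that ratio on your own server, these statements are a reasonable starting point. This is just a sketch; the exact layout of the status text varies between MySQL versions:

```sql
-- Is the Adaptive Hash Index enabled on this server?
SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';

-- The "INSERT BUFFER AND ADAPTIVE HASH INDEX" section of this report
-- includes a line of the form
--   "... hash searches/s, ... non-hash searches/s"
-- which gives the hash vs. non-hash search rates since the last report.
SHOW ENGINE INNODB STATUS;
```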

Q4. Construct the indexes to serve the queries you need to be optimized. Typically a clustered index is a primary or unique key, and at least in the case of InnoDB, this is required. Neither age nor salary is likely to be unique.

You may like my presentation, How to Design Indexes, Really.

Q5. InnoDB automatically creates an index when you declare a unique constraint. You can't have the constraint without an index existing for it. If you didn't have an index, how would the engine ensure uniqueness when you insert a value? It would need to search the entire table for a duplicate value in that column. The index helps to make unique checks much more efficient.
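You can see this for yourself with a small sketch like the following, where the table and constraint names are invented for the example:

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE employee_contact (
    id    INT          NOT NULL PRIMARY KEY,
    email VARCHAR(255) NOT NULL
) ENGINE=InnoDB;

-- Declaring the constraint creates a unique index with the same name.
ALTER TABLE employee_contact
    ADD CONSTRAINT uq_email UNIQUE (email);

-- Lists both PRIMARY and uq_email, even though no index on email
-- was created explicitly.
SHOW INDEX FROM employee_contact;
```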
