What do I miss in understanding the clustered index?

后端 未结 2 500
抹茶落季
抹茶落季 2021-01-28 12:21

In absence of any index the table rows are accessed through IAM ((Index Allocation Map).
Can I directly access a row programmatically using IAM?

Does absen

2条回答
  •  闹比i
    闹比i (楼主)
    2021-01-28 13:07

    Please read my answer under "No direct access to data row in clustered table - why?", first.

    If there is an IAM, then there is an Index.

    But if the is no CI, then the rows are in a Heap, and yes, if you want to read it directly (without using an NCI, or where no Indices exist), you can only table scan a Heap.

    A Clustered Index is always better that not having one. There is one exception, and one caveat, both for abnormal or sub-standard conditions:

    1. Non-Unique CI Key. This causes Overflow Pages. Relational tables are required to have unique keys, so this is not a Relational table. The CI can be made unique quite easily by overloading the columns. A Non-unique CI is still better (as per my other post) to have a Non-unique CI than no CI.

    2. Monotonic Key. Typically an IDENTITY column. Instead of random Inserts which insert rows distributed throughout the data storage structure (as is normal with a "good" natural Relational key), the inserted Key is always on the last page. This causes an Insert hotspot, and reduces concurrency. Relational keys are supposed to naturally unique; the surrogate is always an additional index. A surrogate-only is simply not a relational table (it is group of Unnormalised spreadsheets with row identifiers linking them together; you will not get th epower of a databse from that). .
      So the standing advice is, use an NCI for monotonic keys, and ensure that the CI allows good data distribution.

    The advantages of CIs are vast, there is no good reason to have one (there may be bad reasons as alluded to above).

    CIs allow range queries; NCIs do not. But that is not the only reason.

    Another caveat is you need to keep the width of the CI Key small, because it is carried in the NCIs. Now normally this is not a problem, as in, wide CI keys are fine. But where you have an Unormalise dbunch of spreadsheets masquerading as a database, which results in many more indices than a Normalised database, that does become a consideration. Therefore the standing advice for Empire devotees is, keep the width of the CI key down. CIs do not "increase" the NCIs, that is not stated accurately. If you have NCIs, well, it is going to have a pointer or a CI key; if you have a CI (with all the benefits) then the cost is, a CI key instead of a RowId, negligible. So the accurate statement is, fat wide CI keys increase the NCIs.

    Whoever says sequential access of CIs cannot be parallelised is wrong (MS may break it in one version and fix it in the next, but that is transient).

    Using the ANSI SQL ...PRIMARY KEY ... notation defaults to UNIQUE CLUSTERED. because the db is supposed to be Relational. And the Unique PK is supposed to be a nice friendly Relational key, not a idiotic IDENTITY column. Therefore invariably (not counting exceptions) the PRIMARY KEY is the best candidate for clustering.

    You can always create whatever indices you want by avoiding the ANSI SQL ...PRIMARY KEY ... notation and using CREATE [UNIQUE] [CLUSTERED] INDEX notation instead.

    It is not possible to answer that last question of yours, you will have to keep asking questions until you run out. But please, forget the doco you reference and start again, otherwise we will be here for days discussing the difference between clear knowledge and gobbledegook.

提交回复
热议问题