SQL Server heap vs. clustered index

难免孤独 2021-01-31 04:06

I am using SQL Server 2008. I know that if a table has no clustered index, it is called a heap; otherwise the storage model is called a clustered index (B-Tree).

I want to lea

3 Answers
  • 2021-01-31 04:15

    Books Online is the best source!

    The whole section Database Engine - Planning and Architecture - Tables and Index Data Structures Architecture is a very good introduction to the internals.

    From this link you can download a local copy of Books Online (it is free). It is the best (and official) reference for all SQL Server 2008 questions.

  • 2021-01-31 04:32

    Heap storage has nothing to do with the heap data structure (the tree-based one used for priority queues).

    Heap just means that the records themselves are not ordered (i.e. not linked to one another).

    When you insert a record, it just gets inserted into the free space the database finds.

    Updating a row in a heap-based table does not affect other records (though it affects secondary indexes).

    If you create a secondary index on a heap table, the RID (a kind of physical pointer to the storage space) is used as a row pointer.
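
    As a rough illustration (the table and index names are made up for this sketch, not taken from the question), the following script creates a heap with one secondary index and then asks the catalog view sys.indexes how the table is stored. The heap itself should appear as a row with index_id = 0 and type_desc = 'HEAP', next to a NONCLUSTERED row for the secondary index:

    ```sql
    -- Hypothetical table: no PRIMARY KEY and no clustered index, so it is a heap.
    CREATE TABLE dbo.heap_demo
    (
        id   INT         NOT NULL,
        name VARCHAR(50) NOT NULL
    );

    -- Secondary (nonclustered) index; its leaf rows locate the data rows via RIDs.
    CREATE NONCLUSTERED INDEX IX_heap_demo_name ON dbo.heap_demo (name);

    -- List the storage structures that exist for the table.
    SELECT name, index_id, type_desc
    FROM   sys.indexes
    WHERE  object_id = OBJECT_ID('dbo.heap_demo');
    ```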

    Clustered index means that the records are part of a B-Tree. When you insert a record, the B-Tree needs to be relinked.

    Updating a row in a clustered table may cause relinking of the B-Tree, i.e. updating internal pointers in other records, for instance when the clustered key value changes or the row no longer fits on its page.

    If you create a secondary index on a clustered table, the value of the clustered index key is used as a row pointer.

    This means a clustered index should be unique. If a clustered index is not unique, a special hidden column called a uniquifier is appended to the index key, which makes it unique (and larger in size).
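
    To make the difference concrete, here is a small sketch using a hypothetical table dbo.events (names and columns are assumptions for illustration only). The first index is a non-unique clustered index, so the hidden uniquifier described above comes into play; the commented-out alternative declares the key unique and avoids it:

    ```sql
    CREATE TABLE dbo.events
    (
        event_id   INT      NOT NULL,
        created_at DATETIME NOT NULL
    );

    -- Non-unique clustered index: duplicate created_at values are allowed,
    -- so SQL Server relies on the hidden uniquifier to tell such rows apart.
    CREATE CLUSTERED INDEX CX_events_created_at ON dbo.events (created_at);

    -- Alternative: a unique clustered index needs no uniquifier.
    -- DROP INDEX CX_events_created_at ON dbo.events;
    -- CREATE UNIQUE CLUSTERED INDEX CX_events_event_id ON dbo.events (event_id);
    ```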

    It is also worth noting that creating a secondary index on a column makes the values of the clustered index's key part of the secondary index's key.

    By creating an index on a clustered table, you in fact always get a composite index:

    CREATE UNIQUE CLUSTERED INDEX CX_mytable_1234 ON mytable (col1, col2, col3, col4)

    CREATE INDEX IX_mytable_5678 ON mytable (col5, col6, col7, col8)
    

    Index IX_mytable_5678 is in fact an index on the following columns:

    col5
    col6
    col7
    col8
    col1
    col2
    col3
    col4
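
    One practical consequence, sketched here with a hypothetical table definition that matches the column names above (the column types are assumptions): a query that filters on the secondary index columns and returns only clustered key columns can usually be answered from IX_mytable_5678 alone, without a lookup into the clustered index. Whether the optimizer actually picks that plan depends on statistics and data volume.

    ```sql
    CREATE TABLE mytable
    (
        col1 INT NOT NULL, col2 INT NOT NULL, col3 INT NOT NULL, col4 INT NOT NULL,
        col5 INT NOT NULL, col6 INT NOT NULL, col7 INT NOT NULL, col8 INT NOT NULL
    );

    CREATE UNIQUE CLUSTERED INDEX CX_mytable_1234 ON mytable (col1, col2, col3, col4);
    CREATE INDEX IX_mytable_5678 ON mytable (col5, col6, col7, col8);

    -- Every referenced column (col5 plus the clustered key col1..col4) is stored
    -- in IX_mytable_5678, so the clustered index does not have to be visited.
    SELECT col1, col2, col3, col4
    FROM   mytable
    WHERE  col5 = 42;
    ```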
    

    This has one more side effect:

    A DESC clause in a single-column index on a clustered table makes sense in SQL Server. (The examples assume that id is the clustered key of mytable.)

    This index:

    CREATE INDEX IX_mytable ON mytable (col1)
    

    can be used in a query like this:

    SELECT  TOP 100 *
    FROM    mytable
    ORDER BY
           col1, id
    

    , while this one:

    CREATE INDEX IX_mytable ON mytable (col1 DESC)
    

    can be used in a query like this:

    SELECT  TOP 100 *
    FROM    mytable
    ORDER BY
           col1, id DESC
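
    For readers who want to try this, a self-contained sketch follows. The table definition is an assumption (the answer never shows it): id is taken to be the clustered primary key, and the table is named order_demo so that it does not clash with an existing mytable. Because the secondary index is internally ordered by (col1 DESC, id ASC), reading it backwards yields col1 ASC, id DESC, which is the order the query asks for. Check the actual execution plan to confirm that the index is used without a Sort operator; that is not guaranteed.

    ```sql
    CREATE TABLE order_demo
    (
        id   INT IDENTITY(1, 1) NOT NULL PRIMARY KEY CLUSTERED,
        col1 INT NOT NULL
    );

    -- Stored as (col1 DESC, id ASC): the clustered key is appended implicitly.
    CREATE INDEX IX_order_demo ON order_demo (col1 DESC);

    -- A few rows with repeated col1 values so that the tie-break on id matters.
    INSERT INTO order_demo (col1) VALUES (1), (1), (2), (2), (3);

    SELECT  TOP 100 *
    FROM    order_demo
    ORDER BY
            col1, id DESC;
    ```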
    
  • 2021-01-31 04:35

    Heaps are just tables without a clustering key - without a key that enforces a certain physical order.

    I would not really recommend having heaps at any time - except maybe if you use a table temporarily to bulk-load an external file, and then distribute those rows to other tables.

    In every other case, I would strongly recommend using a clustering key. SQL Server will use the Primary Key as the clustering key by default - which is a good choice, in most cases. UNLESS you use a GUID (UNIQUEIDENTIFIER) as your primary key, in which case using that as your clustering key is a horrible idea.

    See Kimberly Tripp's blog posts "GUIDs as Primary and/or the clustering key" and "The Clustered Index Debate Continues" for excellent explanations of why you should always have a clustering key, and why a GUID is a horrible clustering key.

    My recommendation would be:

    • in 99% of all cases, try to use an INT IDENTITY as your primary key and let SQL Server make that the clustering key as well
    • exception #1: if you're bulk loading large amounts of data, you might be fine without a primary / clustering key for your temporary table
    • exception #2: if you must use a GUID as your primary key, then set your clustering key to a different column - preferably an INT IDENTITY - and I would even create a separate INT column just for that purpose, if no other column can be used
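
    Exception #2 could look roughly like the sketch below. The table, column, and constraint names are hypothetical; the point is only that the primary key is declared NONCLUSTERED on the GUID while a separate INT IDENTITY column carries the clustered index:

    ```sql
    CREATE TABLE dbo.Orders
    (
        -- GUID primary key, explicitly NOT used as the clustering key.
        OrderGuid    UNIQUEIDENTIFIER NOT NULL
            CONSTRAINT DF_Orders_OrderGuid DEFAULT NEWID(),
        -- Narrow, ever-increasing column added just to serve as the clustering key.
        OrderId      INT IDENTITY(1, 1) NOT NULL,
        CustomerName NVARCHAR(100) NOT NULL,
        CONSTRAINT PK_Orders PRIMARY KEY NONCLUSTERED (OrderGuid)
    );

    CREATE UNIQUE CLUSTERED INDEX CX_Orders_OrderId ON dbo.Orders (OrderId);
    ```

    Without the NONCLUSTERED keyword, the PRIMARY KEY constraint would claim the clustered index for the GUID column by default.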

    Marc
