compression, defragmentation, reclaiming space, shrinkdatabase vs. shrinkfile

感动是毒 2021-02-11 05:38

[1] states:

  • "When data is deleted from a heap, the data on the page is not compressed (reclaimed). And should all of the rows of a heap page are deleted, often
1 answer
  •  孤城傲影
    2021-02-11 06:22

    No one touched this in over one month.

    The answers to the first three are actually in the Diagram I Made for You, which you have not bothered to digest and ask questions about ... it is often used as a platform for discussion.

    (That is a condensed version of my much more elaborate Sybase diagrams, which I have butchered for the MS context. There is a link at the bottom of that doc, if you want the full Sybase set.)

    Therefore I am not going to spend much time on you either. And please do not ask for links to "reference sites", there ain't no such thing (what is available is non-technical rubbish), which is precisely why I have my own diagrams; there are very few people who understand MS SQL Internals.

    reclaiming the space

    That is the correct term. MS does not remove deleted rows from the page, or deleted pages from the extent. Reclaiming the space is an operation that goes through the Heap and removes the unused (a) rows and (b) pages. Of course that changes the RowIds, so all Nonclustered indices have to be rebuilt.
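
    A minimal T-SQL sketch of reclaiming the space in a Heap, assuming SQL Server 2008 or later (where ALTER TABLE ... REBUILD is available). The table dbo.HeapDemo and the index NC_HeapDemo are hypothetical names, used only for illustration.

    ```sql
    -- Hypothetical demo objects; the names are illustrative only.
    CREATE TABLE dbo.HeapDemo (
        Id     INT       NOT NULL,
        Filler CHAR(200) NOT NULL
    );
    CREATE UNIQUE NONCLUSTERED INDEX NC_HeapDemo ON dbo.HeapDemo (Id);

    -- Add 1000 rows, then delete nine out of every ten,
    -- leaving unused space scattered across the Heap pages.
    INSERT INTO dbo.HeapDemo (Id, Filler)
    SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 'x'
    FROM sys.all_objects;

    DELETE FROM dbo.HeapDemo WHERE Id % 10 <> 0;

    -- Space reserved and used before the rebuild.
    EXEC sp_spaceused 'dbo.HeapDemo';

    -- Rebuild the Heap. Rows move, so the RowIds change and the
    -- Nonclustered indices on the table are rebuilt as part of the operation.
    ALTER TABLE dbo.HeapDemo REBUILD;

    -- Space reserved and used after the rebuild, for comparison.
    EXEC sp_spaceused 'dbo.HeapDemo';
    ```

    Comparing the two sp_spaceused results shows how much space the rebuild gave back; the exact figures depend on the server and are not shown here.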

    compression

    In the context of the pasted text: same as Reclaiming space.

    defragmentation

    The operation of full-scale removal of unused space. There are three Levels:

    I. Database (AllocationUnits), across all objects

    II. Object (Extent & Page), Page Chains, Split Pages, Overflow Pages

    III. Heap Only (No Clustered index), the subject of the post
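
    As a rough illustration of Levels II and III, the following T-SQL sketch inspects fragmentation and then removes it, assuming SQL Server 2008 or later. dbo.MyTable is a hypothetical placeholder for any existing table. Level I has no single equivalent command and is not shown.

    ```sql
    -- Inspect every index (and the Heap, if there is one) on a single table.
    -- dbo.MyTable is a hypothetical placeholder; substitute an existing table.
    SELECT index_id,
           index_type_desc,
           index_level,
           avg_fragmentation_in_percent,
           page_count,
           forwarded_record_count   -- reported for Heaps in SAMPLED or DETAILED mode
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID('dbo.MyTable'), NULL, NULL, 'DETAILED');

    -- Level II (Object): defragment the indices on the table.
    ALTER INDEX ALL ON dbo.MyTable REORGANIZE;  -- lighter: compacts and reorders leaf pages
    ALTER INDEX ALL ON dbo.MyTable REBUILD;     -- heavier: rebuilds each index from scratch

    -- Level III (Heap only): rebuild the Heap itself.
    ALTER TABLE dbo.MyTable REBUILD;
    ```

    In practice only one of REORGANIZE or REBUILD would be run; both appear here to show the two options side by side.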

    shrinkfile

    Quite a different operation, to reduce the space allocated on a Device (File). This removes unused AllocationUnits (hence 'shrink') but it is not the same as de-fragmenting AllocationUnits.

    shrinkdatabase

    To do the same for a whole Database: all Allocations used by the database, across all of its Devices (Files).
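
    For reference, a short T-SQL sketch of the two shrink commands. The logical file name MyDb_Data and the database name MyDb are hypothetical; the target sizes are arbitrary examples.

    ```sql
    -- List the files of the current database; size is reported in 8 KB pages.
    SELECT name, type_desc, size
    FROM sys.database_files;

    -- Shrink one file to a target size of 1024 MB.
    -- 'MyDb_Data' is a hypothetical logical file name.
    DBCC SHRINKFILE (N'MyDb_Data', 1024);

    -- Release only the free space at the end of the file, without moving any pages.
    DBCC SHRINKFILE (N'MyDb_Data', TRUNCATEONLY);

    -- Shrink every file of the database, leaving 10 percent free space in each.
    -- 'MyDb' is a hypothetical database name.
    DBCC SHRINKDATABASE (N'MyDb', 10);
    ```

    Neither command defragments anything; they only give allocated space back to the operating system.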

    Response to Comments

    The poster at SSC is clueless and does not address your question directly.

    • there is no such thing as a Clustered table (CREATE CLUSTERED TABLE fails)
    • there is such a thing as a Clustered index (CREATE CLUSTERED INDEX succeeds)
    • as per my diagrams, it is a single physical structure; the clustered index INCLUDES the rows and thus the Heap is eliminated
    • where there is no Clustered index, there are two physical structures: a Heap and a separate Nonclustered Index

    Now, before you go diving into them with DBCC (which is too low level; clueless folks cannot identify, let alone explain, the whys and wherefores), you need to understand and confirm the above:

    • create a Table_CI (we are intending to add a CI, there is still no such thing as a Clustered Table)
    • add a unique clustered index to it, UC_PK
    • add a few rows

    • create a table Heap

    • add a unique Nonclustered index to it, NC_PK
    • add a few rows

    • SELECT * FROM sysindexes WHERE id = OBJECT_ID('Table_CI')

    • SELECT * FROM sysindexes WHERE id = OBJECT_ID('Heap')

    • note that each sysindexes entry is a complete, independent, data storage structure (look at the columns)

    • contemplate the output
    • compare with my diagram
    • compare with the rubbish in the universe
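
    The steps above can be sketched as one T-SQL script. The column names and sample rows are assumptions made for illustration; only the table and index names come from the steps. The sysindexes compatibility view is deprecated but still available in current SQL Server versions.

    ```sql
    -- Table that will carry a Clustered index (one physical structure).
    CREATE TABLE dbo.Table_CI (
        Id   INT         NOT NULL,
        Name VARCHAR(30) NOT NULL
    );
    CREATE UNIQUE CLUSTERED INDEX UC_PK ON dbo.Table_CI (Id);

    INSERT INTO dbo.Table_CI (Id, Name) VALUES (1, 'one');
    INSERT INTO dbo.Table_CI (Id, Name) VALUES (2, 'two');
    INSERT INTO dbo.Table_CI (Id, Name) VALUES (3, 'three');

    -- Table without a Clustered index (a Heap plus a separate Nonclustered index).
    CREATE TABLE dbo.[Heap] (
        Id   INT         NOT NULL,
        Name VARCHAR(30) NOT NULL
    );
    CREATE UNIQUE NONCLUSTERED INDEX NC_PK ON dbo.[Heap] (Id);

    INSERT INTO dbo.[Heap] (Id, Name) VALUES (1, 'one');
    INSERT INTO dbo.[Heap] (Id, Name) VALUES (2, 'two');
    INSERT INTO dbo.[Heap] (Id, Name) VALUES (3, 'three');

    -- indid: 0 = Heap, 1 = Clustered index, 2 and above = Nonclustered index.
    SELECT name, indid, [rows], dpages, [root], [first], FirstIAM
    FROM sysindexes
    WHERE id = OBJECT_ID('dbo.Table_CI');

    SELECT name, indid, [rows], dpages, [root], [first], FirstIAM
    FROM sysindexes
    WHERE id = OBJECT_ID('dbo.Heap');
    ```

    Table_CI should come back with a single entry (indid 1), and Heap with two entries (indid 0 for the Heap itself and indid 2 for NC_PK), which is the one-structure versus two-structures distinction described above.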

    In future, I will not answer questions about the confused rubbish in the universe, and the incorrect and misinformed posts on other sites (I do not care if they are MS Certified Professionals, they have proved that they are incapable of inspecting their databases and determining the correct information)

    There is a reason I have bothered to create accurate diagrams (the manuals, pictures, and all available info from MS, is all rubbish; no use for you to look for accurate info from the "authority", because the "authority" is technically bankrupt).

    Even Gail finally gets around to: "I suspect you'd benefit from more reading on overall architecture of indexes before fiddling with the low level internals."

    Except, there isn't any. None that is not confusing, non-technical, and inconsistent.

    There is a reason I have bothered to create accurate diagrams.

    Back to the DBCCs. Gail is simply incorrect. In a Clustered Index (which includes the rows), the single page contains rows. Yes, rows. That is the leaf level of the index. There is a B-tree, it lives in the top of the page, but it is so small that you can't see it. Look at the sysindexes output. The root and firstpage pointer IS pointing to the page; that is the root of the Clustered Index. When you dive into the ocean, you need to know what to look for, AND where to find it, otherwise you won't find what you are looking for, and you will get distracted by the flotsam and jetsam that you do find by accident.

    Now look at the TWO SEPARATE STRUCTURES for the NCI and the Heap.

    Oh, and MS has changed from using the OAM terminology to the IAM where the data structure is an index. That introduces confusion. In terms of data structures (entries in sysindexes), they are all Objects (they might or might not be Indices). The point is, who cares, we know what it is, it is an ObjectAllocationMap ... if you are looking at an NCI, gee, it is an IndexObjectAllocationMap; if you are looking at a Heap, it is a HeapObjectAllocMap. I will let you ponder what it is in the case of a CI. In chasing it down, or in using it (finding the pages that belong to the OBJECT), it does not matter, they are all Objects. When doing that, you need to know, some objects have a PageChain and others do not (another of your questions). CIs have them; NCIs and Heaps do not.

    Gail Shaw: "I doubt these kinds of internals are documented anywhere. After all, we're using undocumented features. Definition of index depends who you ask and where you look."

    ROTFLMAO. My sides hurt, I could not read the posts that followed it. These are supposed to be intelligent human beings ? Working in the IT world ? Definitions CHANGE ? What with the temperature or the time of day ? And that was SQL Server Central ? Not the backwoods ?

    When MS stole SQL Server from Sybase, the documentation was rock solid. Of course, with each major release, they "rewrite" it, and the docs get weaker and more fluffy (recall our discussion in another post). Now we have cute pictures that make people feel good but are inaccurate, technically. Which is why earnest people like you have problems. The pictures do not even match the text in the manuals.

    Anyway, DEFINITIONS do not change. That's the definition of definitions. They are true in any context. And Um, the um feature you are using is an ordinary, documented feature. Has been since 1987. Except MS lost it somewhere and no one can find it. You'll have to ask a Sybase Guru who was around in the old days, who remembers what exact data structures were in the code that they acquired. And if you are really lucky, he will be up to date with the differences that MoronSociety has introduced in 2000, 2005, 2008. He might even have a single accurate diagram that matches the output of sysindexes and DBCC on your box. If you find him, kiss his ring and shower him with gold. Lock up your daughters.

    (not serious, my sides are killing me, the mirth is overflowing).

    Now do you see why I will not answer questions about the confused rubbish in the universe ? There are just SO MANY morons out there in MoronSociety.

    -----

    Gail again:

    "Scans:
    An index scan is a complete read of all of the leaf pages in the index. When an index scan is done on the clustered index, it’s a table scan in all but name.
    When an index scan is done by the query processor, it is always a full read of all of the leaf pages in the index, regardless of whether all of the rows are returned. It is never a partial scan.
    A scan does not only involve reading the leaf levels of the index, the higher level pages are also read as part of the index scan."

    There must be a reason she is named after fast wind. She writes "books" ? Yeah, fantasy novels. Hot air is for balloonists not IT professionals.

    Complete and total drivel. The whole point of an Index Scan, AND WHY IT IS PREFERABLE TO A TABLE SCAN (because it is trying to AVOID A TABLE SCAN), is that:

    • the engine (executing the query tree) can go directly to the Index (Clustered or Nonclustered, at this point)
    • navigate the B-Tree to find the place to start (which up to this point, is much the same as when it is getting a few rows, ie. not scanning)
    • the B-Tree (from any good TECHNICAL diagram) is a few pages, containing many, many index entries per page, so it is very fast
    • that's the root plus non-leaf levels
    • until it finds a leaf-level entry that qualifies
    • from that point on, it does a SCAN, sequentially, through the LEAF level of said index (fat blue arrow)

    • now for NCIs, if you remember your homework, that means the leaf level pages are full of index_leaf_level_entry + CI_key
    • so it is scanning sequentially across the NCI Leaf level (that's why there is a PageChain only at the leaf level of NCIs, so that it can navigate across)
    • but jumping all over the place on the HEAP, to get the data rows

    • but for a CI, the leaf level IS the data row (data pages, with only data rows, that's why you cannot see an "index" in them; the non-leaf-level CI pages are pure index pages containing index_entries only)

    • so when it SCANS the index leaf_level sequentially, using the PageChain, it is SCANNING the data sequentially, they are the same operation (fat green arrow)
    • no Heap
    • no jumping around

    For comparison, then, a TABLE SCAN (MS Only):

    • has no PageChain on the Heap
    • has no choice, but to start at the beginning
    • and read every data page
    • of which many will be fragmented (contain unused space left by deleted or forwarded rows)
    • and others will be completely empty

    The whole intent is, the optimiser had already decided, not to go for a table (heap) scan, that it could go for an Index scan (because it required LESS than the full range of data, and it could find the starting point of that data via some index). If you look at your SHOWPLAN, even for retrieving a single unique PK row, it says "INDEX SCAN". All that means is, it will navigate the B-Tree first, to find at least one row. And then it may scan the leaf level, until it finds an end point. If it is a covered query, it never goes to the data rows.
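
    To see what the optimiser chooses on your own box, a sketch along these lines can help. It assumes the Table_CI and Heap tables from the exercise earlier in this answer already exist, with hypothetical columns Id (the indexed key) and Name. GO is the batch separator used by SSMS and sqlcmd; SET SHOWPLAN_TEXT must be the only statement in its batch, and while it is ON the queries are not executed, only their plans are returned.

    ```sql
    SET SHOWPLAN_TEXT ON;
    GO

    -- Range on the Clustered index key: the leaf level IS the data rows.
    SELECT Id, Name FROM dbo.Table_CI WHERE Id BETWEEN 2 AND 3;

    -- Same range on the Heap: Name is not in NC_PK, so the data rows
    -- must be fetched from the Heap if NC_PK is used.
    SELECT Id, Name FROM dbo.[Heap] WHERE Id BETWEEN 2 AND 3;

    -- Covered query: every column requested is in NC_PK.
    SELECT Id FROM dbo.[Heap] WHERE Id BETWEEN 2 AND 3;

    -- Predicate on a column with no index at all.
    SELECT Id, Name FROM dbo.[Heap] WHERE Name = 'two';
    GO

    SET SHOWPLAN_TEXT OFF;
    GO
    ```

    Compare the operator names reported for each query with the description above. With only a few rows in each table, the optimiser may pick a different plan than it would on a large table, so load more rows before drawing conclusions.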

    There is no substitute for a Clustered Index.
