Question
I have been experimenting with compression in SQL Server but so far I have not seen the results that I expected.
To test, I created a new table with a single VARCHAR(8000) column and inserted 100k rows into it. Each row contains about 500 words of text, which sees over a 90% saving in space under ZIP compression.
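The ZIP observation is easy to reproduce outside SQL Server. A minimal sketch (the vocabulary and word count below are made up for illustration) using DEFLATE, the algorithm behind ZIP:

```python
# Reproduce the observation that word-based text compresses very well under
# ZIP-style (DEFLATE) compression: whole words repeat across the text, and
# each repeat is replaced with a short back-reference.
import random
import zlib

random.seed(42)
vocab = ["compression", "server", "table", "column", "index",
         "page", "data", "storage", "query", "row"]  # illustrative vocabulary
text = " ".join(random.choice(vocab) for _ in range(500)).encode("ascii")

compressed = zlib.compress(text, level=9)
saving = 1 - len(compressed) / len(text)
print(f"original: {len(text)} bytes, "
      f"compressed: {len(compressed)} bytes, saving: {saving:.0%}")
```

Because entire words and phrases recur many times, DEFLATE achieves very high ratios on this kind of data; page compression, as the answers below explain, works quite differently.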
I am using the following command to check how much space would be saved by PAGE compression:

```sql
EXEC sp_estimate_data_compression_savings 'dbo', 'MyTable', NULL, NULL, 'PAGE';
```

but it is telling me that there won't be much saving at all. The results are as follows:
| object_name | schema_name | index_id | partition_number | size_with_current_compression_setting (KB) | size_with_requested_compression_setting (KB) | sample_size_with_current_compression_setting (KB) | sample_size_with_requested_compression_setting (KB) |
|---|---|---|---|---|---|---|---|
| MyTable | dbo | 0 | 1 | 94048 | 93440 | 40064 | 39808 |
That is basically no saving at all. Where am I going wrong?
P.S. I have tried the same experiment with an NVARCHAR(4000) column, and compression does show savings there, but I believe this is because compression forces the use of one byte per character instead of two where the data doesn't require two. It doesn't actually compress the data the way ZIP would.
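The one-byte-versus-two-bytes effect described in the P.S. can be sketched directly. This illustrates the idea only, not SQL Server's actual Unicode-compression implementation:

```python
# Illustration of the NVARCHAR saving: UTF-16 (how NVARCHAR stores text)
# spends 2 bytes per character, while pure-ASCII text needs only 1.
text = "about five hundred plain English words"

as_utf16 = text.encode("utf-16-le")  # NVARCHAR-style storage: 2 bytes/char
as_1byte = text.encode("latin-1")    # 1 byte/char; possible, as no char > U+00FF

print(len(as_utf16), len(as_1byte))  # exactly a 2:1 ratio for ASCII text
```

That 2:1 ratio explains why the NVARCHAR experiment shows roughly 50% savings without any ZIP-style compression of the content itself.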
Answer 1:
If the data is pushed off-row (which will likely happen with a VARCHAR(8000) column), then you don't get any compression on it. Only the in-row data is compressed:
> Because of their size, large-value data types are sometimes stored separately from the normal row data on special-purpose pages. Data compression is not available for the data that is stored separately.
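The in-row limit can be checked with simple arithmetic. SQL Server's maximum in-row row size is 8,060 bytes on an 8 KB page; the per-row overhead used below is a rough assumption for illustration:

```python
# Rough check of when a row's variable-length data gets pushed off-row.
# IN_ROW_LIMIT is SQL Server's real 8,060-byte in-row limit; ROW_OVERHEAD
# is an assumed, approximate fixed cost (row header, null bitmap, offsets).
IN_ROW_LIMIT = 8060
ROW_OVERHEAD = 11

def fits_in_row(value_lengths):
    """True if a row holding these variable-length values stays in-row."""
    return ROW_OVERHEAD + sum(value_lengths) <= IN_ROW_LIMIT

print(fits_in_row([3000]))        # stays in-row
print(fits_in_row([8000, 500]))   # over 8,060 bytes: data is pushed off-row
```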
Answer 2:
Page compression in SQL Server uses prefix and dictionary methods to compress the data. It cannot (and you would not want it to) look at the entire data set to figure out the best compression; it can only look at one page of data at a time. The best results are achieved when each successive row in a page differs as little as possible from the previous rows. The only way to accomplish this is to cause SQL Server to physically arrange the rows in each page so that they differ in the least possible degree from row to row. We can do this by creating a clustered index on the field, or set of fields, that guarantees the physical arrangement of the data rows follows this least-change-from-row-to-row model.
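The clustering effect described above can be sketched with a toy cost model. This is a drastic simplification of SQL Server's real per-column prefix and dictionary algorithm, with made-up row values, but it shows why sorted rows compress better: they land on pages with long shared prefixes and few distinct suffixes.

```python
# Toy model of per-page prefix + dictionary compression: the shared prefix
# and each distinct suffix are stored once per page, plus one token per row.
import random
from os.path import commonprefix

def page_cost(rows):
    prefix = commonprefix(rows)                    # prefix compression
    suffixes = [r[len(prefix):] for r in rows]
    dictionary = set(suffixes)                     # dictionary compression
    return len(prefix) + sum(map(len, dictionary)) + len(rows)

def total_cost(rows, rows_per_page=4):
    pages = (rows[i:i + rows_per_page]
             for i in range(0, len(rows), rows_per_page))
    return sum(page_cost(p) for p in pages)

rows = [f"customer_{i:04d}" for i in range(40)]    # clustered-index order
shuffled = rows[:]
random.seed(1)
random.shuffle(shuffled)                           # heap-like random order

print(total_cost(rows), total_cost(shuffled))      # sorted order costs less
```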
In the example you provided, a bunch of words in a single field, a suitable degree of compression may not be achievable. This sounds like paragraphs of text, which will differ greatly no matter how they are physically arranged.
The method that SQL Server uses to compress data enables it to retrieve the contents of any row without having to decompress the entire page.
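That per-row retrieval property can be sketched with a similar toy encoding (again a simplification, with hypothetical row values): once a page is stored as a shared prefix plus a dictionary plus one token per row, any single row can be rebuilt without touching the others.

```python
# Toy page encoding where each row is just a token into a page dictionary,
# so reading row i never decompresses the rest of the page.
from os.path import commonprefix

def compress_page(rows):
    prefix = commonprefix(rows)
    suffixes = [r[len(prefix):] for r in rows]
    dictionary = sorted(set(suffixes))
    tokens = [dictionary.index(s) for s in suffixes]
    return prefix, dictionary, tokens

def read_row(prefix, dictionary, tokens, i):
    """Reconstruct only row i from the page's shared structures."""
    return prefix + dictionary[tokens[i]]

page = ["order_1001_paid", "order_1002_open", "order_1003_paid"]
prefix, dictionary, tokens = compress_page(page)
print(read_row(prefix, dictionary, tokens, 1))   # only row 1 is rebuilt
```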
Source: https://stackoverflow.com/questions/9790554/compressing-varchar-in-sql-2008-12-not-seeing-results