问题
In a nutshell, I programmed myself into a corner by creating a CLR aggregate that performs row id concatenation, so I say:
select SumKeys(id), name from SomeTable where name='multiple rows named this'
and I get something like:
SumKeys name
-------- ---------
1,4,495 multiple rows named this
But it dies when SumKeys gets > 8000 chars and I don't think I can do anything about it.
As a quick fix (it's only failing 1% of the time for my application) I thought I might compress the string down and I thought some of you bright people out there might know a slick way to do this.
Something like base64 made for 0-9 and a comma?
回答1:
You'd be much better of if you figure out more reasonable storage for your data (maybe HashSet)...
But for compression try regular System.IO.Compression.GZipStream ( http://msdn.microsoft.com/en-us/library/system.io.compression.gzipstream.aspx ) and convert resulting byte array to base64 string if needed... or store as byte array.
回答2:
How about a hexadecimal representation, where every digit represents a 4-bit half of a character byte (a nibble), with 0xa
used as the comma? You will only get a 50% compression, but it is fast and simple.
回答3:
Not sure how "fancy" you'd consider it, but zip/gzip compression is highly effective for any text (sometimes to the tune of 90% reduction or better). Since you're already working with C# and CLR integration, it hopefully wouldn't be too hard to setup/deploy. I haven't tinkered with any C# libraries for compression yet, but it's easy to find them. For example: http://sharpdevelop.net/OpenSource/SharpZipLib/ or http://dotnetzip.codeplex.com/ or even http://msdn.microsoft.com/en-us/library/system.io.compression.gzipstream.aspx
Or an easier option might be to switch your field to text or varchar/nvarchar(max), if that's feasible.
回答4:
You can use a Huffman tree. This is basically an algorithm to compress ascii into binary. I was told that it is basically what WinZIP uses, but I'm not sure if that is really true or not. I did a quick search for huffman coding c#
and there seems to be at least one decent implementation out there, though I haven't used any of them.
If your "vocabulary" is just digits and commas, a Hoffman tree will get you very good compression.
http://www.enusbaum.com/blog/2009/05/22/example-huffman-compression-routine-in-c/
回答5:
try:
SELECT name, GROUP_CONCAT(id) FROM SomeTable GROUP BY name WHERE name = 'multiple rows named this'
回答6:
I came across a method that will work with SQL Server:
SELECT
STUFF((
SELECT ','+id FROM SomeTable a WHERE a.name = b.name FOR XML PATH('')
),1,1,'') AS SumKeys, name
FROM SomeTable b
GROUP BY name
WHERE name = 'multiple rows named this'
The WHERE clause is optional
来源:https://stackoverflow.com/questions/6023117/is-there-a-fast-and-non-fancy-c-sharp-code-algorithm-to-compress-a-string-of-com