Is there a fast and non-fancy C# code/algorithm to compress a string of comma-separated digits close to maximum info density?

亡梦爱人 submitted on 2019-12-06 08:09:13

You'd be much better off if you figured out a more reasonable storage for your data (maybe a HashSet)...

But for compression try regular System.IO.Compression.GZipStream ( http://msdn.microsoft.com/en-us/library/system.io.compression.gzipstream.aspx ) and convert resulting byte array to base64 string if needed... or store as byte array.
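As a rough sketch of the GZipStream suggestion, the following round-trips a string through gzip and base64. The class and method names (CsvGzip, Compress, Decompress) are made up for illustration, and UTF-8 is assumed as the text encoding; only the System.IO.Compression calls come from the framework.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, not part of any library.
public static class CsvGzip
{
    // Gzip the text and return the compressed bytes as a base64 string.
    public static string Compress(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var output = new MemoryStream())
        {
            // The GZipStream must be disposed before reading the buffer,
            // otherwise the gzip trailer is not flushed.
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return Convert.ToBase64String(output.ToArray());
        }
    }

    // Reverse of Compress: base64 -> gzip bytes -> original text.
    public static string Decompress(string base64)
    {
        using (var input = new MemoryStream(Convert.FromBase64String(base64)))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var result = new MemoryStream())
        {
            gzip.CopyTo(result);
            return Encoding.UTF8.GetString(result.ToArray());
        }
    }
}
```

Keep in mind that gzip adds a fixed header and trailer and base64 inflates the result by about a third, so for very short strings the output can end up longer than the input. Storing the raw byte array avoids the base64 overhead.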

How about a hexadecimal representation, where every digit represents a 4-bit half of a character byte (a nibble), with 0xa used as the comma? You will only get a 50% compression, but it is fast and simple.
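A minimal sketch of that nibble packing could look like the code below. The names (NibblePacker, Pack, Unpack) are hypothetical, and using 0xF as a filler for odd-length input is an assumption of this sketch, not something the suggestion above specifies.

```csharp
using System;
using System.Text;

// Hypothetical helper: packs two characters (digits or commas) into each byte.
public static class NibblePacker
{
    private const byte Comma = 0xA; // nibble value standing in for ','
    private const byte Pad = 0xF;   // assumed filler when the input length is odd

    public static byte[] Pack(string text)
    {
        var packed = new byte[(text.Length + 1) / 2];
        for (int i = 0; i < text.Length; i += 2)
        {
            byte high = ToNibble(text[i]);
            byte low = i + 1 < text.Length ? ToNibble(text[i + 1]) : Pad;
            packed[i / 2] = (byte)((high << 4) | low);
        }
        return packed;
    }

    public static string Unpack(byte[] packed)
    {
        var sb = new StringBuilder(packed.Length * 2);
        foreach (byte b in packed)
        {
            AppendNibble(sb, (byte)(b >> 4));
            AppendNibble(sb, (byte)(b & 0x0F));
        }
        return sb.ToString();
    }

    private static byte ToNibble(char c)
    {
        if (c >= '0' && c <= '9') return (byte)(c - '0');
        if (c == ',') return Comma;
        throw new ArgumentException("Only digits and commas are supported: '" + c + "'");
    }

    private static void AppendNibble(StringBuilder sb, byte nibble)
    {
        if (nibble <= 9) sb.Append((char)('0' + nibble));
        else if (nibble == Comma) sb.Append(',');
        else if (nibble != Pad) throw new FormatException("Unexpected nibble value: " + nibble);
        // Pad nibbles are skipped silently.
    }
}
```

For example, NibblePacker.Pack("12,345") yields the three bytes 0x12, 0xA3, 0x45, which is half the six bytes of the ASCII original.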

Not sure how "fancy" you'd consider it, but zip/gzip compression is highly effective for any text (sometimes to the tune of 90% reduction or better). Since you're already working with C# and CLR integration, it hopefully wouldn't be too hard to set up and deploy. I haven't tinkered with any C# libraries for compression yet, but it's easy to find them. For example: http://sharpdevelop.net/OpenSource/SharpZipLib/ or http://dotnetzip.codeplex.com/ or even http://msdn.microsoft.com/en-us/library/system.io.compression.gzipstream.aspx

Or an easier option might be to switch your field to text or varchar/nvarchar(max), if that's feasible.

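You can use a Huffman tree. This is basically an algorithm that replaces each ASCII character with a variable-length bit code, giving the most frequent characters the shortest codes. I was told that it is basically what WinZIP uses, but I'm not sure if that is really true or not (the DEFLATE method used by zip and gzip does combine LZ77 with Huffman coding). I did a quick search for "huffman coding c#" and there seems to be at least one decent implementation out there, though I haven't used any of them.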

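If your "vocabulary" is just digits and commas, a Huffman tree will get you very good compression: with only 11 distinct symbols, each character needs about 3.5 bits on average instead of 8 if the symbols are roughly equally frequent, and fewer if some symbols dominate.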

http://www.enusbaum.com/blog/2009/05/22/example-huffman-compression-routine-in-c/
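To get a feel for how a Huffman tree would treat a digits-and-commas string, here is a small sketch that builds the code table and counts the encoded bits. It is not the implementation from the link above; HuffmanSketch, BuildCodes and EncodedBits are hypothetical names, and the tree is built with a simple re-sort on each step rather than a priority queue, which is fine for an 11-symbol alphabet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: builds Huffman codes as strings of '0' and '1'.
public static class HuffmanSketch
{
    private sealed class Node
    {
        public char? Symbol;   // set only on leaf nodes
        public int Weight;     // occurrence count of the subtree
        public Node Left, Right;
    }

    public static Dictionary<char, string> BuildCodes(string text)
    {
        var codes = new Dictionary<char, string>();
        var nodes = text.GroupBy(c => c)
            .Select(g => new Node { Symbol = g.Key, Weight = g.Count() })
            .ToList();
        if (nodes.Count == 0) return codes;

        // Repeatedly merge the two lightest subtrees until one tree remains.
        while (nodes.Count > 1)
        {
            nodes = nodes.OrderBy(n => n.Weight).ToList();
            var merged = new Node
            {
                Left = nodes[0],
                Right = nodes[1],
                Weight = nodes[0].Weight + nodes[1].Weight
            };
            nodes.RemoveRange(0, 2);
            nodes.Add(merged);
        }

        Assign(nodes[0], "", codes);
        return codes;
    }

    private static void Assign(Node node, string prefix, Dictionary<char, string> codes)
    {
        if (node.Symbol.HasValue)
        {
            // A single-symbol input still needs a one-bit code.
            codes[node.Symbol.Value] = prefix.Length > 0 ? prefix : "0";
            return;
        }
        Assign(node.Left, prefix + "0", codes);
        Assign(node.Right, prefix + "1", codes);
    }

    // Total number of bits needed for the text, excluding the code table itself.
    public static int EncodedBits(string text)
    {
        var codes = BuildCodes(text);
        return text.Sum(c => codes[c].Length);
    }
}
```

A real encoder would also have to store or agree on the code table and pack the bits into bytes, so the saving over the 4-bit nibble approach is only noticeable when the digit distribution is skewed or the string is long.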

Try (MySQL):

SELECT name, GROUP_CONCAT(id) FROM SomeTable WHERE name = 'multiple rows named this' GROUP BY name

I came across a method that will work with SQL Server:

SELECT
  STUFF((
    SELECT ','+id FROM SomeTable a WHERE a.name = b.name FOR XML PATH('')
  ),1,1,'') AS SumKeys, name
FROM SomeTable b
WHERE name = 'multiple rows named this'
GROUP BY name

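The WHERE clause is optional. If id is a numeric column, cast it to varchar before concatenating, because ','+id would otherwise attempt a numeric conversion and fail.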
