Question
I have gzipped data stored in a DB. Is there a way to concatenate, say, 50 separate gzipped items into one gzipped output that can be decompressed? The result should be the same as decompressing those 50 items, concatenating them and then gzipping the result.
I would like to avoid the decompression phase. Is there also a performance benefit to merging already-gzipped data instead of gzipping the whole byte array?
Answer 1:
Yes, you can concatenate gzip streams, which when decompressed give you the same thing as if you had concatenated the uncompressed data and gzipped it all at once. Specifically:
gzip a
gzip b
cat a.gz b.gz > c.gz
gunzip c.gz
will give you the same c as:
cat a b > c
However, compression will be degraded compared to gzipping the whole thing at once, especially if each of your 50 pieces is small, e.g. less than a few tens of kilobytes. Each piece becomes its own gzip member with its own header and trailer, and the compressor cannot reference data from earlier pieces. The compressed result will always be different, and a little or a lot larger depending on the size of the pieces.
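To make both points concrete, here is a minimal sketch in Python (used only for illustration; the question itself is about C#). The `pieces` list is a made-up stand-in for the 50 items in the DB. It checks that plain byte concatenation of gzip members decompresses to the concatenated data, and compares the size against gzipping everything at once.

```python
import gzip

# Hypothetical stand-ins for the 50 separately gzipped items stored in the DB.
pieces = [f"record {i}: some repetitive payload\n".encode() * 20 for i in range(50)]
compressed_pieces = [gzip.compress(p) for p in pieces]

# Merge without decompressing: plain byte concatenation of the gzip members.
merged = b"".join(compressed_pieces)

# Baseline: gzip the concatenated raw data in a single pass.
raw = b"".join(pieces)
single = gzip.compress(raw)

# Both streams decompress to the same bytes
# (gzip.decompress accepts multi-member input).
same_output = gzip.decompress(merged) == gzip.decompress(single) == raw

# The merged stream carries one header and trailer per piece and cannot
# exploit redundancy across pieces, so it is larger for this sample data.
merged_size, single_size = len(merged), len(single)
print(same_output, merged_size, single_size)
```

The merge step itself is only a byte copy, so it is cheap; the price is paid in output size, and how large the gap is depends on the data and the piece sizes.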
The comment in another answer about GZipStream should be heeded. I also recommend that you use DotNetZip instead.
Answer 2:
I would assume that merely concatenating any file in a zipped format would prove disastrous as the zipping algorithm has been run on the specific content per file. I think that you would have to manually unzip all, concatenate, then zip again.
Answer 3:
GZip support in .NET is buggy, and decompressing a gzip file that itself contains multiple gzip members is even more so. Not all of its bugs have been ironed out, even in .NET 4.5.
Furthermore, consider how and where each gzip was created, e.g. is it BGZF ("Blocked GNU Zip Format")? That complicates the issue at hand.
Furthermore, the resulting gzip file can be bigger than if you had concatenated all the uncompressed individual files together (gzip isn't a very good compression algorithm set).
I recommend you use DotNetZip instead if it isn't too late.
GZipStream is not really built to handle multiple files; however, you can use System.IO.BinaryWriter and System.IO.BinaryReader to gain full control, although it can get messy. DotNetZip just works! It is designed to handle multiple files.
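For the "full control" route, one option is to walk the gzip members one at a time instead of relying on a reader to handle them all. The following is a rough sketch in Python rather than the .NET API (an assumption made for illustration; `split_gzip_members` is a hypothetical helper name). It relies on zlib's `unused_data` attribute, which holds whatever bytes follow the end of the member just decoded.

```python
import gzip
import zlib


def split_gzip_members(data: bytes) -> list:
    """Decompress each gzip member in `data` separately, in order."""
    outputs = []
    while data:
        # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(zlib.MAX_WBITS | 16)
        outputs.append(d.decompress(data) + d.flush())
        # Bytes past the end of this member belong to the next member.
        data = d.unused_data
    return outputs


merged = gzip.compress(b"first") + gzip.compress(b"second")
members = split_gzip_members(merged)
print(members)
```

Joining the returned list gives the same bytes as decompressing the whole concatenated stream in one call.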
P.S. GZipStream works for file sizes up to 8 GB with .NET 4, although earlier versions have a lower limit, e.g. up to 4 GB with .NET 3.5.
Source: https://stackoverflow.com/questions/15656336/concatenate-gzipped-byte-arrays-in-c-sharp