C#.net identify zip file

栀梦 2020-12-03 10:45

I am currently using the SharpZip API to handle my zip file entries. It works splendidly for zipping and unzipping. However, I am having trouble identifying whether a file is a zip file.

6 answers
  • 2020-12-03 11:20

    If you are programming for the web, you can check the file's Content-Type: application/zip
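    A minimal sketch of that check, assuming you already have the Content-Type string (in ASP.NET Core it is available as IFormFile.ContentType). The class and method names are hypothetical, and accepting application/x-zip-compressed is an assumption based on what some Windows browsers send. The header is supplied by the client, so treat a match as a hint and confirm it by checking the file signature.

    ```csharp
    using System;

    public static class UploadChecks
    {
        // Hypothetical helper: compares a client-supplied Content-Type value with the
        // ZIP media types. "application/x-zip-compressed" is accepted as an assumption,
        // because some browsers send it instead of "application/zip".
        public static bool IsZipContentType(string contentType)
        {
            if (string.IsNullOrWhiteSpace(contentType))
                return false;

            // Drop parameters such as "; name=a.zip" and surrounding whitespace.
            string mediaType = contentType.Split(';')[0].Trim();

            // Media types are case-insensitive.
            return mediaType.Equals("application/zip", StringComparison.OrdinalIgnoreCase)
                || mediaType.Equals("application/x-zip-compressed", StringComparison.OrdinalIgnoreCase);
        }
    }
    ```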

  • 2020-12-03 11:20

    I used https://en.wikipedia.org/wiki/List_of_file_signatures, adding an extra byte to the signature for my zip files to differentiate them from Word documents (the two share the same first four bytes).

    Here is my code:

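    using System;
    using System.IO;

    public class ZipFileUtilities
    {
        // Signatures from https://en.wikipedia.org/wiki/List_of_file_signatures.
        // ZipBytes1 is the ZIP local file header plus a fifth byte: 0x0a is the low byte of the
        // "version needed to extract" field (1.0). It separates these zip files from Word
        // documents, but ZIP files whose first entry declares another version will not match.
        private static readonly byte[] ZipBytes1 = { 0x50, 0x4b, 0x03, 0x04, 0x0a };
        private static readonly byte[] GzipBytes = { 0x1f, 0x8b };
        private static readonly byte[] TarBytes = { 0x1f, 0x9d };   // LZW ".Z" (compress), often a compressed tar
        private static readonly byte[] LzhBytes = { 0x1f, 0xa0 };   // LZH, often a compressed tar
        private static readonly byte[] Bzip2Bytes = { 0x42, 0x5a, 0x68 };
        private static readonly byte[] LzipBytes = { 0x4c, 0x5a, 0x49, 0x50 };
        private static readonly byte[] ZipBytes2 = { 0x50, 0x4b, 0x05, 0x06 };   // empty ZIP archive
        private static readonly byte[] ZipBytes3 = { 0x50, 0x4b, 0x07, 0x08 };   // spanned ZIP archive

        public static byte[] GetFirstBytes(string filepath, int length)
        {
            using (var fs = new FileStream(filepath, FileMode.Open, FileAccess.Read))
            {
                var bytes = new byte[length];
                // Read can return fewer bytes than requested, so loop until the buffer
                // is full or the file ends. Bytes past the end of a short file stay 0.
                int offset = 0;
                while (offset < length)
                {
                    int read = fs.Read(bytes, offset, length - offset);
                    if (read == 0) break;
                    offset += read;
                }

                return bytes;
            }
        }

        // Despite the name, this returns true for any of the compressed formats above, not only ZIP.
        public static bool IsZipFile(string filepath)
        {
            return IsCompressedData(GetFirstBytes(filepath, 5));
        }

        public static bool IsCompressedData(byte[] data)
        {
            foreach (var headerBytes in new[] { ZipBytes1, ZipBytes2, ZipBytes3, GzipBytes, TarBytes, LzhBytes, Bzip2Bytes, LzipBytes })
            {
                if (HeaderBytesMatch(headerBytes, data))
                    return true;
            }

            return false;
        }

        private static bool HeaderBytesMatch(byte[] headerBytes, byte[] dataBytes)
        {
            if (dataBytes.Length < headerBytes.Length)
                throw new ArgumentOutOfRangeException(nameof(dataBytes),
                    $"Passed databytes length ({dataBytes.Length}) is shorter than the headerbytes ({headerBytes.Length})");

            for (var i = 0; i < headerBytes.Length; i++)
            {
                if (headerBytes[i] == dataBytes[i]) continue;

                return false;
            }

            return true;
        }
    }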
    

    There may be better ways to code this, particularly the byte comparison, but since it is a variable-length comparison (depending on the signature being checked), I felt that this code is at least readable, to me anyway.
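    For what it's worth, on .NET Core 2.1 or later (or .NET Framework with the System.Memory package) the variable-length comparison can lean on MemoryExtensions.StartsWith over ReadOnlySpan<byte>. A minimal sketch with a hypothetical SignatureMatcher class and only a subset of the signatures above; unlike HeaderBytesMatch, it returns false for input shorter than a signature instead of throwing.

    ```csharp
    using System;

    public static class SignatureMatcher
    {
        // Subset of the signatures from ZipFileUtilities, for brevity.
        private static readonly byte[][] Signatures =
        {
            new byte[] { 0x50, 0x4b, 0x03, 0x04 }, // ZIP local file header
            new byte[] { 0x50, 0x4b, 0x05, 0x06 }, // empty ZIP archive
            new byte[] { 0x1f, 0x8b },             // GZip
            new byte[] { 0x42, 0x5a, 0x68 },       // bzip2
        };

        // True if data starts with any known signature. StartsWith handles the
        // differing lengths and returns false when data is shorter than a signature.
        public static bool StartsWithAnySignature(ReadOnlySpan<byte> data)
        {
            foreach (byte[] signature in Signatures)
            {
                if (data.StartsWith(new ReadOnlySpan<byte>(signature)))
                    return true;
            }
            return false;
        }
    }
    ```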

  • 2020-12-03 11:21

    See https://stackoverflow.com/a/16587134/206730 for reference.

    Check the links below:

    • icsharpcode-sharpziplib-validate-zip-file
    • How-to-check-if-a-file-is-compressed-in-c#

    A non-empty ZIP file normally starts with the 4-byte signature 0x04034b50 (stored little-endian, so the bytes on disk are 50 4B 03 04).
    More details: http://en.wikipedia.org/wiki/Zip_(file_format)#File_headers
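    To see how the constant relates to the bytes on disk: because the signature is stored little-endian, 0x04034b50 appears as 50 4B 03 04 ("PK\x03\x04"). A sketch with a hypothetical ZipSignature class that reads it with System.Buffers.Binary.BinaryPrimitives (.NET Core 2.1+ or the System.Memory package), which, unlike BitConverter, does not depend on the machine's byte order:

    ```csharp
    using System;
    using System.Buffers.Binary;

    public static class ZipSignature
    {
        // Local file header signature as a little-endian uint32.
        public const uint LocalFileHeader = 0x04034b50;

        // Reads the first four bytes as little-endian regardless of the host's
        // endianness and compares them with the signature.
        public static bool HasLocalFileHeader(ReadOnlySpan<byte> data)
        {
            return data.Length >= 4
                && BinaryPrimitives.ReadUInt32LittleEndian(data) == LocalFileHeader;
        }
    }
    ```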

    Sample usage:

    bool isPKZip = IOHelper.CheckSignature(pkg, 4, IOHelper.SignatureZip);
    Assert.IsTrue(isPKZip, "Package is not a ZIP file: " + pkg);
    
    using System;
    using System.IO;

    // http://blog.somecreativity.com/2008/04/08/how-to-check-if-a-file-is-compressed-in-c/
    public static partial class IOHelper
    {
        public const string SignatureGzip = "1F-8B-08";
        public const string SignatureZip = "50-4B-03-04";

        // Compares the first signatureSize bytes of the file with a signature written
        // in BitConverter.ToString format, e.g. "50-4B-03-04".
        public static bool CheckSignature(string filepath, int signatureSize, string expectedSignature)
        {
            if (String.IsNullOrEmpty(filepath)) throw new ArgumentException("Must specify a filepath");
            if (String.IsNullOrEmpty(expectedSignature)) throw new ArgumentException("Must specify a value for the expected file signature");
            using (FileStream fs = new FileStream(filepath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
            {
                if (fs.Length < signatureSize)
                    return false;
                byte[] signature = new byte[signatureSize];
                int bytesRequired = signatureSize;
                int index = 0;
                while (bytesRequired > 0)
                {
                    int bytesRead = fs.Read(signature, index, bytesRequired);
                    // Another writer may truncate the file (FileShare.ReadWrite);
                    // stop instead of looping forever on a zero-byte read.
                    if (bytesRead == 0) return false;
                    bytesRequired -= bytesRead;
                    index += bytesRead;
                }
                string actualSignature = BitConverter.ToString(signature);
                return actualSignature == expectedSignature;
            }
        }
    }
    
  • 2020-12-03 11:24

    This is a base class for a component that needs to handle data that is either uncompressed, PKZIP-compressed (SharpZipLib) or GZip-compressed (built into .NET). It is perhaps a bit more than you need, but it should get you going. It is an example of using @PhonicUK's suggestion to parse the header of the data stream. The derived classes you see in the small factory method handle the specifics of PKZIP and GZip decompression.

    using System;
    using System.Diagnostics;
    using System.IO;

    abstract class Expander
    {
        // BitConverter uses the host byte order; these constants assume a
        // little-endian machine (see BitConverter.IsLittleEndian).
        private const int ZIP_LEAD_BYTES = 0x04034b50;
        private const ushort GZIP_LEAD_BYTES = 0x8b1f;

        public abstract MemoryStream Expand(Stream stream);

        internal static bool IsPkZipCompressedData(byte[] data)
        {
            Debug.Assert(data != null && data.Length >= 4);
            // if the first 4 bytes of the array are the ZIP signature then it is compressed data
            return (BitConverter.ToInt32(data, 0) == ZIP_LEAD_BYTES);
        }

        internal static bool IsGZipCompressedData(byte[] data)
        {
            Debug.Assert(data != null && data.Length >= 2);
            // if the first 2 bytes of the array are the GZIP signature then it is compressed data
            return (BitConverter.ToUInt16(data, 0) == GZIP_LEAD_BYTES);
        }

        public static bool IsCompressedData(byte[] data)
        {
            return IsPkZipCompressedData(data) || IsGZipCompressedData(data);
        }

        // GZipExpander, ZipExpander and NullExpander are the derived classes (not shown)
        // that handle GZip, PKZIP and uncompressed data respectively.
        public static Expander GetExpander(Stream stream)
        {
            Debug.Assert(stream != null);
            Debug.Assert(stream.CanSeek);
            stream.Seek(0, SeekOrigin.Begin);

            try
            {
                byte[] bytes = new byte[4];

                // Read may return fewer bytes than requested, so keep reading until
                // the buffer is full or the stream ends (unfilled bytes stay 0).
                int total = 0;
                int read;
                while (total < bytes.Length && (read = stream.Read(bytes, total, bytes.Length - total)) > 0)
                    total += read;

                if (IsGZipCompressedData(bytes))
                    return new GZipExpander();

                if (IsPkZipCompressedData(bytes))
                    return new ZipExpander();

                return new NullExpander();
            }
            finally
            {
                stream.Seek(0, SeekOrigin.Begin);  // set the stream back to the beginning
            }
        }
    }
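    The derived classes are not shown above. As a hypothetical sketch of what a GZipExpander.Expand might do with the built-in System.IO.Compression.GZipStream (written as a standalone static method so it compiles on its own; the names are illustrative, not the author's):

    ```csharp
    using System.IO;
    using System.IO.Compression;

    public static class GZipExpanderSketch
    {
        // Hypothetical stand-in for GZipExpander.Expand: decompresses GZip data
        // from the input stream into a new MemoryStream.
        public static MemoryStream ExpandGZip(Stream stream)
        {
            var output = new MemoryStream();
            // leaveOpen: true so the caller still owns (and disposes) the input stream.
            using (var gzip = new GZipStream(stream, CompressionMode.Decompress, leaveOpen: true))
            {
                gzip.CopyTo(output);
            }
            output.Position = 0; // rewind so the caller can read from the start
            return output;
        }
    }
    ```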
    
  • 2020-12-03 11:33

    You can either:

    • Use a try-catch structure and try to read the structure of a potential zip file
    • Parse the file header to see if it is a zip file

    ZIP files normally start with 0x04034b50 as their first 4 bytes (http://en.wikipedia.org/wiki/Zip_(file_format)#File_headers).
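    The try-catch option could look roughly like this. It is a sketch that swaps in the built-in System.IO.Compression.ZipArchive (.NET Framework 4.5+ and .NET Core) rather than SharpZipLib; the class and method names are hypothetical, and it assumes a seekable stream. Note that .docx/.xlsx files will also open successfully, since they are ZIP archives.

    ```csharp
    using System.IO;
    using System.IO.Compression;

    public static class ZipProbe
    {
        // Opening a ZipArchive in Read mode parses the archive's central directory and
        // throws InvalidDataException when the data is not in ZIP format.
        public static bool CanOpenAsZip(Stream stream)
        {
            long start = stream.Position; // requires a seekable stream
            try
            {
                using (var archive = new ZipArchive(stream, ZipArchiveMode.Read, leaveOpen: true))
                {
                    return true;
                }
            }
            catch (InvalidDataException)
            {
                return false;
            }
            finally
            {
                stream.Position = start; // leave the stream where we found it
            }
        }
    }
    ```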

  • 2020-12-03 11:37

    Thanks to dkackman and Kiquenet for the answers above. For completeness, the code below uses the signature to identify compressed (zip) files. You then have the added complication that the newer MS Office file formats (your .docx and .xlsx files, etc.) also match this signature lookup. As remarked elsewhere, these are indeed compressed archives: you can rename such a file with a .zip extension and look at the XML inside.

    The code below first checks for ZIP (compressed) data using the signatures used above, and then performs a second check for MS Office packages. Note that to use System.IO.Packaging.Package you need a project reference to "WindowsBase" (a .NET Framework assembly reference); on .NET Core and later it is available through the System.IO.Packaging NuGet package.

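    using System;
    using System.IO;
    using System.IO.Packaging;   // WindowsBase reference on .NET Framework
    using System.Linq;

    public static class ZipStreamExtensions   // extension methods must live in a static class
    {
        private const string SignatureZip = "50-4B-03-04";
        private const string SignatureGzip = "1F-8B-08";

        public static bool IsZip(this Stream stream)
        {
            bool isZip = CheckSignature(stream, 4, SignatureZip);
            bool isGzip = CheckSignature(stream, 3, SignatureGzip);

            bool isSomeKindOfZip = isZip || isGzip;

            // Signature matches ZIP, but it's a package format (docx etc.).
            // Only ZIP data can be a package, so GZip streams skip this check.
            if (isZip && stream.IsPackage())
            {
                return false;
            }

            return isSomeKindOfZip;
        }

        // Stream version of IOHelper.CheckSignature. It always reads from the start, so
        // the GZip check does not continue where the ZIP check stopped.
        private static bool CheckSignature(Stream stream, int signatureSize, string expectedSignature)
        {
            stream.Seek(0, SeekOrigin.Begin);
            byte[] signature = new byte[signatureSize];
            int index = 0;
            while (index < signatureSize)
            {
                int bytesRead = stream.Read(signature, index, signatureSize - index);
                if (bytesRead == 0) return false;   // stream shorter than the signature
                index += bytesRead;
            }
            return BitConverter.ToString(signature) == expectedSignature;
        }

        /// <summary>
        /// MS .docx, .xlsx and other Office formats are (correctly) identified as zip files by signature lookup.
        /// This tests whether System.IO.Packaging can open the stream; if the package has parts, it is not treated as a plain zip file.
        /// </summary>
        private static bool IsPackage(this Stream stream)
        {
            stream.Seek(0, SeekOrigin.Begin);
            try
            {
                Package package = Package.Open(stream, FileMode.Open, FileAccess.Read);
                return package.GetParts().Any();
            }
            catch (Exception)
            {
                // If Package.Open throws, the data cannot be read as a package.
                return false;
            }
            finally
            {
                stream.Seek(0, SeekOrigin.Begin);
            }
        }
    }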
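    If you would rather avoid the WindowsBase dependency, a lighter heuristic is to look for the [Content_Types].xml entry that every OPC package (.docx, .xlsx, .pptx and so on) has at its root. This is a sketch using System.IO.Compression.ZipArchive with a hypothetical class name, and it assumes a seekable stream; a plain ZIP that happens to contain such an entry would be misclassified.

    ```csharp
    using System;
    using System.IO;
    using System.IO.Compression;
    using System.Linq;

    public static class OfficePackageCheck
    {
        // Heuristic: OPC packages contain a "[Content_Types].xml" part at the archive
        // root; a plain ZIP normally does not. Compared case-insensitively to be lenient.
        public static bool LooksLikeOpcPackage(Stream zipStream)
        {
            long start = zipStream.Position; // requires a seekable stream
            try
            {
                using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read, leaveOpen: true))
                {
                    return archive.Entries.Any(e =>
                        string.Equals(e.FullName, "[Content_Types].xml", StringComparison.OrdinalIgnoreCase));
                }
            }
            catch (InvalidDataException)
            {
                return false; // not a ZIP at all, so not a package either
            }
            finally
            {
                zipStream.Position = start;
            }
        }
    }
    ```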
    