Parsing a binary file. What is a modern way?

悲哀的现实 2021-01-30 01:41

I have a binary file with some layout I know. For example let format be like this:

  • 2 bytes (unsigned short) - length of a string
  • 5 bytes (5 x chars) - the
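A minimal sketch of reading the two fields listed above from a stream. The question does not state the byte order, so little-endian is an assumption here, and `readU16LE`, `Header` and `readHeader` are hypothetical names:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads a 2-byte unsigned value stored little-endian
// (the byte order is an assumption; the question does not specify it).
inline bool readU16LE(std::istream& in, std::uint16_t& out) {
    unsigned char b[2];
    if (!in.read(reinterpret_cast<char*>(b), 2)) return false;
    out = static_cast<std::uint16_t>(b[0] | (b[1] << 8));
    return true;
}

// Hypothetical record holding the two fields the question lists.
struct Header {
    std::uint16_t length = 0;      // 2 bytes: length of a string
    std::array<char, 5> chars{};   // 5 bytes: 5 x chars
};

// Returns false if the stream ends before both fields are read.
inline bool readHeader(std::istream& in, Header& h) {
    return readU16LE(in, h.length)
        && static_cast<bool>(in.read(h.chars.data(),
                                     static_cast<std::streamsize>(h.chars.size())));
}
```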
10 answers
  •  盖世英雄少女心
    2021-01-30 02:03

    Just last month I implemented a quick-and-dirty binary format parser to read .zip files (following Wikipedia's description of the format) and, being modern, I decided to use C++ templates.

    On some specific platforms a packed struct could work; however, there are things it does not handle well, such as variable-length fields. With templates there is no such issue: you can describe arbitrarily complex structures (and return types).
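    For comparison, a sketch of the packed-struct approach applied to the fixed 30-byte part of a .zip local file header. `LocalFileHeaderFixed` and `readFixedPart` are hypothetical names; `#pragma pack` is a compiler extension (GCC, Clang and MSVC accept it), and copying raw bytes into the struct only yields correct values on a little-endian host. The file name cannot be a member because its length is only known at run time, which is the limitation described above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Fixed-size part of a .zip local file header (offsets 0..29).
// Packing removes padding; byte order still follows the host (assumed LE).
#pragma pack(push, 1)
struct LocalFileHeaderFixed {
    std::uint32_t signature;
    std::uint16_t versionNeededToExtract;
    std::uint16_t generalPurposeBitFlag;
    std::uint16_t compressionMethod;
    std::uint16_t fileLastModificationTime;
    std::uint16_t fileLastModificationDate;
    std::uint32_t crc32;
    std::uint32_t compressedSize;
    std::uint32_t uncompressedSize;
    std::uint16_t fileNameLength;
    std::uint16_t extraFieldLength;
    // File name and extra field follow, but their sizes vary per entry,
    // so they cannot be declared as members here.
};
#pragma pack(pop)

static_assert(sizeof(LocalFileHeaderFixed) == 30, "no padding expected");

// memcpy is well-defined here because the struct is trivially copyable.
inline LocalFileHeaderFixed readFixedPart(unsigned char const* bytes) {
    LocalFileHeaderFixed h;
    std::memcpy(&h, bytes, sizeof h);
    return h;
}
```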

    A .zip archive is relatively simple, fortunately, so I implemented something simple. Off the top of my head:

    #include <cassert>
    #include <cstddef>
    #include <cstdint>
    #include <utility>
    
    // Raw bytes: pointer to the first byte, plus the number of bytes.
    using Buffer = std::pair<unsigned char const*, std::size_t>;
    
    template <typename OffsetReader>
    class UInt16LEReader: private OffsetReader {
    public:
        UInt16LEReader() {}
        explicit UInt16LEReader(OffsetReader const reader): OffsetReader(reader) {}
    
        std::uint16_t read(Buffer const& buffer) const {
            OffsetReader const& reader = *this;
    
            std::size_t const offset = reader.read(buffer);
            assert(offset <= buffer.second && "Incorrect offset");
            assert(offset + 2 <= buffer.second && "Too short buffer");
    
            unsigned char const* begin = buffer.first + offset;
    
            // Assemble the value byte by byte, independent of host byte order:
            // http://commandcenter.blogspot.fr/2012/04/byte-order-fallacy.html
            return std::uint16_t((std::uint16_t(begin[0]) << 0)
                               + (std::uint16_t(begin[1]) << 8));
        }
    }; // class UInt16LEReader
    
    // Same pattern for UInt[8|16|32][LE|BE]...
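    The comment at the end of that block says the same reader is repeated for other widths and byte orders. A hypothetical sketch of two such variants, `UInt32LEReader` and `UInt16BEReader`, with `Buffer` and a minimal `FixedOffsetReader` repeated so the snippet compiles on its own:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

using Buffer = std::pair<unsigned char const*, std::size_t>;

// Minimal constant-offset policy, repeated so this sketch is self-contained.
template <std::size_t O>
struct FixedOffsetReader {
    std::size_t read(Buffer const&) const { return O; }
};

// Little-endian 32-bit variant: byte 0 is the least significant.
template <typename OffsetReader>
class UInt32LEReader: private OffsetReader {
public:
    std::uint32_t read(Buffer const& buffer) const {
        std::size_t const offset = OffsetReader::read(buffer);
        assert(offset <= buffer.second && offset + 4 <= buffer.second && "Too short buffer");
        unsigned char const* b = buffer.first + offset;
        return (std::uint32_t(b[0]) << 0)  | (std::uint32_t(b[1]) << 8)
             | (std::uint32_t(b[2]) << 16) | (std::uint32_t(b[3]) << 24);
    }
};

// Big-endian 16-bit variant: byte 0 is the most significant.
template <typename OffsetReader>
class UInt16BEReader: private OffsetReader {
public:
    std::uint16_t read(Buffer const& buffer) const {
        std::size_t const offset = OffsetReader::read(buffer);
        assert(offset <= buffer.second && offset + 2 <= buffer.second && "Too short buffer");
        unsigned char const* b = buffer.first + offset;
        return std::uint16_t((b[0] << 8) | b[1]);
    }
};
```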
    

    Of course, the basic OffsetReader actually has a constant result:

    template <std::size_t O>
    class FixedOffsetReader {
    public:
        std::size_t read(Buffer const&) const { return O; }
    }; // class FixedOffsetReader
    

    and since these are templates, you can swap the types at will (for example, you could implement a proxy reader that delegates all reads to a shared_ptr which memoizes them).
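    A possible sketch of such a memoizing proxy, assuming the offset only needs to be computed once per buffer. `MemoizingOffsetReader` is a hypothetical name: copies share one cache through a `std::shared_ptr`, so the wrapped reader's `read` runs at most once. The cache is not keyed on the buffer, so an instance should only ever be used with a single buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

using Buffer = std::pair<unsigned char const*, std::size_t>;

// Hypothetical proxy reader: delegates to Inner on the first read, then
// serves the cached offset. Copies share the cache via std::shared_ptr.
template <typename Inner>
class MemoizingOffsetReader {
public:
    explicit MemoizingOffsetReader(Inner inner)
        : inner_(std::move(inner)), cache_(std::make_shared<Cache>()) {}

    std::size_t read(Buffer const& buffer) const {
        if (!cache_->filled) {
            cache_->value = inner_.read(buffer);
            cache_->filled = true;
        }
        return cache_->value;
    }

private:
    struct Cache {
        bool filled = false;
        std::size_t value = 0;
    };

    Inner inner_;
    std::shared_ptr<Cache> cache_;
};
```

    Because this proxy is not default-constructible, it would be handed to a reader such as `UInt16LEReader` through that class's explicit constructor.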

    What is interesting, though, is the end result:

    // http://en.wikipedia.org/wiki/Zip_%28file_format%29#File_headers
    class LocalFileHeader {
    public:
        template <std::size_t O>
        using UInt32 = UInt32LEReader<FixedOffsetReader<O>>;
        template <std::size_t O>
        using UInt16 = UInt16LEReader<FixedOffsetReader<O>>;
    
        UInt32< 0> signature;
        UInt16< 4> versionNeededToExtract;
        UInt16< 6> generalPurposeBitFlag;
        UInt16< 8> compressionMethod;
        UInt16<10> fileLastModificationTime;
        UInt16<12> fileLastModificationDate;
        UInt32<14> crc32;
        UInt32<18> compressedSize;
        UInt32<22> uncompressedSize;
    
        using FileNameLength = UInt16<26>;
        using ExtraFieldLength = UInt16<28>;
    
        using FileName = StringReader<FixedOffsetReader<30>, FileNameLength>;
    
        using ExtraField = StringReader<
            CombinedAdd<FixedOffsetReader<30>, FileNameLength>,
            ExtraFieldLength
        >;
    
        FileName filename;
        ExtraField extraField;
    }; // class LocalFileHeader
    

    This is rather simplistic, obviously, but incredibly flexible at the same time.
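    The answer does not show `StringReader` or `CombinedAdd`. A minimal sketch of what they might look like, assuming stateless, default-constructible offset readers (simpler than the private inheritance used by `UInt16LEReader`). All definitions here are hypothetical, and `LocalFileHeaderNames` keeps only the string fields of the header:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

using Buffer = std::pair<unsigned char const*, std::size_t>;

template <std::size_t O>
struct FixedOffsetReader {
    std::size_t read(Buffer const&) const { return O; }
};

// Simplified, stateless little-endian reader.
template <typename OffsetReader>
struct UInt16LEReader {
    std::uint16_t read(Buffer const& buffer) const {
        std::size_t const offset = OffsetReader().read(buffer);
        assert(offset + 2 <= buffer.second && "Too short buffer");
        unsigned char const* b = buffer.first + offset;
        return std::uint16_t(b[0] | (b[1] << 8));
    }
};

// Hypothetical: offset = offset from Base + value read by Length.
template <typename Base, typename Length>
struct CombinedAdd {
    std::size_t read(Buffer const& buffer) const {
        return Base().read(buffer) + Length().read(buffer);
    }
};

// Hypothetical: string starting at OffsetReader, sized by LengthReader.
template <typename OffsetReader, typename LengthReader>
struct StringReader {
    std::string read(Buffer const& buffer) const {
        std::size_t const offset = OffsetReader().read(buffer);
        std::size_t const length = LengthReader().read(buffer);
        assert(offset + length <= buffer.second && "Too short buffer");
        return std::string(reinterpret_cast<char const*>(buffer.first + offset), length);
    }
};

// Trimmed-down header reusing the answer's aliases.
struct LocalFileHeaderNames {
    template <std::size_t O>
    using UInt16 = UInt16LEReader<FixedOffsetReader<O>>;

    using FileNameLength = UInt16<26>;
    using ExtraFieldLength = UInt16<28>;
    using FileName = StringReader<FixedOffsetReader<30>, FileNameLength>;
    using ExtraField = StringReader<
        CombinedAdd<FixedOffsetReader<30>, FileNameLength>,
        ExtraFieldLength
    >;

    FileName filename;
    ExtraField extraField;
};
```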

    An obvious area for improvement would be the chaining: since every offset is written out by hand, fields can accidentally overlap. My archive-reading code worked the first time I tried it, though, which was evidence enough for me that this code was sufficient for the task at hand.
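    One way to reduce the overlap risk, sketched under the assumption that every field before the file name has a fixed size: compute each offset from the previous field at compile time instead of typing it by hand. `Start` and `Field` are hypothetical helpers, and the `static_assert`s check the chain against the offsets used in `LocalFileHeader`:

```cpp
#include <cstddef>

// Hypothetical chaining helpers: each field starts where the previous ends.
struct Start {
    static constexpr std::size_t end = 0;
};

template <typename Prev, std::size_t Size>
struct Field {
    static constexpr std::size_t offset = Prev::end;
    static constexpr std::size_t end = offset + Size;
};

// Fixed part of a .zip local file header, expressed as a chain.
using Signature                = Field<Start, 4>;
using VersionNeededToExtract   = Field<Signature, 2>;
using GeneralPurposeBitFlag    = Field<VersionNeededToExtract, 2>;
using CompressionMethod        = Field<GeneralPurposeBitFlag, 2>;
using FileLastModificationTime = Field<CompressionMethod, 2>;
using FileLastModificationDate = Field<FileLastModificationTime, 2>;
using Crc32                    = Field<FileLastModificationDate, 4>;
using CompressedSize           = Field<Crc32, 4>;
using UncompressedSize         = Field<CompressedSize, 4>;
using FileNameLength           = Field<UncompressedSize, 2>;
using ExtraFieldLength         = Field<FileNameLength, 2>;

// The chained offsets agree with the hand-written ones.
static_assert(Crc32::offset == 14, "crc32 at offset 14");
static_assert(FileNameLength::offset == 26, "file name length at offset 26");
static_assert(ExtraFieldLength::end == 30, "file name starts at offset 30");
```

    An offset computed this way can be plugged into the existing readers, e.g. `UInt32<Crc32::offset> crc32;`.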
