Parsing a binary file. What is a modern way?

悲哀的现实 2021-01-30 01:41

I have a binary file with a layout I know. For example, let the format be like this:

  • 2 bytes (unsigned short) - length of a string
  • 5 bytes (5 x chars) - the
  •  孤城傲影
    2021-01-30 02:10

    The C way, which would work fine in C++, would be to declare a struct:

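    #pragma pack(push, 1)  // no padding between members; GCC, Clang and MSVC accept this form
    
    struct contents {
       // data members, e.g. std::uint16_t length; char name[5];
    };
    
    #pragma pack(pop)      // restore the previous packing so later structs are unaffected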
    

    Note that:

    • you need the pragma so that the compiler lays the members out exactly as declared, with no padding between them;
    • this technique only works with POD types (trivially copyable, standard-layout structs).
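
    As a minimal sketch, here is such a struct for the two fields shown in the question; the names `header`, `length` and `name` are hypothetical. The `static_assert` lines turn the two requirements above into compile-time checks: no padding, and a trivially copyable, standard-layout type.

```cpp
#include <cstddef>      // offsetof
#include <cstdint>      // std::uint16_t
#include <type_traits>  // std::is_trivially_copyable, std::is_standard_layout

#pragma pack(push, 1)   // no padding between members (accepted by GCC, Clang and MSVC)
struct header {         // hypothetical name for the first two fields of the format
    std::uint16_t length;  // 2 bytes: length of the string
    char name[5];          // 5 bytes: the characters, not NUL-terminated
};
#pragma pack(pop)       // restore the previous packing for the rest of the file

// Compile-time checks that the in-memory layout matches the file layout.
static_assert(sizeof(header) == 7, "packing must remove all padding");
static_assert(offsetof(header, name) == 2, "name must follow length directly");
static_assert(std::is_trivially_copyable<header>::value,
              "raw bytes can only be copied into trivially copyable types");
static_assert(std::is_standard_layout<header>::value,
              "members must be laid out in declaration order");
```

    If someone later adds a member with a non-trivial type (say a `std::string`), the build fails instead of silently misreading the file.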

    And then cast the read buffer directly into the struct type:

    // file: a std::ifstream opened with std::ios::binary
    std::vector<char> buf(sizeof(contents));
    file.read(buf.data(), buf.size());
    contents *stuff = reinterpret_cast<contents*>(buf.data());
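
    The cast above relies on the buffer being suitably aligned for `contents`, and reading through a pointer cast from a `char` buffer can also run afoul of C++'s strict-aliasing rules. A commonly used alternative, sketched here for the same hypothetical two-field layout, is to `std::memcpy` the bytes into a real `contents` object, which is well-defined for trivially copyable types. `from_bytes` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

#pragma pack(push, 1)
struct contents {            // hypothetical layout: the two fields the question lists
    std::uint16_t length;    // 2 bytes
    char name[5];            // 5 bytes
};
#pragma pack(pop)

// Copies the first sizeof(contents) bytes of buf into a properly aligned
// contents object instead of reinterpreting the buffer in place.
contents from_bytes(const std::vector<char>& buf) {
    assert(buf.size() >= sizeof(contents));  // caller must supply a full record
    contents c;
    std::memcpy(&c, buf.data(), sizeof c);
    return c;
}
```

    Used as `contents stuff = from_bytes(buf);` after the `file.read` call, it gives you a value rather than a pointer into the buffer.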
    

    If your data's size is variable, you can split the reading into several chunks. To read a single binary object from the buffer, a reader function comes in handy:

    template <typename T>
    const char *read_object(const char *buffer, T& target) {
        // copy one T out of the buffer and return a pointer just past it
        target = *reinterpret_cast<const T*>(buffer);
        return buffer + sizeof(T);
    }
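
    A variant sketch of the same reader, with two changes that are not in the answer: it copies with `std::memcpy`, so `buffer` need not be aligned for `T`, and it takes an `end` pointer so it never reads past the data. The name `read_checked` is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <type_traits>

// Reads one trivially copyable T starting at buffer and returns a pointer just
// past it. Throws std::out_of_range if fewer than sizeof(T) bytes remain.
template <typename T>
const char *read_checked(const char *buffer, const char *end, T& target) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only trivially copyable types can be read as raw bytes");
    assert(buffer <= end);  // caller must pass a valid [buffer, end) range
    if (static_cast<std::size_t>(end - buffer) < sizeof(T))
        throw std::out_of_range("read_checked: not enough bytes left");
    std::memcpy(&target, buffer, sizeof(T));  // no alignment requirement on buffer
    return buffer + sizeof(T);
}
```

    Calling it repeatedly with the returned pointer walks the buffer field by field, and a truncated file surfaces as an exception rather than an out-of-bounds read.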
    

    The main advantage is that such a reader can be overloaded for more complex C++ objects:

    template <typename CT>
    const char *read_object(const char *buffer, std::vector<CT>& target) {
        // target must already be sized to the number of elements to read
        size_t size = target.size();
        CT const *buf_start = reinterpret_cast<CT const*>(buffer);
        std::copy(buf_start, buf_start + size, target.begin());
        return buffer + size * sizeof(CT);
    }
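
    In the same spirit, an overload for `std::string` could handle the question's first two fields (a 2-byte length followed by that many characters). This is a hypothetical sketch, not part of the answer: it assumes the length prefix is stored in the host's byte order (as the struct cast also does), copies with `std::memcpy` so the buffer need not be aligned, and, like the answer's readers, does no bounds checking.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Reads a string stored as a 2-byte length followed by that many characters
// and returns a pointer just past the last character.
// Assumption: the length prefix uses the host's byte order.
const char *read_object(const char *buffer, std::string& target) {
    assert(buffer != nullptr);  // caller must pass readable data
    std::uint16_t length;
    std::memcpy(&length, buffer, sizeof length);  // 2-byte length prefix
    buffer += sizeof length;
    target.assign(buffer, length);                // copy exactly `length` chars
    return buffer + length;
}
```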
    

    And now in your main parser:

    // iter walks through the bytes previously read into buf
    const char *iter = buf.data();
    int n_floats;
    iter = read_object(iter, n_floats);
    std::vector<float> my_floats(n_floats);
    iter = read_object(iter, my_floats);
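
    Putting the pieces together, here is a self-contained sketch of the parser above working on an in-memory buffer. It keeps the answer's two `read_object` overloads but copies with `std::memcpy` rather than dereferencing cast pointers, and wraps the main parser in a hypothetical `parse_floats` function. The layout (an `int` count followed by that many `float`s, in host byte order) is the one the snippet implies; the `assert` checks are debug-only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Generic reader: copies one trivially copyable T and advances past it.
template <typename T>
const char *read_object(const char *buffer, T& target) {
    std::memcpy(&target, buffer, sizeof(T));
    return buffer + sizeof(T);
}

// Overload for vectors: copies target.size() elements in one go.
// Partial ordering prefers this overload over the generic one for vectors.
template <typename CT>
const char *read_object(const char *buffer, std::vector<CT>& target) {
    const std::size_t bytes = target.size() * sizeof(CT);
    if (bytes != 0)
        std::memcpy(target.data(), buffer, bytes);
    return buffer + bytes;
}

// Hypothetical wrapper around the main parser: an int count, then the floats.
std::vector<float> parse_floats(const std::vector<char>& buf) {
    assert(buf.size() >= sizeof(int));  // the count must be present
    const char *iter = buf.data();
    int n_floats;
    iter = read_object(iter, n_floats);
    assert(n_floats >= 0 &&
           buf.size() >= sizeof(int) + static_cast<std::size_t>(n_floats) * sizeof(float));
    std::vector<float> my_floats(n_floats);
    iter = read_object(iter, my_floats);
    return my_floats;
}
```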
    

    Note: as Tony D observed, even if you get the layout right via #pragma directives and manual padding (if needed), reading a multi-byte value from an address that is not suitably aligned for its type can still cause performance penalties (best case) or hardware traps (worst case), depending on the processor. This method is therefore mainly worth considering when you control the file's format.
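
    When you do not control the format, a common alternative (a different technique from the struct cast, named here plainly) is to decode each field from individual bytes with shifts, which sidesteps alignment, padding and host byte order altogether. The sketch assumes the file stores integers little-endian, which the question does not state; `read_u16_le` and `read_u32_le` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Decodes a 16-bit unsigned integer stored least-significant byte first.
// Works regardless of the host's endianness or the pointer's alignment.
std::uint16_t read_u16_le(const unsigned char *p) {
    assert(p != nullptr);
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Same idea for 32-bit values: assemble the four bytes explicitly.
std::uint32_t read_u32_le(const unsigned char *p) {
    assert(p != nullptr);
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```

    Because each byte is read individually, the same code gives the same result on little- and big-endian machines, at the cost of writing one decoder call per field instead of a single struct copy.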
