Reading binary file defined by a struct

轻奢々 2021-02-11 01:23

Could somebody point me in the right direction on how to read a binary file whose layout is defined by a C struct? It has a few #define directives inside the struct, which makes me think t

5 answers
  • 2021-02-11 01:52

    There are some bad ideas and good ideas:

    It's a bad idea to:

    • Typecast a raw buffer into struct
      • There are endianness issues (little-endian vs big-endian) when parsing integers >1 byte long or floats
      • There are byte alignment issues in structures, which are very compiler-dependent. One can try to disable alignment (or enforce some manual alignment), but that is generally a bad idea too. At the very least, you'll hurt performance by making the CPU access unaligned integers. The internal RISC core would have to do 3-4 ops instead of 1 (i.e. "do part 1 in first word", "do part 2 in second word", "merge the result") on every access. Or worse, the compiler pragmas that control alignment may be ignored and your code will break.
      • There are no exact size guarantees for the regular int, long, short, etc. types in C/C++. You can use types like int16_t, but these are available only on modern compilers.
      • Of course, this approach breaks completely when using structures that reference other structures: one has to unroll them all manually.
    • Write parsers manually: it's much harder than it seems at first glance.
      • A good parser needs to do lots of sanity checking at every stage. It's easy to miss something. It is even easier to miss something if you don't use exceptions.
      • Using exceptions makes you prone to fail if your parsing code is not exception-safe (i.e. written in a way that it can be interrupted at some points and it won't leak memory / forget to finalize some objects)
      • There could be performance issues (e.g. doing lots of unbuffered IO instead of one OS read syscall followed by parsing the buffer, or the opposite: reading the whole thing at once where more granular, lazy reads would be appropriate).
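
    To make the endianness, alignment and sanity-checking points concrete, here is a minimal sketch of decoding one integer byte by byte instead of typecasting. `read_u32_le` is a hypothetical helper name, and little-endian file order is an assumption made for the example, not something stated in the question.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: decode a little-endian 32-bit unsigned integer that
// starts at `offset` inside a buffer of `size` bytes.
// Assembling the value from single bytes gives the same result on any host
// endianness and never performs an unaligned multi-byte access.
std::uint32_t read_u32_le(const unsigned char* buf, std::size_t size, std::size_t offset) {
    // Sanity check: all four bytes must lie inside the buffer.
    if (offset > size || size - offset < 4) {
        throw std::out_of_range("read_u32_le: buffer too short");
    }
    return static_cast<std::uint32_t>(buf[offset])
         | static_cast<std::uint32_t>(buf[offset + 1]) << 8
         | static_cast<std::uint32_t>(buf[offset + 2]) << 16
         | static_cast<std::uint32_t>(buf[offset + 3]) << 24;
}
```

    Even this single field needs a bounds check and an explicit byte order, which hints at how much checking a complete hand-written parser accumulates.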

    It's a good idea to:

    • Go cross-platform. Pretty much self-explanatory, with all the mobile devices, routers and IoT stuff booming in recent years.
    • Go declarative. Consider using a declarative spec language to describe your structure, then use a parser generator to generate the parser from it.

    There are several tools available to do that:

    • Kaitai Struct — my favorite so far, cross-platform, cross-language — i.e. you describe your structure once and then you can compile it into a parser in C++, C#, Java, Python, Ruby, PHP, etc.
    • binpac — pretty dated, but still usable, C++-only — similar to Kaitai in ideology, but unsupported since 2013
    • Spicy — said to be "modern rewrite" of binpac, AKA "binpac++", but still in early stages of development; can be used for smaller tasks, C++ only too.
  • 2021-02-11 01:55

    You have to find out the endianness of the machine where the file was written so you can interpret integers properly. Look out for an ILP32 vs LP64 mismatch (long is 32 bits wide under ILP32 and 64 bits wide under LP64). The original structure packing/alignment might also be important.
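
    One way to catch a size or packing mismatch early is to pin the expected layout at compile time. The sketch below is an illustration only: `Record`, its fields, and the 12-byte layout are hypothetical and stand in for whatever the writer actually produced.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical on-disk record; names and sizes are assumptions for illustration.
struct Record {
    std::uint32_t count;    // fixed 4 bytes, whereas `unsigned long` is 4 or 8
    std::uint32_t name_id;
    unsigned char tag[4];
};

// Fail the build if this compiler lays the struct out differently from the
// layout assumed for the file.
static_assert(sizeof(Record) == 12, "unexpected size or padding in Record");
static_assert(offsetof(Record, name_id) == 4, "unexpected offset for name_id");
static_assert(offsetof(Record, tag) == 8, "unexpected offset for tag");

// True on LP64 platforms, false on ILP32 platforms and on 64-bit Windows.
constexpr bool kLongIs64Bit = sizeof(long) == 8;
```

    These checks only cover size and padding; byte order still has to be handled separately.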

  • 2021-02-11 01:56

    Using C++ I/O library:

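    #include <fstream>
    using namespace std;
    
    ifstream ifs("file.dat", ios::binary);
    Format f;
    // read() takes a char*, so the struct's address has to be cast.
    // (get() extracts delimited text and does not accept a struct pointer.)
    ifs.read(reinterpret_cast<char*>(&f), sizeof f);
    if (!ifs) { /* short read or I/O error: f is not fully populated */ }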
    

    Using C I/O library:

    #include <cstdio>
    using namespace std;
    
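    FILE *fin = fopen("file.dat", "rb");
    Format f;
    // fread returns the number of complete items read; anything other than 1 is a failure.
    if (fin == NULL || fread(&f, sizeof f, 1, fin) != 1) { /* handle the error */ }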
    
  • 2021-02-11 02:01

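    You can also use a union to do this parsing if the data you want to parse is already in memory. The union has to hold the bytes themselves, not a pointer to them. Note that reading a member other than the one last written is permitted in C but is undefined behavior in C++.

    #include <cstring>
    
    union A {
        unsigned char buffer[sizeof(Format)];  // the raw bytes, not a pointer to them
        Format format;
    };
    
    A a;
    // Copy the data in; a char* member would only overlay the pointer value itself.
    std::memcpy(a.buffer, stuff_you_want_to_parse, sizeof a.buffer);
    
    // You can now access the members of the struct through the union.
    if (a.format.str_name == expected_name) {  // expected_name: placeholder for the value to match
        // do stuff
    }
    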
    

    Also remember that long can have different sizes on different platforms. If you depend on long being a certain size, consider using the types defined in stdint.h, such as uint32_t.
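
    If the code is compiled as C++, copying the bytes into the struct with `std::memcpy` avoids relying on union type punning. The following is a minimal sketch: the `Format` layout shown is a hypothetical stand-in for the struct from the question, and `load_format` is an illustrative helper name.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layout for illustration; the real Format is defined in the question.
struct Format {
    std::uint32_t str_totalstrings;
    std::uint32_t str_name;
    unsigned char stuff[4];
};

// Copies sizeof(Format) bytes from `data` into `out`.
// Returns false, leaving `out` untouched, if fewer bytes are available.
bool load_format(const unsigned char* data, std::size_t size, Format& out) {
    if (size < sizeof(Format)) {
        return false;
    }
    std::memcpy(&out, data, sizeof(Format));
    return true;
}
```

    This still assumes that the in-memory data has the same padding and byte order as the struct on the reading platform.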

  • 2021-02-11 02:08

    Reading a binary defined by a struct is easy.

    Format myFormat;
    fread(&myFormat, sizeof(Format), 1, fp);
    

    The #defines don't affect the structure at all. (Inside the struct is an odd place to put them, though.)

    However, this is not cross-platform safe. It is the simplest thing that will possibly work, in situations where you are assured the reader and writer are using the same platform.

    The better way would be to re-define your structure as such:

    struct Format {
        Uint32 str_totalstrings;  //assuming unsigned long was 32 bits on the writer.
        Uint32 str_name;
        unsigned char stuff[4];
    };
    

    and then have a "platform_types.h" which typedefs Uint32 correctly for your compiler. Now you can read directly into the structure, but to handle endianness you still need to do something like this:

    myFormat.str_totalstrings = FileToNative32(myFormat.str_totalstrings);
    myFormat.str_name = FileToNative32(myFormat.str_name);
    

    where FileToNative32 is either a no-op or a byte reverser, depending on the platform.
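
    A minimal sketch of what such a function could look like, assuming the file stores its integers little-endian (the answer does not say which order the writer used). `ByteSwap32` and `HostIsLittleEndian` are hypothetical helper names introduced for the example.

```cpp
#include <cstdint>
#include <cstring>

// Unconditionally reverses the byte order of a 32-bit value.
std::uint32_t ByteSwap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// True when the host stores the least significant byte first.
bool HostIsLittleEndian() {
    const std::uint32_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Assumption: the file is little-endian. On a little-endian host this is a
// no-op; on a big-endian host it reverses the bytes.
std::uint32_t FileToNative32(std::uint32_t v) {
    return HostIsLittleEndian() ? v : ByteSwap32(v);
}
```

    The endianness check here runs at run time for simplicity; a real "platform_types.h" would more likely select the right variant at compile time.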
