Why does fread mess with my byte order?

无人共我 2020-12-25 08:25

I'm trying to parse a BMP file with fread(), and when I begin to parse, it reverses the order of my bytes.

typedef struct{
    short magic_number;         


        
3 answers
  • 2020-12-25 08:59

    Writing a struct to a file is highly non-portable; it's safest not to try it at all. Using a struct like this is guaranteed to work only if a) the struct is both written and read as a struct (never as a sequence of bytes) and b) it's always written and read on the same type of machine, with the same compiler and settings.

    Not only are there "endian" issues between different CPUs (which is what you seem to have run into), there are also "alignment" issues. Different hardware has different rules about placing integers only on 2-byte, 4-byte, or 8-byte boundaries. The compiler is fully aware of this and inserts hidden padding bytes into your struct so that every member is properly aligned. Because of those padding bytes, it's not safe to assume a struct's bytes are laid out in memory the way you think they are.

    If you're very lucky, you work on a machine whose byte order matches the file format's (BMP is little-endian) and whose compiler adds no padding to this struct, so you can lay structs directly over files and have it work. But you're probably not that lucky, and programs that need to be portable to different machines certainly have to avoid laying structs directly over any part of any file.
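
    One way to see the padding for yourself is to compare the struct's in-memory size and member offsets with the byte counts in the file. The following is only a sketch: the struct and function names are made up for illustration, and the member list mirrors the header fields discussed in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct for illustration; fields mirror the BMP file header. */
struct bmp_header_naive {
    uint16_t magic_number;
    uint32_t file_size;
    uint16_t reserved[2];
    uint32_t data_offset;
};

/* Size of the same fields as they appear in the file, with no padding. */
enum { BMP_FILE_HEADER_BYTES = 2 + 4 + 4 + 4 };

/* Number of padding bytes the compiler added to the struct (possibly 0). */
size_t bmp_header_padding(void)
{
    return sizeof(struct bmp_header_naive) - BMP_FILE_HEADER_BYTES;
}

/* Nonzero if file_size sits at the offset it has in the file (byte 2). */
int bmp_file_size_offset_matches(void)
{
    return offsetof(struct bmp_header_naive, file_size) == 2;
}
```

    With a typical compiler that aligns uint32_t on 4-byte boundaries, you would expect bmp_header_padding() to report 2 and the offset check to fail, but both results are implementation-defined, which is exactly why the struct cannot be trusted as a file layout.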

  • 2020-12-25 09:04

    This is not the fault of fread, but of your CPU, which is (apparently) little-endian. That is, your CPU treats the first byte in a short value as the low 8 bits, rather than (as you seem to have expected) the high 8 bits.

    Whenever you read a binary file format, you must explicitly convert from the file format's endianness to the CPU's native endianness. You do that with functions like these:

    #include <stdint.h>  /* uint8_t, uint16_t */

    /* CHAR_BIT == 8 assumed */
    uint16_t le16_to_cpu(const uint8_t *buf)
    {
       return ((uint16_t)buf[0]) | (((uint16_t)buf[1]) << 8);
    }
    uint16_t be16_to_cpu(const uint8_t *buf)
    {
       return ((uint16_t)buf[1]) | (((uint16_t)buf[0]) << 8);
    }
    

    You do your fread into a uint8_t buffer of the appropriate size, and then you manually copy all the data bytes over to your BMPHeader struct, converting as necessary. That would look something like this:

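    #include <stdint.h>  /* uint8_t, uint32_t */
    #include <stdio.h>   /* FILE, fread */

    /* le32_to_cpu() is the 32-bit counterpart of le16_to_cpu() above */

    /* note adjustments to type definition */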
    typedef struct BMPHeader
    {
        uint8_t magic_number[2];
        uint32_t file_size;
        uint8_t reserved[4];
        uint32_t data_offset;
    } BMPHeader;
    
    /* in general this is _not_ equal to sizeof(BMPHeader) */
    #define BMP_WIRE_HDR_LEN (2 + 4 + 4 + 4)
    
    /* returns 0=success, -1=error */
    int read_bmp_header(BMPHeader *hdr, FILE *fp)
    {
        uint8_t buf[BMP_WIRE_HDR_LEN];
    
        if (fread(buf, 1, sizeof buf, fp) != sizeof buf)
            return -1;
    
        hdr->magic_number[0] = buf[0];
        hdr->magic_number[1] = buf[1];
    
        hdr->file_size = le32_to_cpu(buf+2);
    
        hdr->reserved[0] = buf[6];
        hdr->reserved[1] = buf[7];
        hdr->reserved[2] = buf[8];
        hdr->reserved[3] = buf[9];
    
        hdr->data_offset = le32_to_cpu(buf+10);
    
        return 0;
    }
    

    You do not assume that the CPU's endianness is the same as the file format's even if you know for a fact that right now they are the same; you write the conversions anyway, so that in the future your code will work without modification on a CPU with the opposite endianness.

    You can make life easier for yourself by using the fixed-width <stdint.h> types, by using unsigned types unless being able to represent negative numbers is absolutely required, and by not using integers when character arrays will do. I've done all these things in the above example. You can see that you need not bother endian-converting the magic number, because the only thing you need to do with it is test magic_number[0]=='B' && magic_number[1]=='M'.

    Conversion in the opposite direction, btw, looks like this:

    void cpu_to_le16(uint8_t *buf, uint16_t val)
    {
       buf[0] = (val & 0x00FF);
       buf[1] = (val & 0xFF00) >> 8;
    }
    void cpu_to_be16(uint8_t *buf, uint16_t val)
    {
       buf[0] = (val & 0xFF00) >> 8;
       buf[1] = (val & 0x00FF);
    }
    

    Conversion of 32-/64-bit quantities left as an exercise.
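
    For reference, the 32-bit versions might look like the sketch below. It follows the same pattern as the 16-bit functions above; le32_to_cpu is the name already used in read_bmp_header, while the bodies here are just one plausible way to write them.

```c
#include <stdint.h>

/* CHAR_BIT == 8 assumed, as in the 16-bit versions */

/* Assemble a 32-bit value from 4 bytes stored least significant byte first. */
uint32_t le32_to_cpu(const uint8_t *buf)
{
    return  (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}

/* Assemble a 32-bit value from 4 bytes stored most significant byte first. */
uint32_t be32_to_cpu(const uint8_t *buf)
{
    return  (uint32_t)buf[3]
         | ((uint32_t)buf[2] << 8)
         | ((uint32_t)buf[1] << 16)
         | ((uint32_t)buf[0] << 24);
}

/* Store a 32-bit value as 4 bytes, least significant byte first. */
void cpu_to_le32(uint8_t *buf, uint32_t val)
{
    buf[0] = (uint8_t)(val & 0xFF);
    buf[1] = (uint8_t)((val >> 8) & 0xFF);
    buf[2] = (uint8_t)((val >> 16) & 0xFF);
    buf[3] = (uint8_t)((val >> 24) & 0xFF);
}
```

    Because these functions work on individual bytes with shifts, they give the same result on any host, whatever its native byte order.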

  • 2020-12-25 09:10

    I assume this is an endian issue, i.e. you are putting the bytes 42 and 4D into your short value. But your system is little-endian, which means it stores the least significant byte of a multi-byte integer first. The bytes 42 4D in the file are therefore read back as the value 0x4D42 rather than 0x424D.

    Demonstrated in this code:

    #include <stdio.h>
    
    int main(void)
    {
        union {
            unsigned short sval;    /* unsigned, to match the %hu and %hx conversions */
            unsigned char bval[2];
        } udata;
        udata.sval = 1;
        printf( "DEC[%5hu]  HEX[%04hx]  BYTES[%02hhx][%02hhx]\n"
              , udata.sval, udata.sval, udata.bval[0], udata.bval[1] );
        udata.sval = 0x424d;
        printf( "DEC[%5hu]  HEX[%04hx]  BYTES[%02hhx][%02hhx]\n"
              , udata.sval, udata.sval, udata.bval[0], udata.bval[1] );
        udata.sval = 0x4d42;
        printf( "DEC[%5hu]  HEX[%04hx]  BYTES[%02hhx][%02hhx]\n"
              , udata.sval, udata.sval, udata.bval[0], udata.bval[1] );
        return 0;
    }
    

    This gives the following output on a little-endian machine:

    DEC[    1]  HEX[0001]  BYTES[01][00]
    DEC[16973]  HEX[424d]  BYTES[4d][42]
    DEC[19778]  HEX[4d42]  BYTES[42][4d]
    

    So if you want to be portable, you will need to detect the endianness of your system and swap the bytes if required. There are plenty of byte-swapping examples around the internet.
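
    A minimal sketch of that detect-and-swap approach is shown here. All three function names are invented for this example, and it assumes the file stores 16-bit values in little-endian order, as BMP does.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte first, else 0. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);   /* look at the first byte in memory */
    return first == 1;
}

/* Exchange the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Convert a 16-bit value that was read raw from a little-endian file. */
uint16_t le16_file_to_host(uint16_t raw)
{
    return host_is_little_endian() ? raw : swap16(raw);
}
```

    Reading the bytes 42 4D raw into a 16-bit variable and passing it through le16_file_to_host should yield 0x4D42 on either kind of host.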

    Subsequent question:

    I ask only because my file size is 3 instead of 196662

    This is due to memory alignment. 196662 is the bytes 36 00 03 00 and 3 is the bytes 03 00 00 00. Most systems need types like int not to be split across memory words. So intuitively you think your struct is laid out in memory like:

                              Offset
    short magic_number;       00 - 01
    int file_size;            02 - 05
    short reserved_bytes[2];  06 - 09
    int data_offset;          0A - 0D
    

    BUT on a 32-bit system that means file_size has two bytes in the same word as magic_number and two bytes in the next word. Most compilers will not stand for this, so the structure is actually laid out in memory like this:

    short magic_number;       00 - 01
    <<unused padding>>        02 - 03
    int file_size;            04 - 07
    short reserved_bytes[2];  08 - 0B
    int data_offset;          0C - 0F
    

    So when you read your byte stream in, the 36 00 goes into the padding area, which leaves file_size with 03 00 00 00. If you had used fwrite to create this data it would have been OK, because the padding bytes would have been written out too. But if your input is always going to be in the format you have specified, it is not appropriate to read the whole struct in one fread. Instead you will need to read each of the elements individually.
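
    Reading element by element could be sketched as follows. The helper names read_u16_le and read_u32_le, the struct name, and read_bmp_header_fields are all invented for this example; the helpers also decode each field as little-endian, so the result does not depend on the host's byte order.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical in-memory header; padding here no longer matters. */
struct bmp_file_header {
    uint16_t magic_number;
    uint32_t file_size;
    uint16_t reserved_bytes[2];
    uint32_t data_offset;
};

/* Read 2 bytes and decode them as a little-endian value. 0=ok, -1=error. */
static int read_u16_le(FILE *fp, uint16_t *out)
{
    unsigned char b[2];
    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return -1;
    *out = (uint16_t)(b[0] | (b[1] << 8));
    return 0;
}

/* Read 4 bytes and decode them as a little-endian value. 0=ok, -1=error. */
static int read_u32_le(FILE *fp, uint32_t *out)
{
    unsigned char b[4];
    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return -1;
    *out = (uint32_t)b[0] | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    return 0;
}

/* Read the header one field at a time, in file order. 0=ok, -1=error. */
int read_bmp_header_fields(FILE *fp, struct bmp_file_header *h)
{
    if (read_u16_le(fp, &h->magic_number) != 0) return -1;
    if (read_u32_le(fp, &h->file_size) != 0) return -1;
    if (read_u16_le(fp, &h->reserved_bytes[0]) != 0) return -1;
    if (read_u16_le(fp, &h->reserved_bytes[1]) != 0) return -1;
    if (read_u32_le(fp, &h->data_offset) != 0) return -1;
    return 0;
}
```

    Since each field is read with its own call, the 36 00 03 00 bytes land in file_size regardless of how the compiler pads the struct.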
