Understanding Java bytes

抹茶落季 2020-12-31 18:48

So at work yesterday, I had to write an application to count the pages in an AFP file. So I dusted off my MO:DCA spec PDF and found the structured field BPG (Begin Pag

6 answers
  • 2020-12-31 19:04

    Not sure what you really want :) I assume you are asking how to extract a signed multi-byte value? First, look at what happens when you sign extend a single byte:

    byte[] b = new byte[] { -128 };
    int i = b[0];
    System.out.println(i); // prints -128!
    

    So the sign is correctly extended to 32 bits without doing anything special: the byte 1000 0000 becomes 1111 1111 1111 1111 1111 1111 1000 0000. You already know how to suppress sign extension by ANDing with 0xFF. For multi-byte values, you want only the sign of the most significant byte to be extended, and you want to treat the less significant bytes as unsigned (the example assumes network byte order and a 16-bit int value):

    byte[] b = new byte[] { -128, 1 }; // 0x80, 0x01
    int i = (b[0] << 8) | (b[1] & 0xFF);
    System.out.println(i); // prints -32767!
    System.out.println(Integer.toHexString(i)); // prints ffff8001
    

    You need to suppress the sign extension of every byte except the most significant one, so to extract a signed 32-bit int to a 64-bit long:

    byte[] b = new byte[] { -54, -2, -70, -66 }; // 0xca, 0xfe, 0xba, 0xbe
    long l = ( b[0]         << 24) |
             ((b[1] & 0xFF) << 16) |
             ((b[2] & 0xFF) <<  8) |
             ((b[3] & 0xFF)      );
    System.out.println(l); // prints -889275714
    System.out.println(Long.toHexString(l)); // prints ffffffffcafebabe
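As a cross-check on the manual shifts above, here is a minimal sketch that compares them with `java.nio.ByteBuffer`, which reads big-endian (network order) by default. The class and method names (`BigEndianCheck`, `manualInt32`, `bufferInt32`, `bufferInt16`) are made up for illustration and are not part of the original answer.

```java
import java.nio.ByteBuffer;

final class BigEndianCheck {
    // Manual decode: only the most significant byte keeps its sign.
    static int manualInt32(byte[] b) {
        return (b[0] << 24)
             | ((b[1] & 0xFF) << 16)
             | ((b[2] & 0xFF) << 8)
             | (b[3] & 0xFF);
    }

    // Library decode: a freshly wrapped ByteBuffer uses big-endian order.
    static int bufferInt32(byte[] b) {
        return ByteBuffer.wrap(b).getInt();
    }

    // Same idea for a signed 16-bit value.
    static short bufferInt16(byte[] b) {
        return ByteBuffer.wrap(b).getShort();
    }

    public static void main(String[] args) {
        byte[] four = { -54, -2, -70, -66 };   // 0xca, 0xfe, 0xba, 0xbe
        System.out.println(manualInt32(four)); // prints -889275714
        System.out.println(bufferInt32(four)); // prints -889275714
        byte[] two = { -128, 1 };              // 0x80, 0x01
        System.out.println(bufferInt16(two));  // prints -32767
    }
}
```

If both approaches agree on your own data, the hand-written shifts are probably masking the right bytes.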
    

    Note: on Intel-based systems, bytes are often stored in reverse order (least significant byte first, i.e. little-endian) because the x86 architecture stores multi-byte values in this order in memory. A lot of software that originated on x86 uses this order in file formats, too.
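For the least-significant-byte-first case, a sketch of the same extraction with the byte positions reversed might look like this. The class and method names (`LittleEndianDecode`, `int32LE`) are hypothetical. The point is that the sign now comes from the last byte, so that is the only one left unmasked.

```java
final class LittleEndianDecode {
    // Least significant byte first: only the LAST byte carries the sign.
    static int int32LE(byte[] b) {
        return (b[0] & 0xFF)
             | ((b[1] & 0xFF) << 8)
             | ((b[2] & 0xFF) << 16)
             | (b[3] << 24);
    }

    public static void main(String[] args) {
        byte[] b = { -66, -70, -2, -54 }; // 0xbe, 0xba, 0xfe, 0xca
        int value = int32LE(b);
        System.out.println(Integer.toHexString(value)); // prints cafebabe
        System.out.println(value);                      // prints -889275714
    }
}
```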

  • 2020-12-31 19:04

    To get the unsigned byte value, you can use either

    int u = b & 0xFF;
    

    or

    int u = b < 0 ? b + 256 : b;
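To convince yourself that the two forms are interchangeable, a small sketch can compare both against `Byte.toUnsignedInt` (available since Java 8) for every possible byte value. The class and helper names (`UnsignedByteCheck`, `viaMask`, `viaAdd`) are made up for this example.

```java
final class UnsignedByteCheck {
    // Mask off the sign-extended upper bits.
    static int viaMask(byte b) {
        return b & 0xFF;
    }

    // Shift negative values up by 256.
    static int viaAdd(byte b) {
        return b < 0 ? b + 256 : b;
    }

    public static void main(String[] args) {
        for (int v = Byte.MIN_VALUE; v <= Byte.MAX_VALUE; v++) {
            byte b = (byte) v;
            int expected = Byte.toUnsignedInt(b);
            if (viaMask(b) != expected || viaAdd(b) != expected) {
                throw new AssertionError("mismatch for " + v);
            }
        }
        System.out.println(viaMask((byte) -1)); // prints 255
        System.out.println("all 256 byte values agree");
    }
}
```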
    
  • 2020-12-31 19:13

    What I want to know is how the bitwise operation works here--more specifically, how I arrive at the binary representation for a negative number.

    The binary representation of a negative number is that of the corresponding positive number bit-flipped with 1 added to it. This representation is called two's complement.

  • 2020-12-31 19:13

    I guess the magic here is that the byte is stored in a bigger container, likely a 32-bit int. If the byte is interpreted as a signed byte, it gets expanded to represent the same number in the 32-bit int: if the most significant bit (the first one) of the byte is 1, then all the bits to the left of it in the 32-bit int are also set to 1 (that's due to the way negative numbers are represented, two's complement).

    Now, if you & 0xFF that int you cut off those 1's and end up with a "positive" int representing the byte value you've read.

  • 2020-12-31 19:21

    In order to obtain the binary representation of a negative number you calculate two's complement:

    • Get the binary representation of the positive number
    • Invert all the bits
    • Add one

    Let's do -72 as an example:

    0100 1000    72
    1011 0111    All bits inverted
    1011 1000    Add one
    

    So the binary (8-bit) representation of -72 is 10111000.
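The three steps above can be sketched in Java as follows. The class and helper names (`TwosComplement`, `negate8`, `bits8`) are invented for illustration, and the result is masked to 8 bits because Java performs the arithmetic on 32-bit ints.

```java
final class TwosComplement {
    // Invert all bits, add one, and keep only the low 8 bits.
    static int negate8(int positive) {
        return (~positive + 1) & 0xFF;
    }

    // Render the low 8 bits of a value as a zero-padded binary string.
    static String bits8(int value) {
        String s = Integer.toBinaryString(value & 0xFF);
        return String.format("%8s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(bits8(72));          // prints 01001000
        System.out.println(bits8(~72));         // prints 10110111
        System.out.println(bits8(negate8(72))); // prints 10111000
        System.out.println((byte) negate8(72)); // prints -72
    }
}
```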

    What is actually happening to you is the following: your file has a byte with value 10111000. When interpreted as an unsigned byte (which is probably what you want), this is 184.

    In Java, when this byte is used as an int (for example because of implicit promotion when it is assigned to an int or used in an expression), it will be interpreted as a signed byte, and sign-extended to 11111111 11111111 11111111 10111000. This is an integer with value -72.

    By ANDing with 0xff you retain only the lowest 8 bits, so your integer is now 00000000 00000000 00000000 10111000, which is 184.
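A short sketch that prints both 32-bit patterns for this byte makes the sign extension and the effect of the mask visible. The class and method names (`SignExtensionDemo`, `promote`, `mask`, `bits32`) are hypothetical.

```java
final class SignExtensionDemo {
    // Implicit widening from byte to int sign-extends the value.
    static int promote(byte raw) {
        return raw;
    }

    // ANDing with 0xFF keeps only the low 8 bits.
    static int mask(byte raw) {
        return raw & 0xFF;
    }

    // Render an int as a zero-padded 32-character binary string.
    static String bits32(int value) {
        return String.format("%32s", Integer.toBinaryString(value)).replace(' ', '0');
    }

    public static void main(String[] args) {
        byte raw = (byte) 0xB8; // bit pattern 1011 1000
        // prints -72 11111111111111111111111110111000
        System.out.println(promote(raw) + " " + bits32(promote(raw)));
        // prints 184 00000000000000000000000010111000
        System.out.println(mask(raw) + " " + bits32(mask(raw)));
    }
}
```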

  • 2020-12-31 19:28

    For bytes with bit 7 set:

    unsigned_value = signed_value + 256
    

    Mathematically, when you compute with bytes, you compute modulo 256. The difference between signed and unsigned is that you choose different representatives for the equivalence classes, while the underlying representation as a bit pattern stays the same for each equivalence class. This also explains why addition, subtraction and multiplication have the same result as a bit pattern, regardless of whether you compute with signed or unsigned integers.
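The claim about identical bit patterns can be checked exhaustively for 8-bit values. The following is only a sketch with invented names (`Mod256`, `sameBitPatterns`): it computes each operation once on the signed representatives and once on the unsigned ones, then compares the low 8 bits.

```java
final class Mod256 {
    // True if +, - and * give the same low 8 bits for signed and unsigned inputs.
    static boolean sameBitPatterns() {
        for (int x = 0; x < 256; x++) {
            for (int y = 0; y < 256; y++) {
                int sx = (byte) x, sy = (byte) y; // signed representatives
                if (((sx + sy) & 0xFF) != ((x + y) & 0xFF)) return false;
                if (((sx - sy) & 0xFF) != ((x - y) & 0xFF)) return false;
                if (((sx * sy) & 0xFF) != ((x * y) & 0xFF)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameBitPatterns());    // prints true
        // 200 + 100 = 300, and 300 mod 256 = 44; signed: -56 + 100 = 44
        System.out.println((byte) (200 + 100));   // prints 44
        System.out.println((byte) 200 + 100);     // prints 44
    }
}
```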
