Bit shifting and bit mask - sample code

Question

I've come across some code which uses the bit masks 0xff and 0xff00, or in 16-bit binary form, 00000000 11111111 and 11111111 00000000.

/**
 * Function to check if the given string is in GZIP Format.
 *
 * @param inString String to check.
 * @return True if GZIP Compressed otherwise false.
 */
public static boolean isStringCompressed(String inString)
{
    try
    {
        byte[] bytes = inString.getBytes("ISO-8859-1");
        int gzipHeader = ((int) bytes[0] & 0xff)
            | ((bytes[1] << 8) & 0xff00);
        return GZIPInputStream.GZIP_MAGIC == gzipHeader;
    } catch (Exception e)
    {
        return false;
    }
}

I'm trying to work out what the purpose of using these bit masks is in this context (against a byte array). I can't see what difference they would make.

This method seems to be written for GZip-compressed strings, where the GZip magic number is 35615, which is 8B1F in hex and 10001011 00011111 in binary.

Am I correct in thinking this swaps the bytes? For example, say my input string were \u001f\u008b:

bytes[0] & 0xff
 bytes[0] = 1f = 00011111
          & ff = 11111111
                 --------
               = 00011111

bytes[1] << 8
 bytes[1] = 8b = 10001011
          << 8 = 10001011 00000000

((bytes[1] << 8) & 0xff00)
= 10001011 00000000 & 0xff00
= 10001011 00000000 
  11111111 00000000 &
-------------------
  10001011 00000000

00000000 00011111
10001011 00000000 |
-----------------
10001011 00011111 = 8B1F

To me it doesn't seem like the & is doing anything to the original byte in either case, bytes[0] & 0xff or (bytes[1] << 8) & 0xff00. What am I missing?

Answer 1:

int gzipHeader = ((int) bytes[0] & 0xff) | ((bytes[1] << 8) & 0xff00);

The type byte in Java is signed. If you cast a byte to an int, its sign will be extended, so a byte with its top bit set (such as 0x8b) becomes a negative int with 1 bits in the upper 24 positions. The & 0xff masks out the 1 bits that you get from sign extension, effectively treating the byte as if it were unsigned.

Likewise for 0xff00, except that the byte is first shifted 8 bits to the left.

So, what this does is:

1. take the first byte, bytes[0], cast it to int and mask out the sign-extended bits (treating the byte as if it is unsigned)
2. take the second byte, cast it to int, shift it left by 8 bits, and mask out the sign-extended bits
3. combine the values with |

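Note that the left shift moves the second byte into the high-order position, so the combined value has the two bytes in the opposite order from the array (bytes 1f 8b become 0x8b1f). In that sense it effectively swaps the bytes.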
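The steps above can be sketched as follows, using the bytes 0x1f and 0x8b from the question. The class and method names are made up for illustration and are not part of the original code. The second method leaves the masks out to show what sign extension does to the result: 0x8b is negative as a Java byte, which is exactly the case the hand calculation in the question does not account for.

```java
import java.util.zip.GZIPInputStream;

// Hypothetical demo class, not from the original code.
class GzipHeaderMasks {
    // Combines two bytes into an int, low byte first, masking off sign-extension bits.
    static int headerWithMasks(byte b0, byte b1) {
        int low = b0 & 0xff;            // promoted to int, only bits 0..7 kept
        int high = (b1 << 8) & 0xff00;  // promoted, shifted to bits 8..15, bits 16..31 dropped
        return low | high;
    }

    // Same combination without the masks, to show the effect of sign extension.
    static int headerWithoutMasks(byte b0, byte b1) {
        return b0 | (b1 << 8);
    }

    public static void main(String[] args) {
        byte b0 = (byte) 0x1f;
        byte b1 = (byte) 0x8b; // -117 when read as a signed byte

        System.out.printf("masked:   0x%08x%n", headerWithMasks(b0, b1));    // 0x00008b1f
        System.out.printf("unmasked: 0x%08x%n", headerWithoutMasks(b0, b1)); // 0xffff8b1f
        System.out.println(headerWithMasks(b0, b1) == GZIPInputStream.GZIP_MAGIC);    // true
        System.out.println(headerWithoutMasks(b0, b1) == GZIPInputStream.GZIP_MAGIC); // false
    }
}
```

Without the masks, the comparison against GZIP_MAGIC fails for every real gzip header, because the second magic byte always has its top bit set.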

Answer 2:

This is a trick to overcome big-endian/little-endian issues. It is forcing the interpretation of the first two bytes as little-endian, i.e. [0] contains the low byte and [1] contains the high byte.
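As a rough illustration of the little-endian reading, the same value can be obtained with java.nio.ByteBuffer and an explicit byte order. This is an alternative sketch, not what the original method does, and the class and method names are invented for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class, not from the original code.
class LittleEndianHeader {
    // Reads the first two bytes as an unsigned 16-bit little-endian value.
    // Throws BufferUnderflowException if fewer than two bytes are supplied.
    static int readLittleEndian(byte[] bytes) {
        short raw = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getShort();
        return raw & 0xffff; // short is signed too, so mask after widening
    }

    // Same bytes read as big-endian, for comparison.
    static int readBigEndian(byte[] bytes) {
        short raw = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN).getShort();
        return raw & 0xffff;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0x1f, (byte) 0x8b};
        System.out.printf("little-endian: 0x%04x%n", readLittleEndian(header)); // 0x8b1f
        System.out.printf("big-endian:    0x%04x%n", readBigEndian(header));    // 0x1f8b
    }
}
```

Note that the & 0xffff is still needed here for the same reason as & 0xff in the original: getShort() returns a signed short.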

Answer 3:

Apparently the purpose is to read the first two bytes as one 16-bit word and store it in gzipHeader by suitable masking and shifting. More precisely, the first part keeps exactly the low 8 bits, which hold the first byte, while the second part keeps bits 8 to 15, which hold the second byte after it has been shifted left by 8 bits. The | combines both masked values into a single int.

The resulting value is compared against the defined value GZIPInputStream.GZIP_MAGIC to determine if the first two bytes are the defined beginning of data compressed with gzip.
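To see the comparison work on real data, here is a minimal sketch that compresses a few bytes with GZIPOutputStream and then applies the same header check. It operates on a byte array rather than a String, and it adds a length check, both of which are assumptions made for this example rather than features of the original method. The names looksLikeGzip and gzip are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class, not from the original code.
class GzipMagicCheck {
    // Returns true if the first two bytes match the gzip magic number.
    static boolean looksLikeGzip(byte[] data) {
        if (data == null || data.length < 2) {
            return false;
        }
        int header = (data[0] & 0xff) | ((data[1] << 8) & 0xff00);
        return header == GZIPInputStream.GZIP_MAGIC;
    }

    // Compresses the input so there is genuine gzip data to test against.
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] plain = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(looksLikeGzip(plain));       // false
        System.out.println(looksLikeGzip(gzip(plain))); // true
    }
}
```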

Answer 4:

byte is a signed type. If you convert the byte 0xff to an int, you get -1. If you actually want to get 255, mask with 0xff after the conversion.
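A small sketch of that difference, with a made-up class name. It also shows Byte.toUnsignedInt, which is available from Java 8 onward and gives the same result as the mask.

```java
// Hypothetical demo class, not from the original code.
class UnsignedByteDemo {
    // Plain widening conversion: the sign bit is copied into the upper 24 bits.
    static int signExtended(byte b) {
        return b;
    }

    // Widening followed by a mask: only the low 8 bits survive.
    static int masked(byte b) {
        return b & 0xff;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xff;
        System.out.println(signExtended(b));        // -1
        System.out.println(masked(b));              // 255
        System.out.println(Byte.toUnsignedInt(b));  // 255
    }
}
```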

Source: https://stackoverflow.com/questions/30327937/bit-shifting-and-bit-mask-sample-code

Tags

java

bit-manipulation