What is an intuitive way to interpret the bitwise operators and masking? Also, what is masking used for?

Question

I'm learning about bitwise operators and masking right now in my computer systems class. However, I'm having some trouble internalizing them.

I understand what the operators, &, |, ^, >> (both arithmetic and logical shift), and << DO, but I don't quite get what they're really used for aside from optimizing multiplication and division operations (for >> and <<), and to check if certain bits are on or off (the & operator).

Also, I don't understand what masking is used for. I know that x & 0xFF extracts the least significant byte of an integer x, but I can't really extrapolate from that to how other kinds of masks (e.g. those that extract the leftmost 1 in a number, or that count the number of 1s in a number) are used.

Could anyone please shed some light on this, preferably with some examples? Thank you.

Answer 1:

A good way to understand bitmasks is with an example, so I will give one. Let's say we have an array of structs:

struct my_struct {
  int foo;
  int bar;
};

struct my_struct array_of_structs[64]; // I picked 64 for a specific reason I will discuss later

We will use this array as a pool whose elements are allocated as needed and can be deallocated again. One way to keep track of this is to add a used boolean member to the struct:

struct my_struct {
  int foo;
  int bar;
  char used;
};

But another way is to create a bitmap. Since the array is of size 64, we only need a single 64-bit integer for this. Note that if you have more elements than there are bits in a single data type, you can use an array of integers for the bitmap, but I will omit this for the sake of clarity.

unsigned long long bitmap = 0;  // At least 64 bits wide (guaranteed since C99); 0 = every element free

So let's let every bit correspond to an element in our array: a 0 for that bit means not used and a 1 means used. To mark element i as used we would do the following:

bitmap = bitmap | (1ULL << i);

or to be more concise:

bitmap |= (1ULL << i);

(1ULL << i) has every bit set to 0 except the ith bit, so bitmap | (1ULL << i) is the same as bitmap except that the ith bit is set as well (regardless of what it previously was). In other words, bitmap |= (1ULL << i); says: set the ith bit to 1 and leave everything else the way it was. Since the ith bit represents whether the ith element is in use, another way to read this is: mark the ith element as used.

Now to test if element i is used or not we would use &:

if(bitmap & (1ULL << i)) {
  // i is used
}
else {
  // i is not used
}

bitmap & (1ULL << i) will only be a non-zero value, and therefore true, if the ith bit is also set in bitmap.

Finally to mark an element as not used we would do the following:

bitmap = bitmap & ~(1ULL << i);

or again, to be more concise:

bitmap &= ~(1ULL << i);

~(1ULL << i) has every bit set to 1 except the ith bit (64 bits in total, assuming unsigned long long is 64 bits wide). When it is ANDed with bitmap, the result is exactly the same as bitmap except that the ith bit is set to 0.
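
Putting the three operations together, here is a minimal sketch of the pool bookkeeping. The helper names (slot_mark_used, slot_mark_free, slot_is_used) and POOL_SIZE are made up for illustration and do not come from any library. The assert calls are my own addition, since shifting by the full width of the type or more is undefined behavior in C.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_SIZE 64   /* hypothetical: one bit per element of the pool */

/* One bit per pool slot: 1 = used, 0 = free. Starts out all free. */
static unsigned long long bitmap = 0;

/* Mark slot i as used: OR in a mask that has only bit i set. */
static void slot_mark_used(unsigned i)
{
    assert(i < POOL_SIZE);          /* keep the shift count in range */
    bitmap |= (1ULL << i);
}

/* Mark slot i as free: AND with a mask that has every bit set except bit i. */
static void slot_mark_free(unsigned i)
{
    assert(i < POOL_SIZE);
    bitmap &= ~(1ULL << i);
}

/* Test slot i: the AND is non-zero only if bit i is set in the bitmap. */
static bool slot_is_used(unsigned i)
{
    assert(i < POOL_SIZE);
    return (bitmap & (1ULL << i)) != 0;
}
```

With these helpers, calling slot_mark_used(3) on an empty bitmap makes slot_is_used(3) return true and leaves bitmap holding the value 8 (only bit 3 set).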

One might wonder when to use a bitmap rather than a used member. A bitmap can be faster in some cases and slower in others, so if this part ever becomes a real bottleneck you have to measure which works better for your application. A bitmap is the natural choice when there is no custom struct to put a flag in. For example, in my own operating system I use a bitmap to mark physical frames as used or not used. Since there is no struct (what I am marking is memory itself), a bitmap works. It is not the most efficient way to find free frames, but it works.
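
For more elements than fit in one integer, the same idea extends to an array of words: the element index divided by the word size picks the word, and the remainder picks the bit. The following is only a rough sketch of that layout with a simple linear search for a free frame. NUM_FRAMES and all the frame_* names are assumptions for illustration, not code from a real kernel.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

#define NUM_FRAMES    1024   /* hypothetical number of frames to track */
#define BITS_PER_WORD (sizeof(unsigned long long) * CHAR_BIT)
#define NUM_WORDS     ((NUM_FRAMES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Static storage is zero-initialized, so every frame starts out free. */
static unsigned long long frame_bitmap[NUM_WORDS];

static void frame_mark_used(size_t frame)
{
    assert(frame < NUM_FRAMES);
    frame_bitmap[frame / BITS_PER_WORD] |= 1ULL << (frame % BITS_PER_WORD);
}

static void frame_mark_free(size_t frame)
{
    assert(frame < NUM_FRAMES);
    frame_bitmap[frame / BITS_PER_WORD] &= ~(1ULL << (frame % BITS_PER_WORD));
}

static int frame_is_used(size_t frame)
{
    assert(frame < NUM_FRAMES);
    return (frame_bitmap[frame / BITS_PER_WORD] >> (frame % BITS_PER_WORD)) & 1ULL;
}

/* Linear scan for the lowest free frame; returns NUM_FRAMES if none is free. */
static size_t frame_find_free(void)
{
    for (size_t w = 0; w < NUM_WORDS; w++) {
        if (frame_bitmap[w] == ~0ULL)       /* every bit set: word is full */
            continue;
        for (size_t b = 0; b < BITS_PER_WORD; b++) {
            size_t frame = w * BITS_PER_WORD + b;
            if (frame < NUM_FRAMES && !(frame_bitmap[w] & (1ULL << b)))
                return frame;
        }
    }
    return NUM_FRAMES;
}
```

Comparing a whole word against ~0ULL lets the scan skip 64 used frames at a time (on a platform where unsigned long long is 64 bits), which is one small advantage of a bitmap over a per-element flag.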

Another common use is to mark whether a property or attribute is present. This is commonly referred to as a bit flag. For example, let's say we have a player struct with a flag member.

struct player {
  // some members
  unsigned long long flag;
};

We then might have various properties about this player, such as is jumping, is swimming, is running, is dead. We could create bitmasks that correspond to each property.

#define PLAYER_JUMPING  (1ULL << 0)
#define PLAYER_SWIMMING (1ULL << 1)
#define PLAYER_RUNNING  (1ULL << 2)
#define PLAYER_DEAD     (1ULL << 3)

Then we use the same operations already demonstrated to turn properties on and off and to test them.

struct player my_player = {0};        // Start with every flag cleared
my_player.flag |= PLAYER_JUMPING;     // Mark the player as jumping
my_player.flag &= ~PLAYER_SWIMMING;   // Mark the player as not swimming
if(my_player.flag & PLAYER_RUNNING) { // Test if the player is running
  // player is running
}

Finally, the one operation I didn't demonstrate before was the bitwise exclusive or: ^. You can use this to toggle a property.

my_player.flag = my_player.flag ^ PLAYER_DEAD; // If player was dead now player is not dead and vice versa

or again, to be more concise:

my_player.flag ^= PLAYER_DEAD; // If player was dead now player is not dead and vice versa

This flips only the bits that are set in the bitmask; all other bits keep their previous value, because for a single bit x, x ^ 0 == x and x ^ 1 == !x.
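
To see set, clear, toggle and test side by side, here is a small sketch that wraps each one in a helper. The player_set, player_clear, player_toggle and player_has names are invented for this example, and the struct is reduced to just the flag member.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>   /* NULL */

#define PLAYER_JUMPING  (1ULL << 0)
#define PLAYER_SWIMMING (1ULL << 1)
#define PLAYER_RUNNING  (1ULL << 2)
#define PLAYER_DEAD     (1ULL << 3)

struct player {
    unsigned long long flag;   /* other members omitted in this sketch */
};

/* OR: turn on every bit that is set in mask, leave the rest alone. */
static void player_set(struct player *p, unsigned long long mask)
{
    assert(p != NULL);
    p->flag |= mask;
}

/* AND with the inverted mask: turn off every bit that is set in mask. */
static void player_clear(struct player *p, unsigned long long mask)
{
    assert(p != NULL);
    p->flag &= ~mask;
}

/* XOR: flip every bit that is set in mask (x ^ 1 flips, x ^ 0 keeps). */
static void player_toggle(struct player *p, unsigned long long mask)
{
    assert(p != NULL);
    p->flag ^= mask;
}

/* AND: true if at least one bit of mask is set in the player's flags. */
static bool player_has(const struct player *p, unsigned long long mask)
{
    assert(p != NULL);
    return (p->flag & mask) != 0;
}
```

Toggling the same mask twice restores the original value, so two calls to player_toggle(&p, PLAYER_DEAD) in a row leave p.flag unchanged.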

When using bitmasks this way you can test multiple properties with one bitwise and. For example if you only care if the player is running or jumping you could do the following:

if(my_player.flag & (PLAYER_RUNNING | PLAYER_JUMPING))  // Test if the player is running or jumping

Note that almost every compiler will fold (PLAYER_RUNNING | PLAYER_JUMPING) into a single constant at compile time, so the test costs one AND on one struct member rather than two separate checks.

Answer 2:

Following the other excellent answer, there is one more area to add: masks. What is a mask? It sounds impressive, but it's just a number. The important part is what the number represents. Generally, when you think of a mask you are thinking about what that number tells you about the state of the individual bits that make it up.

Consider an unsigned integer, which holds 4 bytes of information (on most current platforms, including x86). Most people are familiar with the common mask 0xff used to isolate the low byte of that unsigned int. Take for example:

ui = 73289  (00000000-00000001-00011110-01001001)

byte0 =  ui & 0xff;    //  0xff = binary 11111111  (255)

byte0 :  73  (01001001)

So above, the mask 0xff is nothing more than the number 255, which just happens to have the binary representation 11111111. That ensures that, when ANDed with the number ui, it yields the lowest 8 bits (the low byte) of ui.
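
The same mask also works for the other bytes if you shift the value right first, which is one everyday use of >> beyond division. A minimal sketch, assuming 8-bit bytes (get_byte is a name made up for this example):

```c
#include <assert.h>

/* Return byte number n of v, where byte 0 is the least significant byte.
 * Assumes 8-bit bytes, which is why the shift is n * 8 and the mask 0xff. */
static unsigned get_byte(unsigned long v, unsigned n)
{
    assert(n < sizeof v);   /* keep the shift count below the width of v */
    return (unsigned)((v >> (n * 8)) & 0xff);
}
```

For ui = 73289 (0x11E49), get_byte(ui, 0) is 73 (0x49), get_byte(ui, 1) is 30 (0x1E), get_byte(ui, 2) is 1, and get_byte(ui, 3) is 0, matching the four groups in the binary representation shown above.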

If the mask you are interested in changes frequently, or doesn't have a nice readable hex equivalent like 0xff, it is useful to simply assign the number representing the bits of interest to a variable. That variable is then referred to as a bitmask (nothing more, nothing less: just a number assigned to a variable that happens to represent the bit state you are interested in).

With that, let's look again at the example given by esm:

#define PLAYER_JUMPING  (1U << 0)
#define PLAYER_SWIMMING (1U << 1)
#define PLAYER_RUNNING  (1U << 2)
#define PLAYER_DEAD     (1U << 3)

To find which players are both running and jumping, we would need to do something like:

if ((player[x] & (PLAYER_RUNNING | PLAYER_JUMPING)) == 
    (PLAYER_RUNNING | PLAYER_JUMPING))
    printf ("player[x] is running & jumping");

That is a mess to type and a mess to read, but it can be made more manageable if we assign the result of PLAYER_RUNNING | PLAYER_JUMPING to a variable and use that variable (our bitmask) for the test. E.g.:

mask_rj = PLAYER_RUNNING | PLAYER_JUMPING;  // its value is '5' or (00000101)

if ((player[x] & mask_rj) == mask_rj)
    printf ("player[x] is running & jumping");
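
The "any of these bits" test and the "all of these bits" test are easy to mix up, so here is a small sketch that puts them next to each other. The has_any and has_all helpers are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define PLAYER_JUMPING  (1U << 0)
#define PLAYER_SWIMMING (1U << 1)
#define PLAYER_RUNNING  (1U << 2)
#define PLAYER_DEAD     (1U << 3)

/* True if at least one bit of mask is set in flags. */
static bool has_any(unsigned flags, unsigned mask)
{
    return (flags & mask) != 0;
}

/* True only if every bit of mask is set in flags. The AND keeps just the
 * bits covered by mask, and the comparison checks that none is missing. */
static bool has_all(unsigned flags, unsigned mask)
{
    assert(mask != 0);   /* an empty mask would trivially match everything */
    return (flags & mask) == mask;
}
```

A player whose flags are 00000100 (running only) passes has_any for the running-or-jumping mask 00000101 but fails has_all, while a player with 00000101 passes both.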

A short example using the mask to find the player who is running and jumping would be:

#include <stdio.h>
#include <limits.h>  // for CHAR_BIT

#if defined(__LP64__) || defined(_LP64)
# define BUILD_64   1
#endif

#ifdef BUILD_64
# define BITS_PER_LONG 64
#else
# define BITS_PER_LONG 32
#endif

#define PLAYER_JUMPING  (1U << 0)
#define PLAYER_SWIMMING (1U << 1)
#define PLAYER_RUNNING  (1U << 2)
#define PLAYER_DEAD     (1U << 3)

/* returns n as a binary string, zero-padded to sz digits */
char *binpad (unsigned long n, size_t sz);

int main (void) {

    /* 0b binary literals are standard in C23 and a GCC/Clang extension before that */
    unsigned char players[] = { 0b00001000, 0b00000010,
                                0b00000101, 0b00000100 };
    unsigned char nplayers = sizeof players/sizeof *players;
    unsigned char mask_rj = PLAYER_RUNNING | PLAYER_JUMPING;
    unsigned char i = 0;

    printf ("\n  mask_rj    : %hhu  (%s)\n\n", 
            mask_rj, binpad (mask_rj, sizeof mask_rj * CHAR_BIT));

    for (i = 0; i < nplayers; i++)
        printf ("  player [%hhu] : %hhu  (%s)\n", 
                i, players[i], 
                binpad (players[i], sizeof players[i] * CHAR_BIT));

    for (i = 0; i < nplayers; i++)
        if ((players[i] & mask_rj) == mask_rj)
            printf ("\n  player [%hhu] is Running & Jumping\n", i);

    return 0;
}

/* The result points into a static buffer, so each call overwrites the previous one. */
char *binpad (unsigned long n, size_t sz)
{
    static char s[BITS_PER_LONG + 1] = {0};
    char *p = s + BITS_PER_LONG;
    register size_t i;

    for (i = 0; i < sz; i++)
        *(--p) = (n>>i & 1) ? '1' : '0';

    return p;
}

Output

$ ./bin/bitmask

  mask_rj    : 5  (00000101)

  player [0] : 8  (00001000)
  player [1] : 2  (00000010)
  player [2] : 5  (00000101)
  player [3] : 4  (00000100)

  player [2] is Running & Jumping

Source: https://stackoverflow.com/questions/31173984/what-is-an-intuitive-way-to-interpret-the-bitwise-operators-and-masking-also-w

Tags

bit-manipulation

bitwise-operators

bit-shift