How do the bit manipulations in this bit-sorting code work?

时光取名叫无心 2021-01-31 22:20

Jon Bentley, in Column 1 of his book Programming Pearls, introduces a technique for sorting a sequence of positive integers using bit vectors.

I have taken the p

6 answers
  • 2021-01-31 23:05

    Basically, it is an optimized bucket sort:

    • reserve a bit array of length n bits.
    • clear the bit array (the first for loop in main).
    • read the items one by one (they must all be distinct).
      • set the i'th bit in the bit array if the read number is i.
    • iterate the bit array.
      • if the bit is set then print the position.

    Or in other words (for N < 10, sorting the 3 numbers 4, 6, 2):

    Start with an empty 10-bit array (which usually fits in one integer):

    0000000000
    

    read 4 and set the bit in the array..

    0000100000
    

    read 6 and set the bit in the array

    0000101000
    

    read 2 and set the bit in the array

    0010101000
    

    iterate the array and print every position in which the bits are set to one.

    2, 4, 6

    sorted.
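
    The walkthrough above can be sketched in C with a single 32-bit word as the bit array. This is only an illustration under stated assumptions: the function name sort_small is hypothetical, the values are assumed distinct and in the range 0 to 31, and bit 0 is the least significant bit of the word rather than the leftmost position drawn above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sorts n distinct values, each in [0, 32), using one 32-bit word as the
   bit array. Writes the sorted values to out and returns how many were
   written. sort_small is a hypothetical name used only for this sketch. */
size_t sort_small(const int *in, size_t n, int *out)
{
    uint32_t bits = 0;                   /* start with an empty bit array */
    for (size_t k = 0; k < n; k++) {
        assert(in[k] >= 0 && in[k] < 32);
        bits |= UINT32_C(1) << in[k];    /* set bit i when the value read is i */
    }

    size_t count = 0;
    for (int i = 0; i < 32; i++) {       /* scan positions in increasing order */
        if (bits & (UINT32_C(1) << i))
            out[count++] = i;
    }
    return count;
}
```

    Calling it with the input 4, 6, 2 fills out with 2, 4, 6.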

  • 2021-01-31 23:07

    The first 3 constants are interrelated. BITSPERWORD is 32; you'd want to set this based on your compiler and architecture. SHIFT is 5, because 2^5 = 32. Finally, MASK is 0x1F, which is 11111 in binary (i.e., the bottom 5 bits are all set). Equivalently, MASK = BITSPERWORD - 1.

    The bitset is conceptually just an array of bits. This implementation actually uses an array of ints, and assumes 32 bits per int. So whenever we want to set, clear or test (read) a bit we need to figure out two things:

    • which int (of the array) is it in
    • which of that int's bits are we talking about

    Because we're assuming 32 bits per int, we can just divide by 32 (and truncate) to get the array index we want. Dividing by 32 (BITSPERWORD) is the same as shifting to the right by 5 (SHIFT). So that's what the a[i>>SHIFT] bit is about. You could also write this as a[i/BITSPERWORD] (and in fact, you'd probably get the same or very similar code assuming your compiler has a reasonable optimizer).

    Now that we know which element of a we want, we need to figure out which bit. Really, we want the remainder. We could do this with i%BITSPERWORD, but it turns out that i&MASK is equivalent for non-negative i. This is because BITSPERWORD is a power of 2 (2^5 in this case) and MASK is the bottom 5 bits all set.
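
    The two lookups can be written as a pair of small helpers. This is a sketch, not part of the original listing: word_index and bit_offset are hypothetical names, and both assume a non-negative i and 32-bit words.

```c
#include <assert.h>

#define BITSPERWORD 32
#define SHIFT 5
#define MASK 0x1F

/* Index of the array element that holds bit i; same as i / BITSPERWORD. */
int word_index(int i)
{
    assert(i >= 0);
    return i >> SHIFT;
}

/* Position of bit i inside that element (0..31); same as i % BITSPERWORD. */
int bit_offset(int i)
{
    assert(i >= 0);
    return i & MASK;
}
```

    For i = 70, word_index gives 2 and bit_offset gives 6, since 70 = 2*32 + 6.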

  • 2021-01-31 23:09

    The bit magic is used as a special addressing scheme that works well with row sizes that are powers of two.

    It helps to look at it this way (note: I'd rather say bits-per-row than bits-per-word, since we're talking about a bit matrix here):

    // supposing an int of 1 bit would exist...
    int1 bits[BITSPERROW * N]; // an array of N x BITSPERROW elements
    
    // set bit at x,y:
    int linear_address = y*BITSPERROW + x;
    bits[linear_address] = 1; // or 0
    // 0 1 2 3 4 5 6 7 8 9 10 11 ... 31
    // . . . . . . . . . .  .  .       .  
    // . . . . X . . . . .  .  .       .  -> x = 4, y = 1 => i = (1*32 + 4)
    

    The statement linear_address = y*BITSPERROW + x also means that x = linear_address % BITSPERROW and y = linear_address / BITSPERROW.

    When you optimize this in memory by using 1 word of 32 bits per row, you get the fact that a bit at column x can be set using

    int bitrow = 0;
    bitrow |= 1 << (x);
    

    Now when we iterate over the bits, we have the linear address, but need to find the corresponding word.

    int column = linear_address % BITSPERROW;
    int bit_mask =  1 << column; // meaning for the xth column, 
                                 // you take 1 and shift that bit x times
    int row    = linear_address / BITSPERROW;
    

    So to set the i'th bit, you can do this:

    bits[ i / BITSPERROW ] |= 1 << ( i % BITSPERROW );
    

    An extra trick is that, for a non-negative first operand, the modulo operator can be replaced by a bitwise AND and the / operator by a right shift, provided the second operand is a power of two.

    a % BITSPERROW == a & ( BITSPERROW - 1 ) == a & MASK
    a / BITSPERROW == a >> ( log2(BITSPERROW) ) == a >> SHIFT
    

    This ultimately boils down to the very dense notation, which is hard to read if you are not used to bit twiddling:

    a[ i >> SHIFT ] |= ( 1 << (i&MASK) );
    

    But I don't see the shift-and-mask version working for e.g. 40 bits per word, since 40 is not a power of two.
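
    For a row width that is not a power of two, the shift and mask shortcuts don't apply, but plain division and remainder still do. A minimal sketch, assuming 40 bits per row stored in the low bits of a 64-bit word (set_bit, test_bit, rows and NROWS are hypothetical names chosen for this example):

```c
#include <assert.h>
#include <stdint.h>

#define BITSPERROW 40            /* deliberately not a power of two */
#define NROWS 4                  /* room for 160 bits; arbitrary for this sketch */

static uint64_t rows[NROWS];     /* each row uses only its low 40 bits */

void set_bit(int i)
{
    assert(i >= 0 && i < BITSPERROW * NROWS);
    rows[i / BITSPERROW] |= UINT64_C(1) << (i % BITSPERROW);
}

int test_bit(int i)
{
    assert(i >= 0 && i < BITSPERROW * NROWS);
    return (int)((rows[i / BITSPERROW] >> (i % BITSPERROW)) & 1);
}
```

    The cost is a real division and remainder per access instead of a shift and an AND, plus 24 unused bits per 64-bit word.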

  • 2021-01-31 23:13

    Starting with set():
    A right shift of 5 is the same as dividing by 32. It does that to find which int the bit is in.
    MASK is 0x1f or 31. ANDing with the address gives the bit index within the int. It's the same as the remainder of dividing the address by 32.
    Shifting 1 left by the bit index ("1<<(i & MASK)") results in an integer which has just 1 bit in the given position set.
    ORing sets the bit.
    The line "int sh = i>>SHIFT;" is a wasted line, because they didn't use sh again beneath it, and instead just repeated "i>>SHIFT"

    clr() is basically the same as set, except instead of ORing with 1<<(i & MASK) to set the bit, it ANDs with the inverse to clear the bit. test() ANDs with 1<<(i & MASK) to test the bit.
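
    Put together, the three operations described above might look like the following. This is a sketch reconstructed from the description, not the original listing: it assumes N = 10000000 as the upper bound on the values and uses uint32_t instead of int so that shifting into the top bit is well defined.

```c
#include <assert.h>
#include <stdint.h>

#define BITSPERWORD 32
#define SHIFT 5
#define MASK 0x1F
#define N 10000000               /* assumed upper bound on the input values */

static uint32_t a[1 + N / BITSPERWORD];

/* OR in a word that has only bit (i & MASK) set. */
void set(int i)
{
    assert(i >= 0 && i < N);
    a[i >> SHIFT] |= UINT32_C(1) << (i & MASK);
}

/* AND with the inverse of that word to clear the bit. */
void clr(int i)
{
    assert(i >= 0 && i < N);
    a[i >> SHIFT] &= ~(UINT32_C(1) << (i & MASK));
}

/* AND with that word to see whether the bit is set; returns 0 or 1. */
int test(int i)
{
    assert(i >= 0 && i < N);
    return (a[i >> SHIFT] & (UINT32_C(1) << (i & MASK))) != 0;
}
```

    For example, set(33) touches a[1] and sets its bit 1, because 33 >> 5 is 1 and 33 & 0x1F is 1.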

    The bitsort will also remove duplicates from the list, because it can only record the presence or absence of each integer. A sort that uses integer counters instead of bits, so that it can count more than 1 of each value, is called a counting sort.
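
    A sort that keeps an integer counter per value, so that duplicates survive, is usually called counting sort. A minimal sketch of that variant, assuming every input value lies in [0, MAXVAL) (MAXVAL and counting_sort are names chosen for this example):

```c
#include <assert.h>
#include <stddef.h>

#define MAXVAL 100               /* assumed: every input value lies in [0, MAXVAL) */

/* Sorts in[0..n) into out[0..n), keeping duplicates. */
void counting_sort(const int *in, size_t n, int *out)
{
    size_t count[MAXVAL] = {0};
    for (size_t k = 0; k < n; k++) {
        assert(in[k] >= 0 && in[k] < MAXVAL);
        count[in[k]]++;              /* one counter per value instead of one bit */
    }

    size_t pos = 0;
    for (int v = 0; v < MAXVAL; v++)
        for (size_t c = 0; c < count[v]; c++)
            out[pos++] = v;          /* emit v as many times as it was seen */
}
```

    The trade-off is memory: a full counter per possible value instead of a single bit.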

  • 2021-01-31 23:22

    A few questions: 1. Why are 32 bits needed? 2. Could we do this in Java by creating a HashMap with keys from 0000000 to 9999999 and values of 0 or 1 depending on whether the number is present? What would the implications of such a program be?

  • 2021-01-31 23:27

    Quoting an excerpt from Bentley's original article in DDJ, this is what the code does at a high level:

    /* phase 1: initialize set to empty */
    for (i = 0; i < n; i++)
        bit[i] = 0

    /* phase 2: insert present elements */
    for each i in the input file
        bit[i] = 1

    /* phase 3: write sorted output */
    for (i = 0; i < n; i++)
        if bit[i] == 1
            write i on the output file
    