问题
I'm looking at the entry Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup from Bit Twiddling hacks.
I can easily see how the second algorithm in that entry works
static const int MultiplyDeBruijnBitPosition2[32] =
{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition2[(uint32_t)(v * 0x077CB531U) >> 27];
which calculates n = log2 v
where v
is known to be a power of 2. In this case 0x077CB531
is an ordinary De Bruijn sequence, and the rest is obvious.
However, the first algorithm in that entry
static const int MultiplyDeBruijnBitPosition[32] =
{
0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
};
v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
r = MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >> 27];
looks a bit more tricky to me. We begin by snapping v
to a nearest greater 2^n - 1
value. This 2^n - 1
value is then multiplied by 0x07C4ACDD
, which in this case acts the same way as the DeBruijn sequence in the previous algorithm did.
My question is: how do we construct this magic 0x07C4ACDD
sequence? I.e. how do we construct a sequence that can be used to generate unique indices when multiplied by a 2^n - 1
value? For 2^n
multiplier it is just an ordinary De Bruijn sequence, as we can see above, so it is clear where 0x077CB531
came from. But what about 2^n - 1
multiplier 0x07C4ACDD
? I feel like I'm missing something obvious here.
P.S. To clarify my question: I'm not really looking for an algorithm to generate these sequences. I'm more interested in some more-or-less trivial property (if one exists) that makes 0x07C4ACDD
work as we want it to work. For 0x077CB531
the property that makes it work is pretty obvious: it contains all 5-bit combinations "stored" in the sequence with 1-bit stepping (which is basically what De Bruijn sequence is).
The 0x07C4ACDD
, on the other hand, is not a De Bruijn sequence by itself. So, what property were they aiming for when constructing 0x07C4ACDD
(besides the non-constructive "it should make the above algorithm work")? Someone did come up with the above algorithm somehow. So they probably knew that the approach is viable, and that the appropriate sequence exists. How did they know that?
For example, if I were to construct the algorithm for an arbitrary v
, I'd do
v |= v >> 1;
v |= v >> 2;
...
first. Then I'd just do ++v
to turn v
into a power of 2 (let's assume it doesn't overflow). Then I'd apply the first algorithm. And finally I'd do --r
to obtain the final answer. However, these people managed to optimize it: they eliminated the leading ++v
and the trailing --r
steps simply by changing the multiplier and rearranging the table. How did they know it was possible? What is the math behind this optimization?
回答1:
A De Bruijn sequence of order n over k symbols (and of k^n length) have a property that every possible n-length word appears as consecutive characters in it, some of them with cyclic wrapping. For example, in the case of k=2, n=2, the possible words are 00, 01, 10, 11, and a De Bruijn sequence is 0011. 00, 01, 11 appears in it, 10 with wrapping. This property naturally means that left shifting a De Bruijn sequence (multiplying with power of two) and taking its upper n bits results in a unique number for each power of two multiplier. Then you only need a lookup table to determine which one it is. It works on a similar principle to numbers which are one less than power of two, but the magic number in this case is not a De Bruijn sequence, but an analogue. The defining property simply changes to "every possible n-length word appears as the sum of the first m subsequences of length n, mod 2^n". This property is all that is needed for the algorithm to work. They simply used this different class of magic numbers to speed up the algorithm. I did as well.
One possible method of construction of De Bruijn numbers is the generation of a Hamiltonian path of the De Bruijn graph, Wikipedia provides an example of such a graph. In this case, the nodes are 2^5=32-bit integers, the directed edges are transitions between them, where a transition is a left shift and a binary or operation according to the label of the edge, 0 or 1. There might be a direct analogue to 2^n-1 type magic numbers, it might be worth exploring, but this isn't a way people usually construct such algorithms.
In practice you might try to construct it differently, especially if you want it to behave in a tad different manner. For example, the implementation of leading/trailing number of zeros algorithms on the bit twiddling hacks page can only return values in [0..31]. It needs additional checking for the case of 0, which has 32 zeros. This check requires a branching and can be way too slow on some CPUs.
The way I did it, I used a 64 element lookup table instead of 32, generated random magic numbers, and for each of them I built up a lookup table with power of two inputs, checked its correctness (injectivity), then verified it for all 32-bit numbers. I continued till I encountered a correct magic number. The resulting numbers do not fulfill a property of "every possible n-length word appears", since only 33 numbers appear, which are unique for all 33 possible input.
Exhaustive brute force search sounds slow, especially if good magic numbers are rare, but if we first test known power of two values as inputs, the table is filled quickly, rejection is fast, and the rejection rate is very high. We only need to clear the table after each magic number. In essence I exploited a high rejection rate algorithm to construct magic numbers.
The resulting algorithms are
int32 Integer::numberOfLeadingZeros (int32 x)
{
static int32 v[64] = {
32, -1, 1, 19, -1, -1, -1, 27, -1, 24, 3, -1, 29, -1, 9, -1,
12, 7, -1, 20, -1, -1, 4, 30, 10, -1, 21, -1, 5, 31, -1, -1,
-1, -1, 0, 18, 17, 16, -1, -1, 15, -1, -1, -1, 26, -1, 14, -1,
23, -1, 2, -1, -1, 28, 25, -1, -1, 13, 8, -1, -1, 11, 22, 6};
x |= x >> 1;
x |= x >> 2;
x |= x >> 4;
x |= x >> 8;
x |= x >> 16;
x *= 0x749c0b5d;
return v[cast<uint32>(x) >> 26];
}
int32 Integer::numberOfTrailingZeros (int32 x)
{
static int32 v[64] = {
32, -1, 2, -1, 3, -1, -1, -1, -1, 4, -1, 17, 13, -1, -1, 7,
0, -1, -1, 5, -1, -1, 27, 18, 29, 14, 24, -1, -1, 20, 8, -1,
31, 1, -1, -1, -1, 16, 12, 6, -1, -1, -1, 26, 28, 23, 19, -1,
30, -1, 15, 11, -1, 25, 22, -1, -1, 10, -1, 21, 9, -1, -1, -1};
x &= -x;
x *= 0x4279976b;
return v[cast<uint32>(x) >> 26];
}
As for your question of how did they know, they probably didn't. They experimented, tried to change things, just like me. After all, it isn't a big stretch of imagination that 2^n-1 inputs might work instead of 2^n inputs with different magic number and lookup table.
Here, I made a simplified version of my magic number generator code. It checks all possible magic numbers in 5 minutes if we check only for power of two inputs, finding 1024 magic numbers. Checking against other inputs are pointless since they are reduced to 2^n-1 form anyway. Does not construct the table, but it is trivial once you know the magic number.
#include <Frigo/all>
#include <Frigo/all.cpp>
using namespace Frigo::Lang;
using namespace std;
class MagicNumberGenerator
{
public:
static const int32 log2n = 5;
static const int32 n = 1 << log2n;
static const bool tryZero = false;
MagicNumberGenerator () {}
void tryAllMagic ()
{
for( int32 magic = 0; magic < Integer::MAX_VALUE; magic++ ){
tryMagic(magic);
}
tryMagic(Integer::MAX_VALUE);
for( int32 magic = Integer::MIN_VALUE; magic < 0; magic++ ){
tryMagic(magic);
}
}
bool tryMagic (int32 magic)
{
// clear table
for( int32 i = 0; i < n; i++ ){
table[i] = -1;
}
// try for zero
if( tryZero and not tryInput(magic, 0) ){
return false;
}
// try for all power of two inputs, filling table quickly in the process
for( int32 i = 0; i < 32; i++ ){
if( not tryInput(magic, 1 << i) ){
return false;
}
}
// here we would test all possible 32-bit inputs except zero, but it is pointless due to the reduction to 2^n-1 form
// we found a magic number
cout << "Magic number found: 0x" << Integer::toHexString(magic) << endl;
return true;
}
bool tryInput (int32 magic, int32 x)
{
// calculate good answer
int32 leadingZeros = goodNumberOfLeadingZeros(x);
// calculate scrambled but hopefully injective answer
x |= x >> 1;
x |= x >> 2;
x |= x >> 4;
x |= x >> 8;
x |= x >> 16;
x *= magic;
x = Integer::unsignedRightShift(x, 32 - log2n);
// reject if answer is not injective
if( table[x] != -1 ){
return table[x] == leadingZeros;
}
// store result for further injectivity checks
table[x] = leadingZeros;
return true;
}
static int32 goodNumberOfLeadingZeros (int32 x)
{
int32 r = 32;
if( cast<uint32>(x) & 0xffff0000 ){
x >>= 16;
r -= 16;
}
if( x & 0xff00 ){
x >>= 8;
r -= 8;
}
if( x & 0xf0 ){
x >>= 4;
r -= 4;
}
if( x & 0xc ){
x >>= 2;
r -= 2;
}
if( x & 0x2 ){
x >>= 1;
r--;
}
if( x & 0x1 ){
r--;
}
return r;
}
int32 table[n];
};
int32 main (int32 argc, char* argv[])
{
if(argc||argv){}
measure{
MagicNumberGenerator gen;
gen.tryAllMagic();
}
}
回答2:
It's based on the paper Using de Bruijn Sequences to Index a 1 in a Computer Word. I'm going to guess that they did a search for a perfect hash function to map 2^n-1
to [0..31]
. They describe a method for searching for counting zeroes of integers with up to two bits set which involves incrementally building up the multiplier.
回答3:
From: http://www.stmintz.com/ccc/index.php?id=306404
130329821
0x07C4ACDD
00000111110001001010110011011101B
bit 31 - bit 27 00000 0
bit 30 - bit 26 00001 1
bit 29 - bit 25 00011 3
bit 28 - bit 24 00111 7
bit 27 - bit 23 01111 15
bit 26 - bit 22 11111 31
bit 25 - bit 21 11110 30
bit 24 - bit 20 11100 28
bit 23 - bit 19 11000 24
bit 22 - bit 18 10001 17
bit 21 - bit 17 00010 2
bit 20 - bit 16 00100 4
bit 19 - bit 15 01001 9
bit 18 - bit 14 10010 18
bit 17 - bit 13 00101 5
bit 16 - bit 12 01010 10
bit 15 - bit 11 10101 21
bit 14 - bit 10 01011 11
bit 13 - bit 9 10110 22
bit 12 - bit 8 01100 12
bit 11 - bit 7 11001 25
bit 10 - bit 6 10011 19
bit 9 - bit 5 00110 6
bit 8 - bit 4 01101 13
bit 7 - bit 3 11011 27
bit 6 - bit 2 10111 23
bit 5 - bit 1 01110 14
bit 4 - bit 0 11101 29
bit 3 - bit 31 11010 26
bit 2 - bit 30 10100 20
bit 1 - bit 29 01000 8
bit 0 - bit 28 10000 16
It seems to me that 0x07C4ACDD is a 5-bit de Bruijn sequence.
来源:https://stackoverflow.com/questions/7365562/de-bruijn-like-sequence-for-2n-1-how-is-it-constructed