I am looking for an efficient way to determine the position of the least significant bit that is set in an integer, e.g. for 0x0FF0 it would be 4.
A trivial impleme
Here is one simple alternative, even though finding logs is a bit costly.
if(n == 0)
return 0;
return log2(n & -n)+1; //Assuming the bit index starts from 1
Inspired by this similar post that involves searching for a set bit, I offer the following:
unsigned GetLowestBitPos(unsigned value)
{
double d = value ^ (value - !!value);
return (((int*)&d)[1]>>20)-1023;
}
Pros:
Cons:
Update: As pointed out in the comments, a union is a cleaner implementation (for C, at least) and would look like:
unsigned GetLowestBitPos(unsigned value)
{
union {
int i[2];
double d;
} temp = { .d = value ^ (value - !!value) };
return (temp.i[1] >> 20) - 1023;
}
This assumes 32-bit ints with little-endian storage for everything (think x86 processors).
This is in regards of @Anton Tykhyy answer
Here is my C++11 constexpr implementation doing away with casts and removing a warning on VC++17 by truncating a 64bit result to 32 bits:
constexpr uint32_t DeBruijnSequence[32] =
{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
constexpr uint32_t ffs ( uint32_t value )
{
return DeBruijnSequence[
(( ( value & ( -static_cast<int32_t>(value) ) ) * 0x077CB531ULL ) & 0xFFFFFFFF)
>> 27];
}
To get around the issue of 0x1 and 0x0 both returning 0 you can do:
constexpr uint32_t ffs ( uint32_t value )
{
return (!value) ? 32 : DeBruijnSequence[
(( ( value & ( -static_cast<int32_t>(value) ) ) * 0x077CB531ULL ) & 0xFFFFFFFF)
>> 27];
}
but if the compiler can't or won't preprocess the call it will add a couple of cycles to the calculation.
Finally, if interested, here's a list of static asserts to check that the code does what is intended to:
static_assert (ffs(0x1) == 0, "Find First Bit Set Failure.");
static_assert (ffs(0x2) == 1, "Find First Bit Set Failure.");
static_assert (ffs(0x4) == 2, "Find First Bit Set Failure.");
static_assert (ffs(0x8) == 3, "Find First Bit Set Failure.");
static_assert (ffs(0x10) == 4, "Find First Bit Set Failure.");
static_assert (ffs(0x20) == 5, "Find First Bit Set Failure.");
static_assert (ffs(0x40) == 6, "Find First Bit Set Failure.");
static_assert (ffs(0x80) == 7, "Find First Bit Set Failure.");
static_assert (ffs(0x100) == 8, "Find First Bit Set Failure.");
static_assert (ffs(0x200) == 9, "Find First Bit Set Failure.");
static_assert (ffs(0x400) == 10, "Find First Bit Set Failure.");
static_assert (ffs(0x800) == 11, "Find First Bit Set Failure.");
static_assert (ffs(0x1000) == 12, "Find First Bit Set Failure.");
static_assert (ffs(0x2000) == 13, "Find First Bit Set Failure.");
static_assert (ffs(0x4000) == 14, "Find First Bit Set Failure.");
static_assert (ffs(0x8000) == 15, "Find First Bit Set Failure.");
static_assert (ffs(0x10000) == 16, "Find First Bit Set Failure.");
static_assert (ffs(0x20000) == 17, "Find First Bit Set Failure.");
static_assert (ffs(0x40000) == 18, "Find First Bit Set Failure.");
static_assert (ffs(0x80000) == 19, "Find First Bit Set Failure.");
static_assert (ffs(0x100000) == 20, "Find First Bit Set Failure.");
static_assert (ffs(0x200000) == 21, "Find First Bit Set Failure.");
static_assert (ffs(0x400000) == 22, "Find First Bit Set Failure.");
static_assert (ffs(0x800000) == 23, "Find First Bit Set Failure.");
static_assert (ffs(0x1000000) == 24, "Find First Bit Set Failure.");
static_assert (ffs(0x2000000) == 25, "Find First Bit Set Failure.");
static_assert (ffs(0x4000000) == 26, "Find First Bit Set Failure.");
static_assert (ffs(0x8000000) == 27, "Find First Bit Set Failure.");
static_assert (ffs(0x10000000) == 28, "Find First Bit Set Failure.");
static_assert (ffs(0x20000000) == 29, "Find First Bit Set Failure.");
static_assert (ffs(0x40000000) == 30, "Find First Bit Set Failure.");
static_assert (ffs(0x80000000) == 31, "Find First Bit Set Failure.");
Why not use binary search? This will always complete after 5 operations (assuming int size of 4 bytes):
if (0x0000FFFF & value) {
if (0x000000FF & value) {
if (0x0000000F & value) {
if (0x00000003 & value) {
if (0x00000001 & value) {
return 1;
} else {
return 2;
}
} else {
if (0x0000004 & value) {
return 3;
} else {
return 4;
}
}
} else { ...
} else { ...
} else { ...
According to the Chess Programming BitScan page and my own measurements, subtract and xor is faster than negate and mask.
(Note than if you are going to count the trailing zeros in 0
, the method as I have it returns 63
whereas the negate and mask returns 0
.)
Here is a 64-bit subtract and xor:
unsigned long v; // find the number of trailing zeros in 64-bit v
int r; // result goes here
static const int MultiplyDeBruijnBitPosition[64] =
{
0, 47, 1, 56, 48, 27, 2, 60, 57, 49, 41, 37, 28, 16, 3, 61,
54, 58, 35, 52, 50, 42, 21, 44, 38, 32, 29, 23, 17, 11, 4, 62,
46, 55, 26, 59, 40, 36, 15, 53, 34, 51, 20, 43, 31, 22, 10, 45,
25, 39, 14, 33, 19, 30, 9, 24, 13, 18, 8, 12, 7, 6, 5, 63
};
r = MultiplyDeBruijnBitPosition[((uint32_t)((v ^ (v-1)) * 0x03F79D71B4CB0A89U)) >> 58];
For reference, here is a 64-bit version of the negate and mask method:
unsigned long v; // find the number of trailing zeros in 64-bit v
int r; // result goes here
static const int MultiplyDeBruijnBitPosition[64] =
{
0, 1, 48, 2, 57, 49, 28, 3, 61, 58, 50, 42, 38, 29, 17, 4,
62, 55, 59, 36, 53, 51, 43, 22, 45, 39, 33, 30, 24, 18, 12, 5,
63, 47, 56, 27, 60, 41, 37, 16, 54, 35, 52, 21, 44, 32, 23, 11,
46, 26, 40, 15, 34, 20, 31, 10, 25, 14, 19, 9, 13, 8, 7, 6
};
r = MultiplyDeBruijnBitPosition[((uint32_t)((v & -v) * 0x03F79D71B4CB0A89U)) >> 58];
If you have the resources, you can sacrifice memory in order to improve the speed:
static const unsigned bitPositions[MAX_INT] = { 0, 0, 1, 0, 2, /* ... */ };
unsigned GetLowestBitPos(unsigned value)
{
assert(value != 0); // handled separately
return bitPositions[value];
}
Note: This table would consume at least 4 GB (16 GB if we leave the return type as unsigned
). This is an example of trading one limited resource (RAM) for another (execution speed).
If your function needs to remain portable and run as fast as possible at any cost, this would be the way to go. In most real-world applications, a 4GB table is unrealistic.