Say I have a value of type uint64_t, seen as a sequence of octets (1 octet = 8 bits). The uint64_t value is known to contain only one set bit, at the MSB position of one of its octets.
Multiply the value by a carefully designed 64-bit constant, then keep only the upper 4 bits of the product. For any CPU with fast 64-bit multiplication, this is probably about as fast as you can get.
#include <stdint.h>
int field_set(uint64_t input) {
uint64_t field = input * 0x20406080a0c0e1ULL;
return (field >> 60) & 15;
}
// field_set(0x0000000000000000ULL) = 0
// field_set(0x0000000000000080ULL) = 1
// field_set(0x0000000000008000ULL) = 2
// field_set(0x0000000000800000ULL) = 3
// field_set(0x0000000080000000ULL) = 4
// field_set(0x0000008000000000ULL) = 5
// field_set(0x0000800000000000ULL) = 6
// field_set(0x0080000000000000ULL) = 7
// field_set(0x8000000000000000ULL) = 8
clang implements this in three x86_64 instructions, not counting the frame setup and cleanup:
_field_set:
push %rbp
mov %rsp,%rbp
movabs $0x20406080a0c0e1,%rax
imul %rdi,%rax
shr $0x3c,%rax
pop %rbp
retq
Note that the results for any other input will be pretty much random. (So don't do that.)
I don't think there's any feasible way to extend this method to return values in the 7..63 range directly (the structure of the constant doesn't permit it), but you can convert a nonzero result to that range by computing 8 * result - 1 (so 1 becomes 7, 2 becomes 15, and 8 becomes 63).
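As a rough sketch of that conversion: the results 1..8 correspond to bit positions 7, 15, ..., 63, so the mapping is position = 8 * result - 1. The names field_index and field_pos are hypothetical helpers introduced here for illustration, and returning 0 for an all-zero input is an assumption taken from the question's wording.

```c
#include <stdint.h>

/* Same multiply-and-shift as above: yields 0 when no bit is set,
   and 1..8 for the single set bit at position 7, 15, ..., 63. */
int field_index(uint64_t input) {
    return (int)((input * 0x20406080a0c0e1ULL) >> 60);
}

/* Hypothetical helper: maps the index 1..8 to the bit position
   7, 15, ..., 63 and keeps 0 for an all-zero input. */
int field_pos(uint64_t input) {
    int idx = field_index(input);
    return idx ? 8 * idx - 1 : 0;
}
```

As with field_set, inputs with more than one set bit, or with the bit at some other position, give meaningless results.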
With regard to how this constant was designed, I started with the following observations:
Multiplying 1ULL<<63 (i.e., your "pos=63" value) by anything can only possibly result in the same value, or zero. (The product cannot possibly have any lower bits set, and there are no higher bits to change.) Therefore, we must find some way for this value to be treated as the correct result.
Multiplying our constant by each of the other bit fields is equivalent to left-shifting it by a number of bits equal to its "position". The right shift by 60 bits then keeps only the top 4 bits of the product, which are bits 60-pos through 63-pos of the constant. Thus, we can create all of the cases except for one as follows:
uint64_t constant = (
1ULL << (60 - 7)
| 2ULL << (60 - 15)
| 3ULL << (60 - 23)
| 4ULL << (60 - 31)
| 5ULL << (60 - 39)
| 6ULL << (60 - 47)
| 7ULL << (60 - 55)
);
So far, the constant is 0x20406080a0c0e0ULL. However, this doesn't give the right result for pos=63; this constant is even, so multiplying it by that input gives zero. We must set the lowest bit (i.e., constant |= 1ULL) to get that case to work, giving us the final value of 0x20406080a0c0e1ULL.
Note that the construction above can be modified to encode the results differently. However, the output of 8 for pos=63 is fixed as described above, and all other outputs must fit into 4 bits (i.e., 0 to 15).
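The construction can also be written as a loop, which makes the pattern easier to check. This is only a sketch; build_constant is a hypothetical name, not something from the answer.

```c
#include <stdint.h>

/* Rebuilds the multiplier: code k (1..7) is placed so that a left shift
   by pos = 8*k - 1 bits moves it into the top 4 bits of the product. */
uint64_t build_constant(void) {
    uint64_t c = 0;
    for (int k = 1; k <= 7; k++) {
        int pos = 8 * k - 1;            /* 7, 15, ..., 55 */
        c |= (uint64_t)k << (60 - pos);
    }
    return c | 1ULL;                    /* low bit makes 1ULL << 63 yield 8 */
}
```

Changing the value stored for each k is how you would encode the results differently, as long as each code fits in 4 bits.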
00000000 00000000 00000000 00000000 00000000 00000000 00000000 10000000 pos = 7
..., but returns 0 if there is no bit that is set.
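This will return the same value if the first bit (bit 0) or no bit is set. On x86_64, bsrq gives the position of the highest set bit directly, but note that its result is undefined when the input is zero:
#include <stdint.h>
int bsrq_x86_64(uint64_t x){
    uint64_t ret; // bsrq needs a 64-bit destination register
    // AT&T operand order is source, then destination
    asm("bsrq %1, %0" : "=r"(ret) : "r"(x) : "cc");
    return (int)ret;
}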
However, if the first bit is set, it will also return 0. Here is a method that runs in constant time (no looping or branching) and returns -1 when no bits are set, to distinguish that case from the first bit being set.
int find_bit(unsigned long long x){
int ret=0,
cmp = (x>(1LL<<31))<<5; //32 if true else 0
ret += cmp;
x >>= cmp;
cmp = (x>(1<<15))<<4; //16 if true else 0
ret += cmp;
x >>= cmp;
cmp = (x>(1<<7))<<3; //8
ret += cmp;
x >>= cmp;
cmp = (x>(1<<3))<<2; //4
ret += cmp;
x >>= cmp;
cmp = (x>(1<<1))<<1; //2
ret += cmp;
x >>= cmp;
cmp = (x>1);
ret += cmp;
x >>= cmp;
ret += x;
return ret-1;
}
Technically, this just returns the position of the most significant set bit. Depending on the floating-point type available, this can be done in fewer operations using tricks in the style of the fast inverse square root, or other bit-twiddling hacks.
BTW, if you don't mind using compiler builtins, you can just do:
__builtin_popcountll(n-1)
or __builtin_ctzll(n)
or __builtin_ffsll(n)-1
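A small sketch of how these builtins might be wrapped, assuming GCC or Clang. The wrapper names are hypothetical, and returning 0 for an all-zero input is an assumption taken from the question. The guard matters: __builtin_ctzll(0) is undefined, __builtin_popcountll(0 - 1) gives 64, and __builtin_ffsll(0) - 1 gives -1.

```c
#include <stdint.h>

/* GCC/Clang only. Each wrapper returns the bit position of a
   single-bit input, and 0 when no bit is set. */
int pos_ctz(uint64_t n) {
    return n ? __builtin_ctzll(n) : 0;          /* ctz is undefined for 0 */
}

int pos_popcount(uint64_t n) {
    return n ? __builtin_popcountll(n - 1) : 0; /* n - 1 sets all lower bits */
}

int pos_ffs(uint64_t n) {
    return n ? __builtin_ffsll((long long)n) - 1 : 0; /* ffs is one-indexed */
}
```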
The C++ tag was removed, but here is a portable C++ answer nonetheless, since you can compile it as C++ and expose it through an extern "C" interface:
If you have a power of 2 and you subtract one, you end up with a binary number whose count of set bits equals the position.
A way to count the number of set bits (binary 1s) is wrapped, presumably most efficiently by each implementation of the standard library, in the std::bitset member function count.
Note that your specification has 0 returned for both 0 and 1, so I added as_specified_pos to meet this requirement. Personally, I would just let it return the natural value of 64 when passed 0, to be able to differentiate, and for speed.
The following code should be extremely portable and most likely optimized per platform by compiler vendors:
#include <bitset>
#include <cstdint>
uint64_t pos(uint64_t val)
{
return std::bitset<64>(val-1).count();
}
uint64_t as_specified_pos(uint64_t val)
{
return (val) ? pos(val) : 0;
}
On Linux with g++ I get the following disassembled code:
0000000000000000 <pos(unsigned long)>:
0: 48 8d 47 ff lea -0x1(%rdi),%rax
4: f3 48 0f b8 c0 popcnt %rax,%rax
9: c3 retq
a: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
0000000000000010 <as_specified_pos(unsigned long)>:
10: 31 c0 xor %eax,%eax
12: 48 85 ff test %rdi,%rdi
15: 74 09 je 20 <as_specified_pos(unsigned long)+0x10>
17: 48 8d 47 ff lea -0x1(%rdi),%rax
1b: f3 48 0f b8 c0 popcnt %rax,%rax
20: f3 c3 repz retq
If you want an algorithm for the job rather than a built-in, this will do it. It yields the bit number of the most significant 1 bit, even if more than one bit is set. It narrows down the position by iteratively dividing the bit range under consideration into halves, testing whether there are any bits set in the upper half, taking that half as the new bit range if so, and otherwise taking the lower half as the new bit range.
#include <stdint.h>
#define TRY_WINDOW(bits, n, msb) do { \
uint64_t t = n >> bits; \
if (t) { \
msb += bits; \
n = t; \
} \
} while (0)
int msb(uint64_t n) {
int msb = 0;
TRY_WINDOW(32, n, msb);
TRY_WINDOW(16, n, msb);
TRY_WINDOW( 8, n, msb);
TRY_WINDOW( 4, n, msb);
TRY_WINDOW( 2, n, msb);
TRY_WINDOW( 1, n, msb);
return msb;
}
The value mod 0x8C yields a unique value for each of the cases.
This value mod 0x11 is still unique.
The second value in the table is the resulting mod 0x11.
128 9
32768 5
8388608 10
2147483648 0
549755813888 14
140737488355328 2
36028797018963968 4
9223372036854775808 15
So a simple lookup table will suffice.
int find_bit(uint64_t bit){
int lookup[] = { the seventeen values };
return lookup[ (bit % 0x8C) % 0x11];
}
No branching, no compiler tricks.
For completeness, the array is
{ 31, 0, 47, 15, 55, 0, 0, 7, 23, 0, 0, 0, 39, 63, 0, 0}
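The constants quoted above do not appear to reproduce the table (for example, 1<<7 and 1<<31 both leave a remainder of 128 modulo 0x8C), so the following sketch swaps in a different, single modulus. The choice of 19 and the name find_bit_mod19 are assumptions made here, not taken from the answer. It relies on the eight valid inputs leaving distinct remainders modulo 19 (14, 12, 13, 3, 8, 15, 2 and 18 for bits 7 through 63), with 0 mapping to 0.

```c
#include <stdint.h>

/* Maps a value with a single set bit at position 7, 15, ..., 63 to that
   position, and 0 to 0. Other inputs give meaningless results. */
int find_bit_mod19(uint64_t bit) {
    /* Indexed by bit % 19; slots no valid input reaches are left as 0. */
    static const int lookup[19] = {
        0, 0, 55, 31, 0, 0, 0, 0, 39, 0,
        0, 0, 15, 23, 7, 47, 0, 0, 63
    };
    return lookup[bit % 19];
}
```

This keeps the spirit of the approach (one remainder, one small table, no branching), though the modulo by a constant is typically compiled into a multiplication and shifts anyway.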
If you can use POSIX, use the ffs() function from strings.h (not string.h!). It returns the position of the least significant set bit (one-indexed), or zero if the argument is zero. On most implementations, a call to ffs() is inlined and compiled into the corresponding machine instruction, like bsf on x86. glibc also has ffsll() for long long arguments, which should be even more suitable for your problem if available.
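Since ffs() takes an int, a 64-bit value has to be split when ffsll() is not available. A minimal sketch, assuming a 32-bit int. The name ffs_pos64 is hypothetical, and the choice to return -1 for a zero input (ffs() result minus one) is an assumption rather than part of the question.

```c
#include <stdint.h>
#include <strings.h>   /* ffs() is declared here, not in string.h */

/* Zero-indexed position of the lowest set bit, or -1 if x is 0.
   Assumes int is 32 bits wide, so each half fits in one ffs() call. */
int ffs_pos64(uint64_t x) {
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);
    if (lo)
        return ffs((int)lo) - 1;    /* bits 0..31 */
    if (hi)
        return ffs((int)hi) + 31;   /* bits 32..63: (ffs - 1) + 32 */
    return -1;
}
```

For a value with exactly one set bit, the lowest set bit is the only one, so this gives the same answer as the other methods.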