What is a fast way to compute the (long int) ceiling(log_2(i)), where the input and output are 64-bit integers? Solutions for signed or unsigned integers are acceptable.
The fastest approach I'm aware of uses a fast log2 that rounds down, combined with an unconditional adjustment of the input value before the call and of the result after it to handle the rounding-up case, as in lg_up() shown below.
#include <stdint.h>

/* base-2 logarithm, rounding down (assumes a 64-bit long; undefined for x == 0) */
static inline uint64_t lg_down(uint64_t x) {
  return 63U - __builtin_clzl(x);
}

/* base-2 logarithm, rounding up (see the caveat about x == 1 below) */
static inline uint64_t lg_up(uint64_t x) {
  return lg_down(x - 1) + 1;
}
Basically, adding 1 to the rounded-down result is already correct for all values except exact powers of two (since in that case the floor and ceil approaches should return the same answer), so it suffices to subtract 1 from the input value to handle that case (the subtraction doesn't change the answer for any other input) and add one to the result.
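For example, here is a quick sanity check of lg_up() on and around the powers of two (a minimal sketch, assuming the definitions above and a 64-bit long):

#include <assert.h>

int main(void) {
  for (unsigned k = 1; k < 64; k++) {
    uint64_t p = 1ULL << k;
    assert(lg_up(p) == k);          /* exact power of two: floor == ceil */
    assert(lg_up(p + 1) == k + 1);  /* just above a power of two: rounds up */
  }
  return 0;
}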
This is usually slightly faster than approaches that adjust the result by explicitly checking for exact powers of two (e.g., adding a !!(x & (x - 1)) term). It avoids any comparisons, conditional operations, and branches, is more likely to simplify when inlined, is more amenable to vectorization, etc.
This relies on the "count leading zeros" functionality offered by most CPUs, accessed here via the clang/icc/gcc builtin __builtin_clzl, but other platforms offer something similar (e.g., the BitScanReverse intrinsic in Visual Studio).
Unfortunately, this may return the wrong answer for lg_up(1), since that leads to __builtin_clzl(0), which is undefined behavior according to the gcc documentation. Of course, the general "count leading zeros" function has perfectly defined behavior at zero, but the gcc builtin is defined this way because, prior to the BMI ISA extension on x86, it would have been implemented with the bsr instruction, which itself has undefined behavior for a zero input.
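If your inputs can include 1 and you need strictly portable code, a guarded variant side-steps the undefined case at the cost of one comparison (a sketch, not part of the original approach):

/* safe for all x >= 1: returns 0 for x == 1 without calling clz on zero */
static inline uint64_t lg_up_safe(uint64_t x) {
  return x <= 1 ? 0 : lg_down(x - 1) + 1;
}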
You could work around this if you know you have the lzcnt
instruction by using the lzcnt
intrinsic directly. Most platforms other than x86 never went through the bsr
mistake in the first place, and probably also offer methods to access their "count leading zeros" instruction if they have one.
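For example, on x86 with LZCNT available (compile with -mlzcnt on gcc/clang; a sketch, not from the original answer), the intrinsic handles the x == 1 case for free, because _lzcnt_u64(0) is defined to return 64 and the unsigned arithmetic wraps around to 0:

#include <immintrin.h>

static inline uint64_t lg_down_lzcnt(uint64_t x) {
  return 63U - _lzcnt_u64(x);
}

/* lg_up_lzcnt(1) == 63 - _lzcnt_u64(0) + 1 == 63 - 64 + 1 == 0 (mod 2^64) */
static inline uint64_t lg_up_lzcnt(uint64_t x) {
  return lg_down_lzcnt(x - 1) + 1;
}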
If you have 80-bit or 128-bit floats available, cast to that type and then read off the exponent bits. This link has details (for integers up to 52 bits) and several other methods:
http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogIEEE64Float
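The same idea works with a plain double for inputs below 2^52, where the integer-to-double conversion is exact (a sketch assuming IEEE-754 binary64; the helper name is made up):

#include <stdint.h>
#include <string.h>

/* floor(log2(x)) for 1 <= x < 2^52, read from the double's exponent field */
static inline int lg_down_via_double(uint64_t x) {
  double d = (double)x;
  uint64_t bits;
  memcpy(&bits, &d, sizeof bits);            /* type-pun without aliasing UB */
  return (int)((bits >> 52) & 0x7FF) - 1023; /* remove the exponent bias */
}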
Also, check the ffmpeg source. I know they have a very fast algorithm. Even if it's not directly extensible to larger sizes, you can easily do something like if (x > UINT32_MAX) return fastlog2(x >> 32) + 32; else return fastlog2(x);
I will give you the fastest way for x86-64 at the time of writing, plus a general technique that works for arguments below 2^63 if you have a fast floor; if you care about the full range, see below.
I am surprised by the low quality of the other answers, because they tell you how to get the floor but then transform the floor into the ceiling in a very expensive way (using conditionals and everything!).
If you can get the floor of the logarithm quickly, for example using __builtin_clzll, then the ceiling is very easily obtained like this:
unsigned long long log2Floor(unsigned long long x) {
  return 63 - __builtin_clzll(x); // undefined for x == 0
}

unsigned long long log2Ceiling(unsigned long long x) {
  return log2Floor(2 * x - 1); // valid for 1 <= x < 2^63
}
It works because it adds 1 to the result unless x is exactly a power of 2: doubling x moves its highest set bit up one position, and subtracting 1 pulls it back down only when x is a power of two, which is precisely the case where the floor and the ceiling agree.
See the x86-64 assembly difference at the Compiler Explorer against this alternative implementation of the ceiling:
auto log2CeilingDumb(unsigned long x) {
return log2Floor(x) + (!!(x & (x - 1)));
}
Gives:
log2Floor(unsigned long): # @log2Floor(unsigned long)
bsr rax, rdi
ret
log2CeilingDumb(unsigned long): # @log2CeilingDumb(unsigned long)
bsr rax, rdi
lea rcx, [rdi - 1]
and rcx, rdi
cmp rcx, 1
sbb eax, -1
ret
log2Ceiling(unsigned long): # @log2Ceiling(unsigned long)
lea rax, [rdi + rdi]
add rax, -1
bsr rax, rax
ret
For the full range, use the version from a previous answer, return log2Floor(x - 1) + 1. This is significantly slower, since on x86-64 it uses four instructions compared to the three above.
#include "stdafx.h"
#include "assert.h"
int getpos(unsigned __int64 value)
{
if (!value)
{
return -1; // no bits set
}
int pos = 0;
if (value & (value - 1ULL))
{
pos = 1;
}
if (value & 0xFFFFFFFF00000000ULL)
{
pos += 32;
value = value >> 32;
}
if (value & 0x00000000FFFF0000ULL)
{
pos += 16;
value = value >> 16;
}
if (value & 0x000000000000FF00ULL)
{
pos += 8;
value = value >> 8;
}
if (value & 0x00000000000000F0ULL)
{
pos += 4;
value = value >> 4;
}
if (value & 0x000000000000000CULL)
{
pos += 2;
value = value >> 2;
}
if (value & 0x0000000000000002ULL)
{
pos += 1;
value = value >> 1;
}
return pos;
}
int _tmain(int argc, _TCHAR* argv[])
{
    assert(getpos(0ULL) == -1); // no bits set, return -1
    assert(getpos(1ULL) == 0);
    assert(getpos(2ULL) == 1);
    assert(getpos(3ULL) == 2);
    assert(getpos(4ULL) == 2);
    for (int k = 0; k < 64; ++k)
    {
        int pos = getpos(1ULL << k);
        assert(pos == k);
    }
    for (int k = 0; k < 64; ++k)
    {
        int pos = getpos((1ULL << k) - 1);
        assert(pos == (k < 2 ? k - 1 : k));
    }
    for (int k = 0; k < 64; ++k)
    {
        int pos = getpos((1ULL << k) | 1);
        assert(pos == (k < 1 ? k : k + 1));
    }
    for (int k = 0; k < 64; ++k)
    {
        int pos = getpos((1ULL << k) + 1);
        assert(pos == k + 1);
    }
    return 0;
}
If you're compiling for 64-bit processors on Windows, I think this should work. _BitScanReverse64 is an intrinsic function.
#include <intrin.h>

__int64 log2ceil( __int64 x )
{
    unsigned long index;
    if ( !_BitScanReverse64( &index, x ) )
        return -1LL; // dummy return value for x == 0
    // add 1 if x is NOT a power of 2 (to do the ceil)
    return index + (x & (x - 1) ? 1 : 0);
}
For 32-bit targets, you can emulate _BitScanReverse64 with one or two calls to _BitScanReverse: check the upper 32 bits of x, ((long*)&x)[1], then the lower 32 bits if needed, ((long*)&x)[0].
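Here is how that emulation might look (a sketch with a made-up helper name, using shifts instead of pointer casts to extract the two halves):

#include <intrin.h>

// Returns nonzero and sets *index to the position of the highest set bit,
// or returns 0 when x == 0, mirroring the contract of _BitScanReverse64.
unsigned char MyBitScanReverse64( unsigned long *index, unsigned __int64 x )
{
    if ( _BitScanReverse( index, (unsigned long)(x >> 32) ) )
    {
        *index += 32; // the high half had a set bit
        return 1;
    }
    return _BitScanReverse( index, (unsigned long)x );
}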
If you can limit yourself to gcc, there is a set of builtin functions that return the number of leading zero bits and can be used to do what you want with a little work:
int __builtin_clz (unsigned int x)
int __builtin_clzl (unsigned long)
int __builtin_clzll (unsigned long long)
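For example, one way to do that little work for the 64-bit case (a sketch; the guard avoids __builtin_clzll(0), which is undefined):

static inline int log2_ceil_ull(unsigned long long x) {
  // ceil(log2(x)) == 64 - clz(x - 1) for x >= 2; returns 0 for x <= 1
  return x <= 1 ? 0 : 64 - __builtin_clzll(x - 1);
}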