I have a byte I'm using for bitflags. I know that one and only one bit in the byte is set at any given time.
Ex:
A lookup table is simple enough, and you can reduce its size if the set of values is sparse. Let's try with 11 elements instead of 128:
unsigned char b;
....
/* powers of 2 are all distinct mod 11; 0xFF marks residues no power of 2 produces */
unsigned char expt2mod11_bits[11]={0xFF,0,1,0xFF,2,4,0xFF,7,3,6,5};
unsigned char pos = expt2mod11_bits[b%11];
assert(pos < 8);
assert(1<<pos == b);
Of course, it's not necessarily more efficient, especially for 8 bits, but the same trick can be used for larger sizes, where a full lookup table would be awfully big. Let's see:
unsigned int w;
....
/* powers of 2 are all distinct mod 19, so 19 entries cover 16-bit single-bit values */
unsigned char expt2mod19_bits[19]={0xFF,0,1,13,2,0xFF,14,6,3,8,0xFF,12,15,5,7,11,4,10,9};
unsigned char pos = expt2mod19_bits[w%19];
assert(pos < 16);
assert(1<<pos == w);
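If you'd rather derive such a table than compute it by hand, a small generator along these lines does the job (my own sketch, not from the answer above; print_table is a made-up name). The modulus just has to give distinct residues for every power of two in range:

#include <stdio.h>

/* Sketch: print the modulo-hash table for single-bit values of the given width.
   11 works for 8 bits and 19 for 16 bits, as used above; unused residues get 255 (0xFF). */
static void print_table(unsigned bits, unsigned mod) {
    unsigned char table[64];
    for (unsigned i = 0; i < mod; i++)
        table[i] = 0xFF;                                  /* mark unused residues */
    for (unsigned pos = 0; pos < bits; pos++)
        table[(1UL << pos) % mod] = (unsigned char)pos;   /* residue -> bit index */
    for (unsigned i = 0; i < mod; i++)
        printf("%d%s", table[i], i + 1 < mod ? "," : "\n");
}

int main(void) {
    print_table(8, 11);
    print_table(16, 19);
}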
A lookup table is fast and easy when CHAR_BIT == 8, but on some systems CHAR_BIT == 16 or 32, and a lookup table becomes insanely bulky. If you're considering a lookup table, I'd suggest wrapping it; make it a "lookup table function" instead, so that you can swap out the logic when you need to optimise.
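For example, a wrapped full-table version could look like the sketch below (my own illustration, not from the answer; full_table, bit_position_init and bit_position are made-up names), so callers never depend on how the lookup is actually done:

#include <limits.h>

#if CHAR_BIT == 8
/* 256-entry table: full_table[b] is the bit position when exactly one bit of b is set. */
static unsigned char full_table[1u << CHAR_BIT];

/* Call once at startup. */
static void bit_position_init(void) {
    for (unsigned pos = 0; pos < CHAR_BIT; pos++)
        full_table[1u << pos] = (unsigned char)pos;
}

/* The "lookup table function": the implementation can change without touching callers. */
static unsigned bit_position(unsigned char b) {
    return full_table[b];
}
#endif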
Using divide and conquer, by performing a binary search on a sorted array, involves a number of comparisons on the order of log2 CHAR_BIT. That code is more complex, involving initialisation of an array of unsigned char to use as a lookup table for a start. Once you have such an array initialised, you can use bsearch to search it, for example:
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Fill table[x] with the value that has only bit x set; the result is sorted ascending. */
void uchar_bit_init(unsigned char *table) {
    for (size_t x = 0; x < CHAR_BIT; x++) {
        table[x] = 1U << x;
    }
}

int uchar_compare(void const *x, void const *y) {
    unsigned char const *X = x, *Y = y;
    return (*X > *Y) - (*X < *Y);
}

/* Returns the 1-based bit position of value, or 0 if value isn't a power of two. */
size_t uchar_bit_lookup(unsigned char *table, unsigned char value) {
    unsigned char *position = bsearch(&value, table, CHAR_BIT, 1, uchar_compare);
    return position ? (size_t)(position - table) + 1 : 0;
}

int main(void) {
    unsigned char lookup[CHAR_BIT];
    uchar_bit_init(lookup);
    for (;;) {
        int c = getchar();
        if (c == EOF) { break; }
        printf("Bit for %c found at %zu\n", c, uchar_bit_lookup(lookup, (unsigned char)c));
    }
}
P.S. This sounds like micro-optimisation. Get your solution done (abstracting the operations required into these functions), then worry about optimisations based on your profiling. Make sure your profiling targets the system that your solution will run on if you're going to focus on micro-optimisations, because the efficiency of micro-optimisations differs widely as hardware differs even slightly... It's usually a better idea to buy a faster PC ;)
Based on the log2 calculation in Find the log base 2 of an N-bit integer in O(lg(N)) operations:
int getSetBitLocation(unsigned char c) {
    // c is in {1, 2, 4, 8, 16, 32, 64, 128}, returned values are {0, 1, ..., 7}
    return (((c & 0xAA) != 0)        |
            (((c & 0xCC) != 0) << 1) |
            (((c & 0xF0) != 0) << 2));
}
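A quick self-test (my addition; it assumes the getSetBitLocation definition above is in scope) confirms all eight cases:

#include <assert.h>

int main(void) {
    /* every single-bit value must map back to its bit index */
    for (int pos = 0; pos < 8; pos++) {
        assert(getSetBitLocation((unsigned char)(1u << pos)) == pos);
    }
    return 0;
}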
The easiest thing is to create a lookup table. The simplest one will be sparse (having 256 elements), but it would technically avoid iteration.
This comment here technically avoids iteration, but who are we kidding, it is still doing the same number of checks: How to write log base(2) in c/c++
A closed form would be log2(), a la log2() + 1.
But I'm not sure how efficient that is - possibly the CPU has an instruction for taking base-2 logarithms?
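For what it's worth, the closed-form version is a one-liner with log2() from <math.h>, and GCC/Clang also provide __builtin_ctz, which does compile to a single instruction (bsf/tzcnt) on x86; this is my own sketch, not part of the answer above:

#include <math.h>

/* Position of the single set bit via log2(); assumes exactly one bit is set. */
int bit_position_log2(unsigned char b) {
    return (int)log2((double)b);
}

#if defined(__GNUC__)
/* Same result using the count-trailing-zeros builtin (bsf/tzcnt on x86). */
int bit_position_ctz(unsigned char b) {
    return __builtin_ctz(b);
}
#endif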
Can I do this without iteration?
It is indeed possible.
How do I most efficiently determine the position of the set bit?
You can try this algorithm. It splits the char in half to search for the top bit, shifting the upper half down into the low half whenever the bit is found there:
int getTopSetBit(unsigned char b) {
    int res = 0;
    if (b > 15) {        // bit is in the high nibble
        b = b >> 4;
        res = res + 4;
    }
    if (b > 3) {         // bit is in the upper half of what's left
        b = b >> 2;
        res = res + 2;
    }
    //thanks @JasonD
    return res + (b >> 1);
}
It uses two comparisons (three for uint16s, four for uint32s...), and it might be faster than your loop. It is definitely not shorter.
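The same split scales to wider types by adding one stage per doubling of the width; here is a 32-bit version on the same pattern (my sketch, following the function above):

#include <stdint.h>

/* Position of the single set bit in a 32-bit value: four comparisons plus the final shift. */
int getTopSetBit32(uint32_t w) {
    int res = 0;
    if (w > 0xFFFF) { w >>= 16; res += 16; }
    if (w > 0xFF)   { w >>= 8;  res += 8;  }
    if (w > 0xF)    { w >>= 4;  res += 4;  }
    if (w > 3)      { w >>= 2;  res += 2;  }
    return res + (int)(w >> 1);
}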
Based on the idea by Anton Kovalenko (hashed lookup) and the comment by 6502 (division is slow), I also suggest this implementation (8-bit => 3-bit hash using a de-Bruijn sequence)
int lookup[] = {7, 0, 5, 1, 6, 4, 3, 2};

int getBitPosition(unsigned char b) {
    // return lookup[(b | (b>>1) | (b>>2) | (b>>4)) & 0x7];
    return lookup[((b * 0x1D) >> 4) & 0x7];
}
or (larger LUT, but uses just three terms instead of four)
int lookup[] = {0xFF, 0, 1, 4, 2, 0xFF, 5, 0xFF, 7, 3, 0xFF, 0xFF, 6, 0xFF, 0xFF, 0xFF};

int getBitPosition(unsigned char b) {
    return lookup[(b | (b>>3) | (b>>4)) & 0xF];
}
This is a quite common problem for chess programs that use 64 bits to represent positions (i.e. one 64-bit number to store where all the white pawns are, another for where all the black ones are, and so on).
With this representation there is sometimes the need to find the index 0...63 of the first or last set bit, and there are several possible approaches: halving the search range (if x & 0x00000000ffffffffULL is zero there's no need to check the low 32 bits), or dedicated machine instructions (bsf and bsr on x86).
What is faster however really depends on your hardware and on real use cases. For 8 bits only and a modern processor I think that probably a lookup table with 256 entries is the best choice...
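To illustrate both ideas on a 64-bit word (my own sketch, not from the answer; lowest_set_bit is a made-up name), the halving approach looks like this, and __builtin_ctzll is the GCC/Clang builtin that compiles to bsf/tzcnt on x86-64:

#include <stdint.h>

/* Index of the lowest set bit by halving the search range; assumes x != 0. */
int lowest_set_bit(uint64_t x) {
    int index = 0;
    if ((x & 0x00000000ffffffffULL) == 0) { x >>= 32; index += 32; }  /* skip low 32 bits */
    if ((x & 0x000000000000ffffULL) == 0) { x >>= 16; index += 16; }
    if ((x & 0x00000000000000ffULL) == 0) { x >>= 8;  index += 8;  }
    if ((x & 0x000000000000000fULL) == 0) { x >>= 4;  index += 4;  }
    if ((x & 0x0000000000000003ULL) == 0) { x >>= 2;  index += 2;  }
    return index + (int)((x & 1) ^ 1);   /* +1 only if bit 0 of the remainder is clear */
}

#if defined(__GNUC__)
/* Same thing with the compiler builtin (bsf/tzcnt on x86-64). */
int lowest_set_bit_ctz(uint64_t x) {
    return __builtin_ctzll(x);
}
#endif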
But are you really sure this is the bottleneck of your algorithm?