OK, it may sound a bit complicated, but this is what I'm trying to do:
{ 0, 2, 4, 6, 8, 10 }
Bit shifts are really cheap. Lookup tables cost cache space, and your lookup also has an integer multiplication. I expect plain brute force to be faster than clever techniques.
vector<UINT> DQBitboard::bits(U64 bitboard)
{
    vector<UINT> res;
    uint_fast8_t pos = 1;   // positions are 1-based
    do {
        if (bitboard & 1) res.push_back(pos);
        ++pos;              // move on to the next bit position
    } while (bitboard >>= 1);
    return res;
}
You can unroll the loop a little, which may make it faster still.
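As a rough illustration of the unrolling idea, the loop could test four bits per iteration. This is only a sketch: the name `bits_unrolled` is made up for the example, the aliases `U64` and `UINT` are assumptions standing in for the question's own typedefs, and whether it actually beats the simple loop would need measuring.

```cpp
#include <cstdint>
#include <vector>

// Assumed aliases; the post does not show how U64 and UINT are defined.
using U64 = std::uint64_t;
using UINT = unsigned int;

// Hypothetical free-function variant of bits(): returns the 1-based positions
// of the set bits, testing four bits per loop iteration.
std::vector<UINT> bits_unrolled(U64 bitboard)
{
    std::vector<UINT> res;
    res.reserve(64);                          // at most 64 bits can be set
    for (UINT pos = 1; bitboard != 0; pos += 4, bitboard >>= 4) {
        if (bitboard & 1) res.push_back(pos);
        if (bitboard & 2) res.push_back(pos + 1);
        if (bitboard & 4) res.push_back(pos + 2);
        if (bitboard & 8) res.push_back(pos + 3);
    }
    return res;
}
```

The loop still stops as soon as the remaining bits are all zero, so sparse boards with only low bits set exit early.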
The std::vector is the most expensive part by far. Consider using the bitboard directly. For example:
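struct bitboard_end_iterator{};
struct bitboard_iterator
{
    U64 value;          // bits above the current one, i.e. bitboard >> (pos + 1)
    uint_fast8_t pos;   // 0-based index of the current bit, 64 at the end

    bitboard_iterator(U64 bitboard) : value(bitboard), pos(0)
    {
        const bool bit0 = value & 1;
        value >>= 1;
        if (!bit0) ++(*this);   // bit 0 is clear: skip ahead to the first set bit
    }
    UINT operator*() const { return pos + 1; }
    bool operator==( bitboard_end_iterator ) const { return pos == 64; }
    operator bool() const { return pos < 64; }
    bitboard_iterator& operator++()
    {
        while (U64 prev = value) {
            value >>= 1;
            ++pos;
            if (prev & 1) return *this;
        }
        pos = 64;               // no set bits left
        return *this;
    }
};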
Now you can write
for( bitboard_iterator it(bitboard); it; ++it )
cout << *it;
and I think you'll get your list of bits.
Version 2:
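class bitboard_fast_iterator
{
    U64 value;
    uint_fast8_t pos;
public:
    // __builtin_ctzll(0) has an undefined result, so the empty case is guarded.
    bitboard_fast_iterator(U64 bitboard = 0) : value(bitboard), pos(value ? __builtin_ctzll(value) : 64) {}
    UINT operator*() const { return pos + 1; }
    operator bool() const { return value != 0; }
    bitboard_fast_iterator& operator++()
    {
        value &= ~(1ULL << pos);                       // clear the bit just visited
        pos = value ? __builtin_ctzll(value) : 64;     // index of the next set bit
        return *this;
    }
};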