Find nth SET bit in an int

栀梦 2020-12-14 18:14

Instead of just the lowest set bit, I want to find the position of the nth lowest set bit. (I'm NOT talking about value on the nt

11 answers
  • 2020-12-14 18:36

    Nowadays this is very easy with PDEP from the BMI2 instruction set. Here is a 64-bit version with some examples:

    #include <cassert>
    #include <cstdint>
    #include <x86intrin.h>
    
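    // Returns a mask with only the n-th (0-based) lowest set bit of x,
    // or 0 if x has n or fewer set bits. Needs n < 64 and BMI2 support
    // (e.g. -mbmi2 with GCC/Clang).
    inline uint64_t nthset(uint64_t x, unsigned n) {
        return _pdep_u64(1ULL << n, x);
    }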
    
    int main() {
        assert(nthset(0b0000'1101'1000'0100'1100'1000'1010'0000, 0) ==
                      0b0000'0000'0000'0000'0000'0000'0010'0000);
        assert(nthset(0b0000'1101'1000'0100'1100'1000'1010'0000, 1) ==
                      0b0000'0000'0000'0000'0000'0000'1000'0000);
        assert(nthset(0b0000'1101'1000'0100'1100'1000'1010'0000, 3) ==
                      0b0000'0000'0000'0000'0100'0000'0000'0000);
        assert(nthset(0b0000'1101'1000'0100'1100'1000'1010'0000, 9) ==
                      0b0000'1000'0000'0000'0000'0000'0000'0000);
        assert(nthset(0b0000'1101'1000'0100'1100'1000'1010'0000, 10) ==
                      0b0000'0000'0000'0000'0000'0000'0000'0000);
    }
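
    The question asks for a position, while nthset returns a one-bit mask. A minimal sketch of turning the mask into a 0-based index follows. The names nthset_mask and nthset_index are made up for illustration, __builtin_ctzll is a GCC/Clang builtin (an assumption about the toolchain), and the non-BMI2 branch is a plain clear-lowest-bit loop added only so the sketch builds without -mbmi2.

    ```cpp
    #include <cassert>
    #include <cstdint>
    #if defined(__BMI2__) && defined(__x86_64__)
    #include <immintrin.h>
    #endif

    // Hypothetical helper: mask of the n-th (0-based) lowest set bit of x, or 0 if none.
    inline std::uint64_t nthset_mask(std::uint64_t x, unsigned n) {
        assert(n < 64);  // 1ULL << n is undefined for n >= 64
    #if defined(__BMI2__) && defined(__x86_64__)
        return _pdep_u64(1ULL << n, x);
    #else
        // Fallback (assumption, not from the answer): clear the n lowest set bits,
        // then isolate the lowest one that remains.
        for (; n > 0 && x != 0; --n) x &= x - 1;
        return x & (~x + 1);
    #endif
    }

    // Hypothetical helper: 0-based index of that bit, or -1 if x has n or fewer set bits.
    inline int nthset_index(std::uint64_t x, unsigned n) {
        const std::uint64_t m = nthset_mask(x, n);
        return m != 0 ? __builtin_ctzll(m) : -1;  // count trailing zeros of the mask
    }
    ```

    With the value from the asserts above, nthset_index(x, 0) gives 5 and nthset_index(x, 10) gives -1.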
    
  • 2020-12-14 18:37

    It turns out that it is indeed possible to do this with no loops. It is fastest to precompute a lookup table for the (at least) 8-bit version of this problem. Of course, these tables use up cache space, but there should still be a net speedup in virtually all modern PC scenarios. In this code, n=0 returns the lowest set bit, n=1 the second-lowest, and so on.

    Solution with __popcnt

    There is a solution using the __popcnt intrinsic. For it to beat a simple loop, __popcnt needs to be extremely fast; fortunately, most processors from the SSE4 era onwards support it.

    // lookup table for sub-problem: 8-bit v
    // each entry is a one-bit mask within the byte (0 if v has n or fewer set bits)
    byte PRECOMP[256][8] = { .... }; // PRECOMP[v][n] for v < 256 and n < 8
    
    ulong nthSetBit(ulong v, ulong n) {
        ulong p = __popcnt(v & 0xFFFF);
        ulong shift = 0;
        if (p <= n) {
            v >>= 16;
            shift += 16;
            n -= p;
        }
        p = __popcnt(v & 0xFF);
        if (p <= n) {
            shift += 8;
            v >>= 8;
            n -= p;
        }
    
        if (n >= 8) return 0; // optional safety, in case n > # of set bits
        return (ulong)PRECOMP[v & 0xFF][n] << shift; // widen before shifting
    }
    

    This illustrates how the divide and conquer approach works.
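
    The table contents are elided above, so here is one way they might be generated at startup, followed by the same two-step narrowing. This is only a sketch under assumptions: make_precomp and nth_set_bit32 are illustrative names, the table is assumed to hold one-bit masks (which is what the final shift implies), and __builtin_popcount (GCC/Clang) stands in for __popcnt.

    ```cpp
    #include <array>
    #include <cstdint>

    using ByteTable = std::array<std::array<std::uint8_t, 8>, 256>;

    // table[v][n] = mask of the n-th (0-based) lowest set bit of byte v;
    // entries with n >= popcount(v) stay 0.
    inline ByteTable make_precomp() {
        ByteTable table{};  // zero-initialised
        for (unsigned v = 0; v < 256; ++v) {
            unsigned n = 0;
            for (unsigned bit = 0; bit < 8; ++bit) {
                if (v & (1u << bit)) table[v][n++] = static_cast<std::uint8_t>(1u << bit);
            }
        }
        return table;
    }

    inline std::uint32_t nth_set_bit32(std::uint32_t v, std::uint32_t n) {
        static const ByteTable PRECOMP = make_precomp();
        std::uint32_t shift = 0;
        // Is the target in the low 16 bits? If not, skip them.
        std::uint32_t p = static_cast<std::uint32_t>(__builtin_popcount(v & 0xFFFFu));
        if (p <= n) { v >>= 16; shift += 16; n -= p; }
        // Same question for the low 8 bits of what is left.
        p = static_cast<std::uint32_t>(__builtin_popcount(v & 0xFFu));
        if (p <= n) { v >>= 8; shift += 8; n -= p; }
        if (n >= 8) return 0;  // n exceeds the number of set bits
        return static_cast<std::uint32_t>(PRECOMP[v & 0xFFu][n]) << shift;
    }
    ```

    Building the table in code keeps the 2 KB of data out of the source file; a hard-coded initializer would behave the same.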

    General Solution

    There is also a solution for "general" architectures without __popcnt. It can be done by processing in 8-bit chunks. You need one more lookup table that tells you the popcnt of a byte:

    byte PRECOMP[256][8] = { .... }; // PRECOMP[v][n] for v<256 and n < 8
    byte POPCNT[256] = { ... }; // POPCNT[v] is the number of set bits in v. (v < 256)
    
    ulong nthSetBit(ulong v, ulong n) {
        ulong p = POPCNT[v & 0xFF];
        ulong shift = 0;
        if (p <= n) {
            n -= p;
            v >>= 8;
            shift += 8;
            p = POPCNT[v & 0xFF];
            if (p <= n) {
                n -= p;
                shift += 8;
                v >>= 8;
                p = POPCNT[v & 0xFF];
                if (p <= n) {
                    n -= p;
                    shift += 8;
                    v >>= 8;
                }
            }
        }
    
        if (n >= 8) return 0; // optional safety, in case n > # of set bits
        return (ulong)PRECOMP[v & 0xFF][n] << shift; // widen before shifting
    }
    

    This could, of course, be done with a loop, but the unrolled form is faster and the unusual form of the loop would make it unlikely that the compiler could automatically unroll it for you.
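
    For comparison, the loop form mentioned above might look roughly like this. It is a sketch, not the code from this answer: make_popcnt and nth_set_bit_loop are made-up names, the POPCNT table is generated rather than hard-coded, and the final in-byte step uses bit clearing in place of the PRECOMP table to keep the example short.

    ```cpp
    #include <array>
    #include <cstdint>

    // POPCNT[v] = number of set bits in byte v, built from POPCNT[v >> 1].
    inline std::array<std::uint8_t, 256> make_popcnt() {
        std::array<std::uint8_t, 256> t{};
        for (unsigned v = 1; v < 256; ++v) {
            t[v] = static_cast<std::uint8_t>(t[v >> 1] + (v & 1u));
        }
        return t;
    }

    // Mask of the n-th (0-based) lowest set bit of v, or 0 if there is none.
    inline std::uint32_t nth_set_bit_loop(std::uint32_t v, unsigned n) {
        static const std::array<std::uint8_t, 256> POPCNT = make_popcnt();
        for (unsigned shift = 0; shift < 32; shift += 8) {
            std::uint32_t byte = (v >> shift) & 0xFFu;
            const unsigned p = POPCNT[byte];
            if (n < p) {
                // Target is in this byte: drop the n lowest set bits, isolate the next.
                for (; n > 0; --n) byte &= byte - 1;
                return (byte & (~byte + 1u)) << shift;
            }
            n -= p;  // skip this byte's set bits
        }
        return 0;
    }
    ```

    Whether the unrolled version actually wins over this would need measuring on the target compiler and CPU.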

  • 2020-12-14 18:39

    I can't see a method without a loop; what springs to mind is:

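    // bits: the input value (unsigned); n: which set bit to find, counted from 1
    int set = 0;
    int pos = 0;
    while (set < n) { // never terminates if bits has fewer than n set bits
       if ((bits & 0x01) == 1) set++;
       bits = bits >> 1;
       pos++;
    }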
    

    after which pos holds the position of the nth lowest set bit, counted from 1 (so the lowest bit of the word is position 1).

    The only other thing that I can think of would be a divide and conquer approach, which might yield O(log(n)) rather than O(n)...but probably not.

    Edit: you said any behaviour, so non-termination is ok, right? :P
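
    If non-termination is not acceptable, a bounded variant of the same loop is one option. This is a minimal sketch: the name nth_set_pos and the choice to return 0 when there are fewer than n set bits (mirroring ffs) are assumptions, not part of the snippet above.

    ```cpp
    #include <cstdint>

    // 1-based position of the n-th lowest set bit (n counted from 1),
    // or 0 if bits has fewer than n set bits.
    inline int nth_set_pos(std::uint32_t bits, int n) {
        if (n <= 0) return 0;
        int set = 0;
        // The loop ends once no set bits remain, so it runs at most 32 times.
        for (int pos = 1; bits != 0; ++pos, bits >>= 1) {
            if ((bits & 1u) != 0 && ++set == n) return pos;
        }
        return 0;
    }
    ```

    Taking the input as an unsigned type also avoids the sign-extending right shift that a negative int could cause.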

  • 2020-12-14 18:40

    v-1 has a zero where v has its least significant "one" bit, while all more significant bits are the same. This leads to the following function:

    // n is counted from 1; returns a one-bit mask, or 0 if v has fewer than n set bits
    int ffsn(unsigned int v, int n) {
       for (int i=0; i<n-1; i++) {
          v &= v-1; // remove the least significant set bit
       }
       return v & ~(v-1); // extract the least significant set bit
    }
    
  • 2020-12-14 18:40

    Building on the answer given by Jukka Suomela, which uses a machine-specific instruction that may not necessarily be available, it is also possible to write a function that does exactly the same thing as _pdep_u64 without any machine dependencies. It must loop over the set bits in one of the arguments, but can still be described as a constexpr function for C++11.

    // Walks the set bits of mask from lowest to highest; b selects the matching bit of x.
    constexpr inline uint64_t deposit_bits(uint64_t x, uint64_t mask, uint64_t b, uint64_t res) {
        return mask != 0
            ? deposit_bits(x, mask & (mask - 1), b << 1,
                           (x & b) ? (res | (mask & (-mask))) : res)
            : res;
    }
    
    constexpr inline uint64_t nthset(uint64_t x, unsigned n)  {
        return deposit_bits(1ULL << n, x, 1, 0);
    }
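
    The recursion can be hard to step through, so here is the same idea written as a loop. It is a sketch with made-up names (deposit_bits_iter, nthset_iter); it drops constexpr to stay valid C++11, although from C++14 on a loop like this could be marked constexpr as well. The n < 64 guard is an addition, since shifting by 64 or more is undefined.

    ```cpp
    #include <cstdint>

    // Software equivalent of PDEP: the i-th lowest bit of x is copied to the
    // position of the i-th lowest set bit of mask.
    inline std::uint64_t deposit_bits_iter(std::uint64_t x, std::uint64_t mask) {
        std::uint64_t res = 0;
        // b walks the bits of x while mask loses its lowest set bit each round.
        for (std::uint64_t b = 1; mask != 0; b <<= 1, mask &= mask - 1) {
            if (x & b) res |= mask & (~mask + 1);  // lowest remaining set bit of mask
        }
        return res;
    }

    inline std::uint64_t nthset_iter(std::uint64_t x, unsigned n) {
        return n < 64 ? deposit_bits_iter(1ULL << n, x) : 0;
    }
    ```

    The loop runs once per set bit of the mask, so at most 64 iterations.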
    
  • 2020-12-14 18:41

    My answer is mostly based on this implementation of a 64-bit word select method (hint: look only at the MARISA_USE_POPCNT, MARISA_X64 and MARISA_USE_SSE3 code paths):

    It works in two steps, first selecting the byte containing the n-th set bit and then using a lookup table inside the byte:

    • Extract the lower and higher nibbles of every byte (bitmasks 0xF and 0xF0, then shift the higher nibbles down)
    • Replace the nibble values by their popcount (_mm_shuffle_epi8 with a 16-entry popcount table, i.e. the first terms of OEIS A000120)
    • Sum the popcounts of the lower and upper nibbles (normal SSE addition) to get byte popcounts
    • Compute the prefix sum over all byte popcounts (multiplication with 0x01010101...)
    • Propagate the position n to all bytes (SSE broadcast or again multiplication with 0x01010101...)
    • Do a bytewise comparison (_mm_cmpgt_epi8 leaves 0xFF in every byte whose prefix sum is smaller than n)
    • Compute the byte offset by doing a popcount on the result

    Now we know which byte contains the bit and a simple byte lookup table like in grek40's answer suffices to get the result.
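
    The byte-selection steps above can also be sketched with plain 64-bit arithmetic. To be clear, this is not the linked implementation: it swaps the SSE shuffle and compare for the usual SWAR tricks, the name select64 is made up, n is taken as 0-based, and the last step is a short in-byte loop where a lookup table would normally go.

    ```cpp
    #include <cstdint>

    // Hypothetical helper: 0-based index of the n-th (0-based) lowest set bit of x,
    // or 64 if x has n or fewer set bits.
    inline unsigned select64(std::uint64_t x, unsigned n) {
        const std::uint64_t ones  = 0x0101010101010101ULL;
        const std::uint64_t highs = 0x8080808080808080ULL;

        // Per-byte popcounts via the usual SWAR reduction (stands in for the nibble shuffle).
        std::uint64_t c = x - ((x >> 1) & 0x5555555555555555ULL);
        c = (c & 0x3333333333333333ULL) + ((c >> 2) & 0x3333333333333333ULL);
        c = (c + (c >> 4)) & 0x0F0F0F0F0F0F0F0FULL;

        // Inclusive prefix sums: byte i holds the popcount of bytes 0..i.
        const std::uint64_t prefix = c * ones;
        if (n >= (prefix >> 56)) return 64;  // top byte is the total popcount

        // Broadcast n and compare bytewise: the high bit of byte i stays set
        // exactly when prefix_i <= n (all values are below 128, so no borrows).
        const std::uint64_t le = (((n * ones) | highs) - prefix) & highs;

        // The number of such bytes is the index of the byte holding the target bit.
        const unsigned k = static_cast<unsigned>(((le >> 7) * ones) >> 56);

        // Rank of the target bit inside byte k, then a tiny in-byte search.
        unsigned r = n - static_cast<unsigned>(((prefix << 8) >> (8 * k)) & 0xFF);
        unsigned byte = static_cast<unsigned>((x >> (8 * k)) & 0xFF);
        for (; r > 0; --r) byte &= byte - 1;  // drop the r lowest set bits
        unsigned bit = 0;
        while ((byte & 1u) == 0) { byte >>= 1; ++bit; }
        return 8 * k + bit;
    }
    ```

    Everything up to the choice of byte k is branch-free apart from the range check; replacing the in-byte loop with a 256x8 table would remove the remaining loops.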

    Note, however, that I have not really benchmarked this against other implementations; I have only seen it to be quite efficient (and branchless).
