Efficient Algorithm for Bit Reversal (from MSB->LSB to LSB->MSB) in C

情深已故 2020-11-22 06:08

What is the most efficient algorithm to achieve the following:

0010 0000 => 0000 0100

The conversion is from MSB->LSB to LSB->MSB. All bits must be reversed.

26 answers
  •  情歌与酒
    2020-11-22 06:50

    This ain't no job for a human! ... but perfect for a machine

    This is 2015, six years after this question was first asked. Compilers have since become our masters, and our job as humans is only to help them. So what's the best way to convey our intent to the machine?

    Bit reversal is so common that you have to wonder why the ever-growing x86 ISA doesn't include an instruction to do it in one go.

    The reason: if you give your true concise intent to the compiler, bit reversal should only take ~20 CPU cycles. Let me show you how to craft reverse() and use it:

    #include <inttypes.h>
    #include <stdio.h>
    
    /* Reverse the low k bits of n: bit i moves to bit k - i - 1. */
    uint64_t reverse(const uint64_t n,
                     const uint64_t k)
    {
            uint64_t r, i;
            for (r = 0, i = 0; i < k; ++i)
                    r |= ((n >> i) & 1) << (k - i - 1);
            return r;
    }
    
    int main()
    {
            const uint64_t size = 64;
            uint64_t sum = 0;
            uint64_t a;
            for (a = 0; a < (uint64_t)1 << 30; ++a)
                    sum += reverse(a, size);
            printf("%" PRIu64 "\n", sum);
            return 0;
    }
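
As a quick sanity check, the same loop can be run on the question's own example by passing k = 8. This is only a sketch: `check_question_example` is a hypothetical helper that is not part of the answer, and `reverse()` is repeated here so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Same loop as reverse() in the answer: reverse the low k bits of n. */
static uint64_t reverse(uint64_t n, uint64_t k)
{
        uint64_t r = 0, i;
        for (i = 0; i < k; ++i)
                r |= ((n >> i) & 1) << (k - i - 1);
        return r;
}

/* Hypothetical helper: checks a few 8-bit cases, including the question's. */
static void check_question_example(void)
{
        assert(reverse(0x20, 8) == 0x04);              /* 0010 0000 => 0000 0100 */
        assert(reverse(0x01, 8) == 0x80);              /* LSB moves to MSB */
        assert(reverse(reverse(0xA5, 8), 8) == 0xA5);  /* reversing twice is the identity */
}
```

Because k is a parameter, the same function covers 8, 16, 32 or 64-bit widths; only the low k bits of n contribute to the result.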
    

    Compiling this sample program with Clang 3.6 or later at -O3 -march=native (tested on Haswell) gives artwork-quality code using the new AVX2 instructions. The run takes 11 seconds for 2^30 (~1 billion) calls to reverse(), which is ~10 ns per call. Assuming a 2 GHz clock, one cycle is 0.5 ns, which puts us at the sweet ~20 CPU cycles.
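
The cycle estimate is plain arithmetic on the measured runtime, and it can be written down as a small sketch. The helper names `ns_per_call` and `cycles_per_call` are made up for illustration, and the 2 GHz clock is the answer's assumption, not a measured value.

```c
#include <assert.h>
#include <stdint.h>

/* Average nanoseconds per call, from total seconds and the number of calls. */
static double ns_per_call(double seconds, uint64_t calls)
{
        assert(calls > 0);
        return seconds * 1e9 / (double)calls;
}

/* Cycles per call at an assumed clock in GHz (1 GHz = 1 cycle per ns). */
static double cycles_per_call(double seconds, uint64_t calls, double ghz)
{
        return ns_per_call(seconds, calls) * ghz;
}
```

With the figures quoted in the answer (11 seconds, 2^30 calls, 2 GHz) this works out to a little over 10 ns and a little over 20 cycles per call.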

    • You can fit 10 reverse()s in the time it takes to access RAM once for a single large array!
    • You can fit 1 reverse() in the time it takes to access an L2 cache LUT twice.
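
For context, the lookup-table approach these bullets compare against might look like the sketch below. The answer does not show a LUT, so everything here is an assumption: a 16-bit table (`rev16_table`, 128 KiB, which is too big for a typical L1 data cache) and a hypothetical `reverse32_lut` that needs two table reads per 32-bit value.

```c
#include <stdint.h>

/* Hypothetical table: 65536 entries * 2 bytes = 128 KiB. */
static uint16_t rev16_table[1u << 16];

/* Fill the table once, using the same bit-by-bit loop as reverse(). */
static void rev16_init(void)
{
        uint32_t v, i;
        for (v = 0; v < (1u << 16); ++v) {
                uint16_t r = 0;
                for (i = 0; i < 16; ++i)
                        r |= (uint16_t)(((v >> i) & 1u) << (15 - i));
                rev16_table[v] = r;
        }
}

/* Reverse 32 bits with two table reads: reverse each half and swap them. */
static uint32_t reverse32_lut(uint32_t n)
{
        return ((uint32_t)rev16_table[n & 0xFFFFu] << 16)
             | rev16_table[n >> 16];
}
```

Whether such a table actually stays in L2 depends on the CPU and on what else the program touches, so the latency comparison above is best read as an order-of-magnitude argument.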

    Caveat: this sample code should hold as a decent benchmark for a few years, but it will eventually start to show its age once compilers are smart enough to optimize main() to just printf the final result instead of really computing anything. But for now it works in showcasing reverse().
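
One common way to make a benchmark like this harder to fold away is to hide the input from the optimizer. The sketch below is one such variation, not something the answer itself does: `sum_reversed` is a hypothetical name, and it reads the starting value through a `volatile` object so the compiler has to treat it as unknown at compile time.

```c
#include <stdint.h>

/* Same loop as reverse() in the answer: reverse the low k bits of n. */
static uint64_t reverse(uint64_t n, uint64_t k)
{
        uint64_t r = 0, i;
        for (i = 0; i < k; ++i)
                r |= ((n >> i) & 1) << (k - i - 1);
        return r;
}

/* Hypothetical benchmark body: sums reverse(start + i, 64) for i < count. */
static uint64_t sum_reversed(uint64_t start, uint64_t count)
{
        volatile uint64_t hidden = start;  /* volatile read: value is opaque to the optimizer */
        uint64_t base = hidden;
        uint64_t sum = 0, i;
        for (i = 0; i < count; ++i)
                sum += reverse(base + i, 64);
        return sum;                        /* unsigned sum wraps modulo 2^64 */
}
```

Printing the returned sum, as the original main() does, keeps the loop's result observable. This does not guarantee that a future compiler cannot find a shortcut; it only removes the easiest one, a fully constant input.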
