mirror bits of a 32 bit word

守給你的承諾、 submitted on 2020-05-23 06:07:06

Question


How would you do that in C? (Example: 10110001 becomes 10001101 if we had to mirror 8 bits). Are there any instructions on certain processors that would simplify this task?


Answer 1:


It's actually called "bit reversal", and is commonly done in FFT scrambling. The O(log N) way is (for up to 32 bits):

#include <stdint.h>

// Reverse the low `bits` bits of x. bits must be in 1..32 (a shift by 32 is undefined).
uint32_t reverse(uint32_t x, int bits)
{
    x = ((x & 0x55555555) << 1) | ((x & 0xAAAAAAAA) >> 1); // Swap _<>_
    x = ((x & 0x33333333) << 2) | ((x & 0xCCCCCCCC) >> 2); // Swap __<>__
    x = ((x & 0x0F0F0F0F) << 4) | ((x & 0xF0F0F0F0) >> 4); // Swap ____<>____
    x = ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8); // Swap the bytes within each 16-bit half
    x = ((x & 0x0000FFFF) << 16) | ((x & 0xFFFF0000) >> 16); // Swap the two 16-bit halves
    return x >> (32 - bits);
}

Maybe this small "visualization" of the first three assignments helps, using a uint8_t as the example:

b7 b6 b5 b4  b3 b2 b1 b0
-> <- -> <-  -> <- -> <-
----> <----  ----> <----
---------->  <----------

Well, if we're doing ASCII art, here's mine:

7 6 5 4 3 2 1 0
 X   X   X   X 
6 7 4 5 2 3 0 1
 \ X /   \ X /
  X X     X X
 / X \   / X \
4 5 6 7 0 1 2 3
 \ \ \ X / / /
  \ \ X X / /
   \ X X X /
    X X X X
   / X X X \
  / / X X \ \
 / / / X \ \ \
0 1 2 3 4 5 6 7

It looks a bit like FFT butterflies, which is why bit reversal shows up with FFTs.
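
To make the FFT connection a little more concrete, here is a rough sketch of reordering an array of length 2^bits into bit-reversed index order. The names bit_reverse_index and bit_reverse_permute are made up for this example, and the float element type is just an assumption; the index reversal itself is the same sequence of mask-and-shift swaps as in the answer.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the low `bits` bits of x using the mask-and-shift swaps. */
static uint32_t bit_reverse_index(uint32_t x, int bits)
{
    assert(bits >= 1 && bits <= 32);   /* a shift by 32 would be undefined */
    x = ((x & 0x55555555u) << 1) | ((x & 0xAAAAAAAAu) >> 1);
    x = ((x & 0x33333333u) << 2) | ((x & 0xCCCCCCCCu) >> 2);
    x = ((x & 0x0F0F0F0Fu) << 4) | ((x & 0xF0F0F0F0u) >> 4);
    x = ((x & 0x00FF00FFu) << 8) | ((x & 0xFF00FF00u) >> 8);
    x = (x << 16) | (x >> 16);
    return x >> (32 - bits);
}

/* Reorder data[0 .. 2^bits - 1] so that element i ends up at index reverse(i). */
static void bit_reverse_permute(float *data, int bits)
{
    assert(bits >= 1 && bits <= 31);
    uint32_t n = (uint32_t)1 << bits;
    for (uint32_t i = 0; i < n; ++i) {
        uint32_t j = bit_reverse_index(i, bits);
        if (i < j) {                   /* swap each pair only once */
            float tmp = data[i];
            data[i] = data[j];
            data[j] = tmp;
        }
    }
}
```

With bits = 3, an array holding 0, 1, 2, 3, 4, 5, 6, 7 comes out as 0, 4, 2, 6, 1, 5, 3, 7.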




Answer 2:


Per Rich Schroeppel in this MIT memo (if you can read past the assembler), the following will reverse the bits in an 8-bit byte, provided that you have 64-bit arithmetic available:

byte = (byte * 0x0202020202ULL & 0x010884422010ULL) % 1023;

This fans the bits out (the multiply), selects them (the AND), and then shrinks them back down (the modulus).
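
As a sanity check, the expression can be wrapped in a function and compared against a plain shift loop for all 256 byte values. This is only a sketch; the function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the 8 bits of b with the multiply / mask / modulus expression. */
static uint8_t reverse_byte_mul(uint8_t b)
{
    /* The multiply makes five shifted copies of b. The AND keeps exactly
       one copy of each of the eight bits, at positions that are in reversed
       order modulo 10. Since 2^10 % 1023 == 1, the modulus folds those
       positions back into the low bits. */
    return (uint8_t)(((b * 0x0202020202ULL) & 0x010884422010ULL) % 1023);
}

/* Plain shift loop, used as the reference. */
static uint8_t reverse_byte_loop(uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; ++i) {
        r = (uint8_t)((r << 1) | (b & 1));
        b >>= 1;
    }
    return r;
}

/* Compare both versions for every possible byte value. */
static void check_reverse_byte(void)
{
    for (unsigned v = 0; v < 256; ++v)
        assert(reverse_byte_mul((uint8_t)v) == reverse_byte_loop((uint8_t)v));
}
```

For the example from the question, reverse_byte_mul(0xB1) (binary 10110001) gives 0x8D (binary 10001101).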

Is it actually an 8bit quantity that you have?




Answer 3:


Nearly a duplicate of Most Efficient Algorithm for Bit Reversal (from MSB->LSB to LSB->MSB) in C (which has a lot of answers, including one AVX2 answer for reversing every 8-bit char in an array).


x86:

On x86 with SSSE3 (Core2 and later, Bulldozer and later), pshufb (_mm_shuffle_epi8) can be used as a nibble LUT to do 16 lookups in parallel. You only need 8 lookups for the 8 nibbles in a single 32-bit integer, but the real problem is splitting the input bytes into separate nibbles (with their upper half zeroed). It's basically the same problem as for pshufb-based popcount.

avx2 register bits reverse shows how to do this for a packed vector of 32-bit elements. The same code ported to 128-bit vectors would compile just fine with AVX.

It's still good for a single 32-bit int because x86 has very efficient round-trip between integer and vector regs: int bitrev = _mm_cvtsi128_si32 ( rbit32( _mm_cvtsi32_si128(input) ) );. That only costs 2 extra movd instructions to get an integer from an integer register into XMM and back. (Round trip latency = 3 cycles on an Intel CPU like Haswell.)


ARM:

rbit has single-cycle latency, and does a whole 32-bit integer in one instruction.




Answer 4:


The fastest approach is almost certainly a lookup table, with one lookup per byte and the byte order swapped:

out[0]=lut[in[3]];
out[1]=lut[in[2]];
out[2]=lut[in[1]];
out[3]=lut[in[0]];

Or, if you can afford 128 KiB of table data (by "afford" I mean CPU cache utilization, not main memory or virtual memory utilization), use 16-bit units:

out[0]=lut[in[1]];
out[1]=lut[in[0]];
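
The snippets above leave out how lut is filled. A minimal sketch of the byte-wise variant follows, assuming the table is built once at startup. The function names are invented for the example, and it uses shifts rather than in[]/out[] byte arrays so that the result does not depend on endianness.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t lut[256];   /* lut[b] holds b with its 8 bits reversed */

/* Fill the table once, before the first lookup. */
static void init_lut(void)
{
    for (unsigned v = 0; v < 256; ++v) {
        uint8_t r = 0;
        for (int i = 0; i < 8; ++i)
            if (v & (1u << i))
                r |= (uint8_t)(0x80u >> i);   /* bit i goes to bit 7 - i */
        lut[v] = r;
    }
}

/* Reverse 32 bits: reverse each byte and swap the byte order. */
static uint32_t reverse32_lut(uint32_t x)
{
    assert(lut[1] == 0x80);   /* the table must have been filled by init_lut() */
    return ((uint32_t)lut[x & 0xFF] << 24)
         | ((uint32_t)lut[(x >> 8) & 0xFF] << 16)
         | ((uint32_t)lut[(x >> 16) & 0xFF] << 8)
         |  (uint32_t)lut[(x >> 24) & 0xFF];
}
```

After calling init_lut(), reverse32_lut(0x12345678) returns 0x1E6A2C48.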



Answer 5:


The naive / slow / simple way is to extract the low bit of the input and shift it into another variable that accumulates a return value.

#include <stdint.h>

uint32_t mirror_u32(uint32_t input) {
    uint32_t returnval = 0;
    for (int i = 0; i < 32; ++i) {
        int bit = input & 0x01;
        returnval <<= 1;
        returnval += bit;    // Shift the isolated bit into returnval
        input >>= 1;
    }
    return returnval;
}

For other types, the number of bits of storage is sizeof(input) * CHAR_BIT, but that includes potential padding bits that aren't part of the value. The fixed-width types are a good idea here.

The += instead of |= makes gcc compile it more efficiently for x86 (using x86's shift-and-add instruction, LEA). Of course, there are much faster ways to bit-reverse; see the other answers. This loop is good for small code size (no large masks), but otherwise pretty much no advantage.

Compilers unfortunately don't recognize this loop as a bit-reverse and optimize it to ARM rbit or whatever. (See it on the Godbolt compiler explorer)




Answer 6:


I've also just figured out a minimal solution for mirroring 4 bits (a nibble) using only 16 bits of temporary space.

mirr = ( (orig * 0x222) & 0x1284 ) % 63
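
The nibble expression is small enough to check by hand or exhaustively. Here is a sketch that wraps it in a function (the name mirror_nibble is made up for this example) and keeps the intermediate product in a 16-bit variable.

```c
#include <assert.h>
#include <stdint.h>

/* Mirror the 4 bits of orig (0..15) with the multiply / mask / modulus trick. */
static uint8_t mirror_nibble(uint8_t orig)
{
    assert(orig <= 0xF);
    /* Three shifted copies of orig; at most 15 * 0x222 = 0x1FFE, so 16 bits suffice. */
    uint16_t spread = (uint16_t)(orig * 0x222u);
    /* Keep one copy of each bit, then fold 6-bit groups together (2^6 % 63 == 1). */
    return (uint8_t)((spread & 0x1284u) % 63u);
}
```

For example, mirror_nibble(0xB) (binary 1011) gives 0xD (binary 1101).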



Answer 7:


If you are interested in a more embedded approach: when I worked with an ARMv7-A system, I found the RBIT instruction.

So within a C function using GNU extended asm I could use:

uint32_t bit_reverse32(uint32_t inp32)
{
    uint32_t out = 0;
    asm("RBIT %0, %1" : "=r" (out) : "r" (inp32));
    return out;
}

Some compilers expose an intrinsic for this (for example armcc's __rbit), and GCC also offers intrinsics via ACLE, but with gcc-arm-linux-gnueabihf I could not find __rbit, so I came up with the code above.

I didn't look, but I suppose on other platforms you could create similar solutions.
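
One possible way to keep such code portable is to use a compiler builtin where one exists and fall back to plain C otherwise. The sketch below assumes Clang's __builtin_bitreverse32 and detects it with __has_builtin; compilers without that builtin take the mask-and-shift path. The function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#ifndef __has_builtin            /* older compilers lack this macro */
#define __has_builtin(x) 0
#endif

/* Portable fallback: mask-and-shift swaps over the full 32-bit width. */
static uint32_t bit_reverse32_swap(uint32_t x)
{
    x = ((x & 0x55555555u) << 1) | ((x & 0xAAAAAAAAu) >> 1);
    x = ((x & 0x33333333u) << 2) | ((x & 0xCCCCCCCCu) >> 2);
    x = ((x & 0x0F0F0F0Fu) << 4) | ((x & 0xF0F0F0F0u) >> 4);
    x = ((x & 0x00FF00FFu) << 8) | ((x & 0xFF00FF00u) >> 8);
    return (x << 16) | (x >> 16);
}

/* Use the builtin when the compiler has it, otherwise the fallback. */
static uint32_t bit_reverse32_any(uint32_t x)
{
#if __has_builtin(__builtin_bitreverse32)
    return __builtin_bitreverse32(x);
#else
    return bit_reverse32_swap(x);
#endif
}

/* Check that whichever path was compiled agrees with the fallback. */
static void check_bit_reverse32(void)
{
    static const uint32_t samples[] = { 0u, 1u, 0x12345678u, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(bit_reverse32_any(samples[i]) == bit_reverse32_swap(samples[i]));
}
```

The caller only sees bit_reverse32_any, so the inline-asm version from above could be slotted in behind another #if for toolchains where neither a builtin nor an intrinsic is available.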




Answer 8:


I think I would make a lookup table of the reversed bit patterns for 0-255. Read each byte, reverse it with the lookup table, and afterwards arrange the resulting bytes in the opposite order.




Answer 9:


quint64 mirror(quint64 a, quint8 l = 64) {
    quint64 b = 0;
    for (quint8 i = 0; i < l; i++) {
        // Move input bit (l-i-1) down to bit 0, isolate it, then place it at output bit i.
        b |= ((a >> (l - i - 1)) & 1) << i;
    }
    return b;
}

This function mirrors the low l bits, where l can be less than 64. For instance, it can mirror 12 bits.

quint64 and quint8 are defined in Qt, but they can be replaced with any equivalent types (for example uint64_t and uint8_t).




Answer 10:


If you have been staring at Mike DeSimone's great answer (like me), here is a "visualization" of the first three assignments, using a uint8_t as the example:

b7 b6 b5 b4  b3 b2 b1 b0
-> <- -> <-  -> <- -> <-
----> <----  ----> <----
---------->  <----------

So first adjacent bits are swapped, then adjacent two-bit groups, and so on.




Answer 11:


int mirror (int input)
{ // return the bit mirror of the low 8 bits of input
  int tmp2;
  int out=0;
  for (int i=0; i<8; i++)
    {
      out = out << 1;
      tmp2 = input & 0x01;
      out = out | tmp2;
      input = input >> 1;        
    }
   return out;
}


Source: https://stackoverflow.com/questions/4245936/mirror-bits-of-a-32-bit-word
