How to convert a sequence of 32 char (0/1) to 32 bits (uint32_t)?

Question

I have an array of char (usually thousands of bytes long) read from a file, in which every element is 0 or 1 (the integer values, not the characters '0' and '1', in which case I could use strtoul). I want to pack these into single bits, converting each group of 32 chars into a single uint32_t. Should I write a bit shift expression with 32 terms, or is there a saner way?

out[i/32] = 
    (uint32_t)data[i] << 31 |  // cast first: shifting a 1 into the sign bit of int is not portable
    data[i+1] << 30 |
    data[i+2] << 29 |
    data[i+3] << 28 |
    data[i+4] << 27 |
    data[i+5] << 26 |
    data[i+6] << 25 |
    data[i+7] << 24 |
    data[i+8] << 23 |
    data[i+9] << 22 |
    data[i+10] << 21 |
    data[i+11] << 20 |
    data[i+12] << 19 |
    data[i+13] << 18 |
    data[i+14] << 17 |
    data[i+15] << 16 |
    data[i+16] << 15 |
    data[i+17] << 14 |
    data[i+18] << 13 |
    data[i+19] << 12 |
    data[i+20] << 11 |
    data[i+21] << 10 |
    data[i+22] << 9 |
    data[i+23] << 8 |
    data[i+24] << 7 |
    data[i+25] << 6 |
    data[i+26] << 5 |
    data[i+27] << 4 |
    data[i+28] << 3 |
    data[i+29] << 2 |
    data[i+30] << 1 |
    data[i+31];

If this monstrous bit shift is the fastest in run time, then I'll have to stick to it.

Answer 1:

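If you can restrict yourself to the x86 platform, you can use the PEXT (parallel bit extract) instruction. It is part of the BMI2 instruction set extension on newer processors.

Apply PEXT to consecutive chunks of the input with a mask that selects the lowest bit of every byte (the 32-bit form turns 4 bytes into 4 bits, the 64-bit form turns 8 bytes into 8 bits), then merge the partial results into one value with shifts.

This is probably the optimal approach on Intel processors, but the disadvantage is that this instruction is slow on AMD Ryzen processors before Zen 3, where it is implemented in microcode.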
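
As a rough sketch of the PEXT idea, the function below packs 32 bytes with four 64-bit PEXT operations and merges the four 8-bit results with shifts. The names pack8_lsb_first and pack32_pext are made up for this illustration. It assumes a little-endian machine and input bytes that are exactly 0 or 1. The _pext_u64 intrinsic from <immintrin.h> is used only when the compiler targets x86-64 with BMI2 enabled (for example -mbmi2 with GCC or Clang); otherwise a plain loop stands in for it. The result is LSB-first (bit i of the output is data[i]), which is the reverse of the order in the question.

```cpp
#include <cstdint>
#include <cstring>
#if defined(__BMI2__) && defined(__x86_64__)
#include <immintrin.h>  // _pext_u64
#endif

// Hypothetical helper: packs 8 bytes (each 0 or 1) into 8 bits,
// with bit j of the result taken from p[j]. Assumes little-endian.
inline uint32_t pack8_lsb_first(const unsigned char* p) {
#if defined(__BMI2__) && defined(__x86_64__)
    uint64_t chunk;
    std::memcpy(&chunk, p, sizeof chunk);  // unaligned-safe 8-byte load
    // The mask selects the lowest bit of every byte; PEXT compresses
    // the selected bits into the low 8 bits of the result.
    return static_cast<uint32_t>(_pext_u64(chunk, 0x0101010101010101ULL));
#else
    // Stand-in with the same result when BMI2 is not enabled at compile time.
    uint32_t r = 0;
    for (int j = 0; j < 8; ++j) r |= static_cast<uint32_t>(p[j] & 1u) << j;
    return r;
#endif
}

// Hypothetical helper: packs 32 bytes into one uint32_t, bit i = data[i].
inline uint32_t pack32_pext(const unsigned char* data) {
    return pack8_lsb_first(data)
         | pack8_lsb_first(data + 8)  << 8
         | pack8_lsb_first(data + 16) << 16
         | pack8_lsb_first(data + 24) << 24;
}
```

To get the MSB-first order from the question instead, the bits of the result would still have to be reversed, which this sketch does not do.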

Answer 2:

If you don't need the output bits to appear in exactly the same order as the input bytes, and can instead accept them "interleaved" in a specific way, then a fast and portable way to accomplish this is to take 8 blocks of 4 bytes (32 bytes total) and combine all the LSBs into a single 4-byte value (or, with 64-bit words, 8 blocks of 8 bytes into a single 8-byte value).

Something like:

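#include <cstdint>
#include <cstring>

// Each input byte must be 0 or 1. The result is interleaved: on a
// little-endian machine, bit 8*j + k of the result is input[4*k + j].
uint32_t extract_lsbs2(const uint8_t (&input)[32]) {
  uint32_t t0, t1, t2, t3, t4, t5, t6, t7;
  // memcpy is the portable way to do an unaligned 4-byte load.
  memcpy(&t0, input + 0 * 4, 4);
  memcpy(&t1, input + 1 * 4, 4);
  memcpy(&t2, input + 2 * 4, 4);
  memcpy(&t3, input + 3 * 4, 4);
  memcpy(&t4, input + 4 * 4, 4);
  memcpy(&t5, input + 5 * 4, 4);
  memcpy(&t6, input + 6 * 4, 4);
  memcpy(&t7, input + 7 * 4, 4);

  // Shifting block k left by k puts its four LSBs into bit k of each byte.
  return
    (t0 << 0) |
    (t1 << 1) |
    (t2 << 2) |
    (t3 << 3) |
    (t4 << 4) |
    (t5 << 5) |
    (t6 << 6) |
    (t7 << 7);
}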

This generates "not terrible, not great" code on most compilers.

If you use uint64_t instead of uint32_t it would generally be twice as fast (assuming you have more than 32 total bytes to transform) on a 64-bit platform.
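
A minimal sketch of that uint64_t variant, which handles 64 input bytes per call. The name extract_lsbs64 is an assumption made for this illustration, and the bit layout stated in the comment holds only on a little-endian machine with input bytes that are exactly 0 or 1.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical 64-bit counterpart of extract_lsbs2: packs 64 bytes
// (each 0 or 1) into 64 bits. On a little-endian machine, bit 8*j + k
// of the result is input[8*k + j], so the output is interleaved rather
// than in input order.
inline uint64_t extract_lsbs64(const uint8_t (&input)[64]) {
    uint64_t result = 0;
    for (int k = 0; k < 8; ++k) {
        uint64_t t;
        std::memcpy(&t, input + k * 8, sizeof t);  // unaligned-safe 8-byte load
        result |= t << k;  // block k supplies bit k of every output byte
    }
    return result;
}
```

Whether a compiler unrolls this loop into the same code as the hand-written form is something to check in the generated assembly rather than assume.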

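With SIMD you could easily vectorize the entire operation in something like two instructions (for AVX2, but any x86 SIMD ISA will work): a compare, which turns every byte equal to 1 into 0xFF, and pmovmskb, which collects the top bit of each byte into an integer mask.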
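
A sketch of the compare plus pmovmskb pair using SSE2 intrinsics, which process 16 bytes at a time, so 32 bytes take two rounds. With AVX2 the same pair (_mm256_cmpgt_epi8 and _mm256_movemask_epi8) would cover all 32 bytes at once. The function name pack32_simd is hypothetical, the input bytes are assumed to be exactly 0 or 1, and a plain loop stands in when SSE2 is not available at compile time. The output is LSB-first: bit i of the result is data[i].

```cpp
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical helper: packs 32 bytes (each 0 or 1) into one uint32_t,
// with bit i of the result taken from data[i].
inline uint32_t pack32_simd(const unsigned char* data) {
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    __m128i lo = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data));
    __m128i hi = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + 16));
    // Compare: every byte greater than 0 becomes 0xFF, so its top bit is set.
    lo = _mm_cmpgt_epi8(lo, zero);
    hi = _mm_cmpgt_epi8(hi, zero);
    // pmovmskb: gather the top bit of each of the 16 bytes into a 16-bit mask.
    uint32_t lo_bits = static_cast<uint32_t>(_mm_movemask_epi8(lo));
    uint32_t hi_bits = static_cast<uint32_t>(_mm_movemask_epi8(hi));
    return lo_bits | (hi_bits << 16);
#else
    // Plain-loop stand-in with the same result when SSE2 is not enabled.
    uint32_t r = 0;
    for (int i = 0; i < 32; ++i) r |= static_cast<uint32_t>(data[i] & 1u) << i;
    return r;
#endif
}
```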

Answer 3:

Bit shifting is the simplest way to go about this. Better to write code that reflects what you're actually doing rather than trying to micro-optimize.

So you want something like this:

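char bits[32];
// populate bits
uint32_t value = 0;
// bits[0] ends up in the least significant bit; shift by (31 - i)
// instead of i to get the order used in the question.
for (int i=0; i<32; i++) {
    value |= (uint32_t)(bits[i] & 1) << i;
}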
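
For completeness, here is a sketch of how the same kind of loop might be applied to the whole input while keeping the bit order from the question (the first char of each group lands in bit 31, the last in bit 0). The function name pack_bits and the use of std::vector are assumptions for illustration. Trailing chars that do not fill a whole group of 32 are ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: packs groups of 32 chars (each 0 or 1) into
// uint32_t words, first char of each group in the most significant bit.
// Trailing chars that do not fill a whole group are ignored.
inline std::vector<uint32_t> pack_bits(const char* data, std::size_t n) {
    std::vector<uint32_t> out(n / 32);
    for (std::size_t w = 0; w < out.size(); ++w) {
        uint32_t value = 0;
        for (int b = 0; b < 32; ++b) {
            // Shift the accumulated bits up and append the next one at the bottom.
            value = (value << 1) | static_cast<uint32_t>(data[w * 32 + b] & 1);
        }
        out[w] = value;
    }
    return out;
}
```

Optimizing compilers are often able to unroll the inner loop, so it is worth measuring this version before writing the 32-term expression by hand.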

Source: https://stackoverflow.com/questions/53586667/how-to-convert-a-sequence-of-32-char-0-1-to-32-bits-uint32-t

Tags

c++

bit-manipulation

bit-shift

data-conversion