I have an array of char (usually thousands of bytes long) read from a file, all composed of 0 and 1 (not '0' and '1', in which case I could use strtoul). I want
Bit shifting is the simplest way to go about this. Better to write code that reflects what you're actually doing rather than trying to micro-optimize.
So you want something like this:
char bits[32];
// populate bits with 0/1 values
uint32_t value = 0;
for (int i = 0; i < 32; i++) {
    // bits[i] becomes bit i of value, so bits[0] is the least significant bit
    value |= (uint32_t)(bits[i] & 1) << i;
}
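Since the question mentions buffers that are thousands of bytes long, here is a minimal sketch of how the same loop might be extended to a whole buffer. The function name `pack_bits` and the choice of `std::vector<uint32_t>` as the output are assumptions made for illustration, not something taken from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: packs n bytes (each holding 0 or 1) into 32-bit words.
// Input byte i ends up in word i / 32, at bit position i % 32.
std::vector<uint32_t> pack_bits(const char* bits, std::size_t n) {
    assert(bits != nullptr || n == 0);
    // Round up so a trailing partial group still gets a word.
    std::vector<uint32_t> words((n + 31) / 32, 0u);
    for (std::size_t i = 0; i < n; ++i) {
        words[i / 32] |= static_cast<uint32_t>(bits[i] & 1) << (i % 32);
    }
    return words;
}
```

If the length is not a multiple of 32, the unused high bits of the last word simply stay zero.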
If you don't need the output bits to appear in exactly the same order as the input bytes, and they can instead be "interleaved" in a specific way, then a fast and portable way to accomplish this is to take 8 blocks of 8 bytes (64 bytes total) and combine all the LSBs into a single 8-byte value.
Something like this (shown here with 4-byte blocks, so 32 bytes are combined into a 4-byte value):
#include <cstdint>
#include <cstring>

// Requires every input byte to be exactly 0 or 1; larger values would
// spill into the neighbouring byte when shifted.
uint32_t extract_lsbs2(uint8_t (&input)[32]) {
    uint32_t t0, t1, t2, t3, t4, t5, t6, t7;
    memcpy(&t0, input + 0 * 4, 4);
    memcpy(&t1, input + 1 * 4, 4);
    memcpy(&t2, input + 2 * 4, 4);
    memcpy(&t3, input + 3 * 4, 4);
    memcpy(&t4, input + 4 * 4, 4);
    memcpy(&t5, input + 5 * 4, 4);
    memcpy(&t6, input + 6 * 4, 4);
    memcpy(&t7, input + 7 * 4, 4);
    // Block k supplies bit k of every output byte.
    return
        (t0 << 0) |
        (t1 << 1) |
        (t2 << 2) |
        (t3 << 3) |
        (t4 << 4) |
        (t5 << 5) |
        (t6 << 6) |
        (t7 << 7);
}
This generates "not terrible, not great" code on most compilers.
If you use uint64_t instead of uint32_t it would generally be twice as fast (assuming you have more than 32 total bytes to transform) on a 64-bit platform.
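A rough sketch of what the uint64_t variant could look like, written as a loop rather than unrolled. The name `extract_lsbs64` is made up for illustration, and the extra masking step is my own addition so that input bytes other than 0 or 1 cannot spill into neighbouring bits.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical 64-bit variant: 8 blocks of 8 bytes in, one 8-byte value out.
// Interleaved layout: bit k of output byte j comes from input[8 * k + j].
uint64_t extract_lsbs64(const uint8_t (&input)[64]) {
    assert(sizeof(input) == 64);
    const uint64_t lsb_mask = 0x0101010101010101ULL;
    uint64_t result = 0;
    for (int k = 0; k < 8; ++k) {
        uint64_t t;
        std::memcpy(&t, input + k * 8, 8);  // alignment-safe load
        result |= (t & lsb_mask) << k;      // block k supplies bit k of each byte
    }
    return result;
}
```

Because the load and the final value use the same native byte order, the "output byte j" in the comment refers to the j-th byte of the result in memory, whatever the endianness.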
With SIMD you could easily vectorize the entire operation in something like two instructions per block of input (for AVX2, but any x86 SIMD ISA will work): a compare and pmovmskb.
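As an illustration of the compare plus pmovmskb idea, here is a sketch using SSE2 intrinsics, which handle 16 bytes per step (AVX2 would handle 32). The function name `pack32_sse2` and the `PACK_HAVE_SSE2` macro are hypothetical. The scalar fallback is only there so the snippet still builds on non-x86 targets.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define PACK_HAVE_SSE2 1
#endif

// Hypothetical helper: packs 32 bytes (each 0 or 1) into 32 bits, in[i] -> bit i.
inline uint32_t pack32_sse2(const uint8_t* in) {
    assert(in != nullptr);
#ifdef PACK_HAVE_SSE2
    const __m128i zero = _mm_setzero_si128();
    __m128i lo = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    __m128i hi = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + 16));
    // Compare: bytes holding 1 become 0xFF, bytes holding 0 stay 0x00.
    lo = _mm_cmpgt_epi8(lo, zero);
    hi = _mm_cmpgt_epi8(hi, zero);
    // pmovmskb: collect the top bit of each of the 16 bytes into 16 bits.
    uint32_t lo_bits = static_cast<uint32_t>(_mm_movemask_epi8(lo));
    uint32_t hi_bits = static_cast<uint32_t>(_mm_movemask_epi8(hi));
    return lo_bits | (hi_bits << 16);
#else
    // Portable fallback with the same bit order.
    uint32_t r = 0;
    for (int i = 0; i < 32; ++i) r |= static_cast<uint32_t>(in[i] & 1u) << i;
    return r;
#endif
}
```

Unlike the interleaved version, this keeps the output bits in the same order as the input bytes.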
Restricted to the x86 platform, you can use the PEXT instruction. It is part of the BMI2 instruction set extension on newer processors.
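Apply the 32-bit form of the instruction several times in a row (with the mask 0x01010101, each call extracts 4 bits from 4 input bytes) and then merge the results into one value with shifts.
This is probably the optimal approach on Intel processors, but the disadvantage is that this instruction is slow on AMD Ryzen processors before Zen 3.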
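A sketch of the PEXT approach, assuming GCC or Clang with BMI2 enabled (for example via -mbmi2), in which case `__BMI2__` is defined and `_pext_u32` from `<immintrin.h>` is available. The function name `pack32_pext` is made up for illustration, and the scalar branch is just a stand-in so the code still compiles when BMI2 is not enabled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#if defined(__BMI2__)
#include <immintrin.h>  // _pext_u32
#endif

// Hypothetical helper: packs 32 bytes (each 0 or 1) into 32 bits, in[i] -> bit i.
inline uint32_t pack32_pext(const uint8_t* in) {
    assert(in != nullptr);
    uint32_t result = 0;
    for (int k = 0; k < 8; ++k) {
#if defined(__BMI2__)
        uint32_t chunk;
        std::memcpy(&chunk, in + 4 * k, 4);
        // The mask selects bit 0 of each of the 4 bytes; PEXT packs them into 4 bits.
        uint32_t nibble = _pext_u32(chunk, 0x01010101u);
#else
        // Stand-in without BMI2: gather the same 4 bits one at a time.
        uint32_t nibble = 0;
        for (int j = 0; j < 4; ++j) {
            nibble |= static_cast<uint32_t>(in[4 * k + j] & 1u) << j;
        }
#endif
        result |= nibble << (4 * k);  // merge the partial results with shifts
    }
    return result;
}
```

On a 64-bit target the same idea should work with `_pext_u64` and the mask 0x0101010101010101, which halves the number of PEXT calls.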