Is there an efficient (fast) algorithm that will perform bit expansion/duplication?
For example, expand each bit in an 8bit value by 3 (creating a 24bit value):
1101 0101 => 11111100 01110001 11000111
The brute force method that has been proposed is to create a lookup table. In the future, the expansion value may need to be variable. That is, in the above example we are expanding by 3 but may need to expand by some other value(s). This would require multiple lookup tables that I'd like to avoid if possible.
There is a chance to make it quicker than lookup table if arithmetic calculations are for some reason faster than memory access. This may be possible if calculations are vectorized (PPC AltiVec or Intel SSE) and/or if other parts of the program need to use every bit of cache memory.
If expansion factor = 3, only 7 instructions are needed:
out = (((in * 0x101 & 0x0F00F) * 0x11 & 0x0C30C3) * 5 & 0x249249) * 7;
Or other alternative, with 10 instructions:
out = (in | in << 8) & 0x0F00F;
out = (out | out << 4) & 0x0C30C3;
out = (out | out << 2) & 0x249249;
out *= 7;
For other expansion factors >= 3:
unsigned mask = 0x0FF;
unsigned out = in;
for (scale = 4; scale != 0; scale /= 2)
{
shift = scale * (N - 1);
mask &= ~(mask << scale);
mask |= mask << (scale * N);
out = out * ((1 << shift) + 1) & mask;
}
out *= (1 << N) - 1;
Or other alternative, for expansion factors >= 2:
unsigned mask = 0x0FF;
unsigned out = in;
for (scale = 4; scale != 0; scale /= 2)
{
shift = scale * (N - 1);
mask &= ~(mask << scale);
mask |= mask << (scale * N);
out = (out | out << shift) & mask;
}
out *= (1 << N) - 1;
shift
and mask
values are better to be calculated prior to bit stream processing.
You can do it one input bit at at time. Of course, it will be slower than a lookup table, but if you're doing something like writing for a tiny, 8-bit microcontroller without enough room for a table, it should have the smallest possible ROM footprint.
来源:https://stackoverflow.com/questions/9023129/algorithm-for-bit-expansion-duplication