Question:
I need to store permutations and apply them to 16-bit integers. The best solution I came up with is to store a permutation as a 64-bit integer in which each 4-bit field holds the new position of the i-th bit. Applying it looks like this:
```cpp
#include <cstdint>

// Bit i of `bits` moves to position (perm >> (4*i)) & 0xF.
uint16_t permute(uint16_t bits, uint64_t perm)
{
    uint16_t result = 0;
    for (int i = 0; i < 16; ++i)
        result |= ((bits >> i) & 1u) << ((perm >> (i * 4)) & 0xF);
    return result;
}
```
Is there a faster way to do this? Thank you.
Answer 1:
There are alternatives.
Any permutation can be performed by a Beneš network and encoded as the masks that serve as the control inputs of its multiplexers. This can be done reasonably efficiently in software too (not great, but OK): it is just a sequence of butterfly permutations. The masks are a bit tricky to compute, but applying them is probably faster than moving every bit individually, though that depends on how many bits you're dealing with, and 16 is not a lot.
Some smaller categories of shuffles can be handled by simpler (faster) networks, which you can also find on that page.
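A minimal sketch of applying a 16-bit Beneš network in software, assuming the seven stage masks have already been computed (computing them for an arbitrary permutation is the tricky part and is not shown). Each stage is a butterfly implemented as a "delta swap"; the shift order 1, 2, 4, 8, 4, 2, 1 is one valid layout. The names `delta_swap`, `BenesMasks` and `benes_apply` are hypothetical.

```cpp
#include <cstdint>

// One butterfly stage ("delta swap"): for every bit j set in mask m,
// exchange bit j with bit j + s. m must only select bits whose partner
// j + s lies in the upper half of the same 2s-bit block.
constexpr uint16_t delta_swap(uint16_t x, uint16_t m, int s)
{
    // t has a 1 at j where bits j and j + s differ and a swap is requested
    uint16_t t = static_cast<uint16_t>(((x >> s) ^ x) & m);
    // flipping both bits of a differing pair swaps them
    return static_cast<uint16_t>(x ^ t ^ (t << s));
}

// A Benes network on 16 = 2^4 positions has 2*4 - 1 = 7 butterfly stages.
struct BenesMasks { uint16_t m[7]; };

constexpr int kShift[7] = {1, 2, 4, 8, 4, 2, 1};

uint16_t benes_apply(uint16_t x, const BenesMasks& b)
{
    for (int k = 0; k < 7; ++k)
        x = delta_swap(x, b.m[k], kShift[k]);
    return x;
}
```

For example, full swaps in the first four stages (`{0x5555, 0x3333, 0x0F0F, 0x00FF, 0, 0, 0}`) reverse the bit order, so `benes_apply(0x1234, rev)` gives `0x2C48`. The cost is fixed at seven shift/xor/and stages regardless of the permutation.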
Finally, in practice, modern x86 hardware has the highly versatile `pshufb` instruction, which can apply a permutation (one that may also contain duplicates and zeroes) to 16 bytes in, typically, a single cycle. It is slightly awkward to spread the bits out over the bytes, but once you're there, it takes only a `pshufb` to permute and a `pmovmskb` to compress the result back down to 16 bits.
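A minimal sketch of the `pshufb` route, assuming x86-64 with SSSE3 and GCC or Clang (the `target("ssse3")` attribute is compiler-specific). `pshufb` gathers (output byte j takes input byte `ctrl[j]`), while the question's format scatters (bit i goes to position p(i)), so the control vector stores the inverse permutation. The helper names `make_control` and `permute_sse` are hypothetical, and the bit-spreading trick (broadcast, AND with a per-byte bit mask, compare) is just one way to do it.

```cpp
#include <cstdint>
#include <immintrin.h>

// Convert the question's packed format (bit i -> (perm >> 4*i) & 0xF)
// into a pshufb control vector: ctrl[p(i)] = i.
__m128i make_control(uint64_t perm)
{
    uint8_t ctrl[16] = {};
    for (int i = 0; i < 16; ++i)
        ctrl[(perm >> (4 * i)) & 0xF] = static_cast<uint8_t>(i);
    return _mm_loadu_si128(reinterpret_cast<const __m128i*>(ctrl));
}

__attribute__((target("ssse3")))
uint16_t permute_sse(uint16_t bits, __m128i ctrl)
{
    // 1. Spread: low byte into bytes 0..7, high byte into bytes 8..15.
    const __m128i bcast = _mm_setr_epi8(0, 0, 0, 0, 0, 0, 0, 0,
                                        1, 1, 1, 1, 1, 1, 1, 1);
    __m128i v = _mm_shuffle_epi8(_mm_cvtsi32_si128(bits), bcast);
    // Keep bit (j % 8) in byte j, then map each byte to 0x00 or 0xFF.
    const __m128i sel = _mm_setr_epi8(1, 2, 4, 8, 16, 32, 64, -128,
                                      1, 2, 4, 8, 16, 32, 64, -128);
    v = _mm_cmpeq_epi8(_mm_and_si128(v, sel), sel);
    // 2. Permute the 16 bytes.
    v = _mm_shuffle_epi8(v, ctrl);
    // 3. Compress: gather the top bit of each byte into a 16-bit mask.
    return static_cast<uint16_t>(_mm_movemask_epi8(v));
}
```

For example, `0x0123456789ABCDEF` encodes bit reversal (bit i goes to 15 - i), so `permute_sse(0x1234, make_control(0x0123456789ABCDEFull))` gives `0x2C48`. Building the control vector once and reusing it amortizes the inversion when the same permutation is applied many times.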
Source: https://stackoverflow.com/questions/43575633/fast-bit-permutation