What's a time efficient algorithm to copy unaligned bit arrays?

醉梦人生 2021-02-09 19:57

I've had to do this many times in the past, and I've never been satisfied with the results.

Can anyone suggest a fast way of copying a contiguous bit array fro

4 answers

星月不相逢 (OP)

2021-02-09 20:30

What is optimal depends on the target platform. On some platforms without barrel shifters, shifting the whole vector right or left by one bit, n times, will be the fastest approach for n<3 (on the PIC18 platform, an 8x-unrolled byte loop that shifts left by one bit costs 11 instruction cycles per eight bytes). Otherwise, I like the following pattern. Note that src2 has to be initialized according to what you want done with the end of your buffer. Here shl and shr are logical left and right shifts, and shiftamount1 and shiftamount2 must add up to the word size:

    src1 = *src++;
    src2 = (src1 shl shiftamount1) | (src2 shr shiftamount2);
    *dest++ = src2;
    src2 = *src++;
    src1 = (src2 shl shiftamount1) | (src1 shr shiftamount2);
    *dest++ = src1;
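For anyone who wants to try the pattern above, here is a minimal C sketch of it. It assumes 32-bit words, a shift of 1 to 31 bits, and a word count that is a multiple of two. The function name `copy_shifted_pairs`, the `carry_in` parameter (the initial value of src2), and the return value are illustrative choices of this sketch, not part of the original pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy 2 * pairs words from src to dest, moving every bit up by `shift`
 * positions: bit 0 of src[0] lands at bit `shift` of dest[0].
 * carry_in plays the role of the initial src2: its top `shift` bits
 * fill the bottom of dest[0].
 * Returns the last source word read (or carry_in if pairs == 0), whose
 * top `shift` bits have not been written yet. */
static uint32_t copy_shifted_pairs(uint32_t *dest, const uint32_t *src,
                                   size_t pairs, unsigned shift,
                                   uint32_t carry_in)
{
    assert(shift >= 1 && shift <= 31);   /* shifting by 0 or 32 would be wrong here */
    const unsigned shiftamount1 = shift;        /* "shl" amount */
    const unsigned shiftamount2 = 32u - shift;  /* "shr" amount */
    uint32_t src1;
    uint32_t src2 = carry_in;

    while (pairs--) {
        /* The two halves swap the roles of src1 and src2, so no
         * register-to-register move is needed between words. */
        src1 = *src++;
        src2 = (src1 << shiftamount1) | (src2 >> shiftamount2);
        *dest++ = src2;

        src2 = *src++;
        src1 = (src2 << shiftamount1) | (src1 >> shiftamount2);
        *dest++ = src1;
    }
    return src2;
}
```

The returned word still carries `shift` bits that belong in the next destination word. A caller would typically merge `last >> (32 - shift)` into that word, together with whatever bits should follow the copied range.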

That should lend itself to a very efficient implementation on an ARM: eight instructions every two words, if registers are available for src, dest, src1, src2, shiftamount1, and shiftamount2. Using more registers would allow faster operation via multi-word load/store instructions. Handling four words would be something like the following (one machine instruction per line, except that the first four lines together would be one instruction, as would the last four lines):

    src0 = *src++;
    src1 = *src++;
    src2 = *src++;
    src3 = *src++;
    tmp = src0;
    src0 = src0 shr shiftamount1;
    src0 = src0 | (src1 shl shiftamount2);
    src1 = src1 shr shiftamount1;
    src1 = src1 | (src2 shl shiftamount2);
    src2 = src2 shr shiftamount1;
    src2 = src2 | (src3 shl shiftamount2);
    src3 = src3 shr shiftamount1;
    src3 = src3 | (tmp shl shiftamount2);
    *dest++ = src0;
    *dest++ = src1;
    *dest++ = src2;
    *dest++ = src3;

Eleven instructions per 16 bytes rotated.
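The four-word version can be sketched in C as well. This is only an illustration under assumptions: 32-bit words, a shift of 1 to 31 bits, and a hypothetical function name `rotate_right_4words`. Whether a compiler actually turns the four loads and four stores into single multi-word instructions (LDM/STM on ARM) depends on the compiler and target, so the instruction count above is not guaranteed for this C code.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 128-bit block (four 32-bit words, src[0] least significant)
 * right by `shift` bits and store the result in dest.
 * The low `shift` bits of src[0] wrap around into the top of dest[3]. */
static void rotate_right_4words(uint32_t *dest, const uint32_t *src,
                                unsigned shift)
{
    assert(shift >= 1 && shift <= 31);
    const unsigned shiftamount1 = shift;        /* "shr" amount */
    const unsigned shiftamount2 = 32u - shift;  /* "shl" amount */

    /* Four loads: one multi-word load on a machine that has one. */
    uint32_t src0 = src[0];
    uint32_t src1 = src[1];
    uint32_t src2 = src[2];
    uint32_t src3 = src[3];
    uint32_t tmp = src0;  /* keep the original word 0 for the wrap-around */

    /* Each word takes its own high part and the low bits of its neighbour.
     * The order matters: every neighbour is read before it is overwritten. */
    src0 = (src0 >> shiftamount1) | (src1 << shiftamount2);
    src1 = (src1 >> shiftamount1) | (src2 << shiftamount2);
    src2 = (src2 >> shiftamount1) | (src3 << shiftamount2);
    src3 = (src3 >> shiftamount1) | (tmp  << shiftamount2);

    /* Four stores: one multi-word store on a machine that has one. */
    dest[0] = src0;
    dest[1] = src1;
    dest[2] = src2;
    dest[3] = src3;
}
```

To use this for a plain copy of a longer array rather than a rotation, the wrap-around term would have to come from the next block of the source instead of from tmp.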
