I've had to do this many times in the past, and I've never been satisfied with the results.
Can anyone suggest a fast way of copying a contiguous bit array fro
Your solution looks similar to most I've seen: basically do some unaligned work at the start and end, with the main loop in the middle using aligned accesses. If you really need efficiency and do this on very long bitstreams, I would suggest using something architecture-specific like SSE2 in the main loop.
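
For reference, here is a rough sketch of that head / main loop / tail structure in portable C. It is not your code and not a tuned implementation: the names `bitcopy`, `get_bit` and `put_bit` are made up for illustration, and I'm assuming LSB-first bit numbering within each byte and non-overlapping buffers. The main loop only aligns to destination bytes; a word-wide or SSE2 version would follow the same shape with wider loads and stores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers. Bit i lives in byte i/8 at position i%8,
 * counting from the least significant bit (an assumption). */
static int get_bit(const uint8_t *p, size_t i)
{
    return (p[i >> 3] >> (i & 7)) & 1;
}

static void put_bit(uint8_t *p, size_t i, int v)
{
    uint8_t mask = (uint8_t)(1u << (i & 7));
    if (v)
        p[i >> 3] |= mask;
    else
        p[i >> 3] &= (uint8_t)~mask;
}

/* Copy nbits bits from src (starting at bit src_off) to dst (starting at
 * bit dst_off). Buffers are assumed not to overlap. */
void bitcopy(uint8_t *dst, size_t dst_off,
             const uint8_t *src, size_t src_off, size_t nbits)
{
    /* Head: go bit by bit until the destination is byte-aligned. */
    while (nbits > 0 && (dst_off & 7) != 0) {
        put_bit(dst, dst_off++, get_bit(src, src_off++));
        nbits--;
    }

    /* Main loop: write whole destination bytes. */
    size_t nbytes = nbits >> 3;
    unsigned shift = (unsigned)(src_off & 7);
    const uint8_t *s = src + (src_off >> 3);
    uint8_t *d = dst + (dst_off >> 3);

    if (shift == 0) {
        /* Source and destination are both byte-aligned here. */
        memcpy(d, s, nbytes);
    } else {
        /* Each output byte straddles two source bytes. Because shift > 0,
         * s[i + 1] always holds bits that belong to the copied range. */
        for (size_t i = 0; i < nbytes; i++)
            d[i] = (uint8_t)((s[i] >> shift) | (s[i + 1] << (8 - shift)));
    }

    src_off += nbytes * 8;
    dst_off += nbytes * 8;
    nbits &= 7;

    /* Tail: the remaining 0..7 bits, again bit by bit. */
    while (nbits-- > 0)
        put_bit(dst, dst_off++, get_bit(src, src_off++));
}
```

For example, `bitcopy(dst, 5, src, 3, 17)` copies 3 head bits, one full byte in the main loop, and 6 tail bits. For long streams nearly all the time is spent in the middle loop, which is the part worth replacing with something architecture-specific.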