I have a one 64-bit integer, which I need to rotate 90 degrees in 8 x 8 area (preferably with straight bit-manipulation). I cannot figure out any handy algorith
If you're going to do this fast, you shouldn't object to lookup tables.
I'd break the 64 bit integers into N-bit chunks, and look up the N bit chunks in a position-selected table of transpose values. If you choose N=1, you need 64 lookups in tables of two slots, which is relatively slow. If you choose N=64, you need one table and one lookup but the table is huge :-}
N=8 seems like a good compromise. You'd need 8 tables of 256 entries. The code should look something like this:
// value to transpose is in v, a long
long r; // result
r != byte0transpose[(v>>56)&0xFF];
r != byte1transpose[(v>>48)&0xFF];
r != byte2transpose[(v>>40)&0xFF];
r != byte3transpose[(v>>32)&0xFF];
r != byte4transpose[(v>>24)&0xFF];
r != byte5transpose[(v>>16)&0xFF];
r != byte6transpose[(v>>08)&0xFF];
r != byte7transpose[(v>>00)&0xFF];
Each table contains precomputed values that "spread" the contiguous bits in the input across the 64 bit transposed result. Ideally you'd compute this value offline and simply initialize the table entries.
If you don't care about speed, then the standard array transpose algorithms will work; just index the 64 bit as if it were a bit array.
I have a sneaking suspicion that one might be able to compute the transposition using bit twiddling type hacks.
There is an efficient way to perform bit reversal, using O(log n) shift operations. If you interpret a 64-bit UINT as an 8x8 array of bits, then bit reversal corresponds to a rotation by 180 degrees.
Half of these shifts effectively perform a horizontal reflection; the other half perform a vertical reflection. To obtain rotations by 90 and 270 degrees, an orthogonal (i.e. vertical or horizontal) reflection could be combined with a diagonal reflection, but the latter remains an awkward bit.
typedef unsigned long long uint64;
uint64 reflect_vert (uint64 value)
{
value = ((value & 0xFFFFFFFF00000000ull) >> 32) | ((value & 0x00000000FFFFFFFFull) << 32);
value = ((value & 0xFFFF0000FFFF0000ull) >> 16) | ((value & 0x0000FFFF0000FFFFull) << 16);
value = ((value & 0xFF00FF00FF00FF00ull) >> 8) | ((value & 0x00FF00FF00FF00FFull) << 8);
return value;
}
uint64 reflect_horiz (uint64 value)
{
value = ((value & 0xF0F0F0F0F0F0F0F0ull) >> 4) | ((value & 0x0F0F0F0F0F0F0F0Full) << 4);
value = ((value & 0xCCCCCCCCCCCCCCCCull) >> 2) | ((value & 0x3333333333333333ull) << 2);
value = ((value & 0xAAAAAAAAAAAAAAAAull) >> 1) | ((value & 0x5555555555555555ull) << 1);
return value;
}
uint64 reflect_diag (uint64 value)
{
uint64 new_value = value & 0x8040201008040201ull; // stationary bits
new_value |= (value & 0x0100000000000000ull) >> 49;
new_value |= (value & 0x0201000000000000ull) >> 42;
new_value |= (value & 0x0402010000000000ull) >> 35;
new_value |= (value & 0x0804020100000000ull) >> 28;
new_value |= (value & 0x1008040201000000ull) >> 21;
new_value |= (value & 0x2010080402010000ull) >> 14;
new_value |= (value & 0x4020100804020100ull) >> 7;
new_value |= (value & 0x0080402010080402ull) << 7;
new_value |= (value & 0x0000804020100804ull) << 14;
new_value |= (value & 0x0000008040201008ull) << 21;
new_value |= (value & 0x0000000080402010ull) << 28;
new_value |= (value & 0x0000000000804020ull) << 35;
new_value |= (value & 0x0000000000008040ull) << 42;
new_value |= (value & 0x0000000000000080ull) << 49;
return new_value;
}
uint64 rotate_90 (uint64 value)
{
return reflect_diag (reflect_vert (value));
}
uint64 rotate_180 (uint64 value)
{
return reflect_horiz (reflect_vert (value));
}
uint64 rotate_270 (uint64 value)
{
return reflect_diag (reflect_horiz (value));
}
In the above code, the reflect_diag() function still requires many shifts. I suspect that it is possible to implement this function with fewer shifts, but I have not yet found a way to do that.
Without using any look-up tables, I can't see much better than treating each bit individually:
unsigned long r = 0;
for (int i = 0; i < 64; ++i) {
r += ((x >> i) & 1) << (((i % 8) * 8) + (7 - i / 8));
}
This is quite easy using IA32 SIMD, there's a handy opcode to extract every eighth bit from a 64 bit value (this was written using DevStudio 2005):
char
source [8] = {0, 0, 0, 0, 0, 0, 0, 0xd0},
dest [8];
__asm
{
mov ch,3
movq xmm0,qword ptr [source]
Rotate2:
lea edi,dest
mov cl,8
Rotate1:
pmovmskb eax,xmm0
psllq xmm0,1
stosb
dec cl
jnz Rotate1
movq xmm0,qword ptr [dest]
dec ch
jnz Rotate2
}
It rotates the data three times (-270 degrees) since +90 is a bit trickier (needs a bit more thought)
To expand on my comment to Ira's answer, you can use:
#define ROT_BIT_0(X) X, (X)|0x1UL
#define ROT_BIT_1(X) ROT_BIT_0(X), ROT_BIT_0((X) | 0x100UL)
#define ROT_BIT_2(X) ROT_BIT_1(X), ROT_BIT_1((X) | 0x10000UL)
#define ROT_BIT_3(X) ROT_BIT_2(X), ROT_BIT_2((X) | 0x1000000UL)
#define ROT_BIT_4(X) ROT_BIT_3(X), ROT_BIT_3((X) | 0x100000000UL)
#define ROT_BIT_5(X) ROT_BIT_4(X), ROT_BIT_4((X) | 0x10000000000UL)
#define ROT_BIT_6(X) ROT_BIT_5(X), ROT_BIT_5((X) | 0x1000000000000UL)
#define ROT_BIT_7(X) ROT_BIT_6(X), ROT_BIT_6((X) | 0x100000000000000UL)
static unsigned long rot90[256] = { ROT_BIT_7(0) };
unsigned long rotate90(unsigned long v)
{
unsigned long r = 0;
r |= rot90[(v>>56) & 0xff];
r |= rot90[(v>>48) & 0xff] << 1;
r |= rot90[(v>>40) & 0xff] << 2;
r |= rot90[(v>>32) & 0xff] << 3;
r |= rot90[(v>>24) & 0xff] << 4;
r |= rot90[(v>>16) & 0xff] << 5;
r |= rot90[(v>>8) & 0xff] << 6;
r |= rot90[v & 0xff] << 7;
return r;
}
This depends on 'unsigned long' being 64 bits, of course, and does the rotate assuming the bits are in row-major order with the msb being the upper right, which seems to be the case in this question....
If you look at this as a 2 dimensional array then you have the solution no? Just make the rows the new columns. First row is the last column, 2nd is the one before last and so on.
Visually at least, it looks like your solution.