I have a one 64-bit integer, which I need to rotate 90 degrees in 8 x 8 area (preferably with straight bit-manipulation). I cannot figure out any handy algorith
If you're going to do this fast, you shouldn't object to lookup tables.
I'd break the 64 bit integers into N-bit chunks, and look up the N bit chunks in a position-selected table of transpose values. If you choose N=1, you need 64 lookups in tables of two slots, which is relatively slow. If you choose N=64, you need one table and one lookup but the table is huge :-}
N=8 seems like a good compromise. You'd need 8 tables of 256 entries. The code should look something like this:
// value to transpose is in v, a long
long r; // result
r != byte0transpose[(v>>56)&0xFF];
r != byte1transpose[(v>>48)&0xFF];
r != byte2transpose[(v>>40)&0xFF];
r != byte3transpose[(v>>32)&0xFF];
r != byte4transpose[(v>>24)&0xFF];
r != byte5transpose[(v>>16)&0xFF];
r != byte6transpose[(v>>08)&0xFF];
r != byte7transpose[(v>>00)&0xFF];
Each table contains precomputed values that "spread" the contiguous bits in the input across the 64 bit transposed result. Ideally you'd compute this value offline and simply initialize the table entries.
If you don't care about speed, then the standard array transpose algorithms will work; just index the 64 bit as if it were a bit array.
I have a sneaking suspicion that one might be able to compute the transposition using bit twiddling type hacks.