My task is to check, trillions of times, whether two ints contain either of two predefined pairs of nibbles at the same bit offset (first pair: 0x2 in the first int and 0x7 in the second; second pair: 0xd and 0x8).
Have you tried unrolling the loop?
```c
if(    ( ((A & 0x0000000F) == 0x0000000D) && ((B & 0x0000000F) == 0x00000008) )
    || ( ((A & 0x000000F0) == 0x000000D0) && ((B & 0x000000F0) == 0x00000080) )
    || ( ((A & 0x00000F00) == 0x00000D00) && ((B & 0x00000F00) == 0x00000800) )
    || ( ((A & 0x0000F000) == 0x0000D000) && ((B & 0x0000F000) == 0x00008000) )
    // etc. for the remaining nibble positions, up to 0xF0000000
  )
{
    // found a 0xD/0x8 pair
}
// Then repeat with 0x2 & 0x7
```
I believe unrolling the loop results in the same number of bitwise AND operations and the same number of comparisons, but you save the effort of performing all the right shifts and storing their results.
Edit (in response to the comment that unrolling results in conditional and unconditional jumps):
The following version should eliminate the jumps, at the expense of doing some additional work. It's been a while since I worked on something that needed this kind of optimization, but it should produce no jumps whatsoever. (If it doesn't, try replacing each && with &. The && may lead the compiler to generate short-circuiting logic, whereas & may make it always evaluate both halves, with no jumps.)
```c
bool result = false;
result |= ( ((A & 0x0000000F) == 0x0000000D) && ((B & 0x0000000F) == 0x00000008) );
result |= ( ((A & 0x000000F0) == 0x000000D0) && ((B & 0x000000F0) == 0x00000080) );
result |= ( ((A & 0x00000F00) == 0x00000D00) && ((B & 0x00000F00) == 0x00000800) );
result |= ( ((A & 0x0000F000) == 0x0000D000) && ((B & 0x0000F000) == 0x00008000) );
// etc. for the remaining nibble positions, then the same for 0x2 & 0x7
return result;
```