Question
After asking this SO question, I received a very interesting comment from @AndonM.Coleman that I had to verify.
Since your disassembled code is written for x86, it is worth pointing out that XOR will set/clear the Zero Flag whereas NOT will not (sometimes useful if you want to perform a bitwise operation without affecting jump conditions that rely on flags from previous operations). Now, considering you're not writing assembly directly, you really have no access to this flag in a meaningful way so I doubt this is the reason for favoring one over the other.
His comment made me curious whether the following code would produce the same assembly instructions:
#include <iostream>

int main()
{
    unsigned int val = 0;
    std::cout << "Enter a numeric value: ";
    std::cin >> val;

    if ( (val ^ ~0U) == 0)
    {
        std::cout << "Value inverted is zero" << std::endl;
    } else
    {
        std::cout << "Value inverted is not zero" << std::endl;
    }

    if ( (~val) == 0)
    {
        std::cout << "Value inverted is zero" << std::endl;
    } else
    {
        std::cout << "Value inverted is not zero" << std::endl;
    }

    return 0;
}
For the following two operations
if ( (val ^ ~0U) == 0 )
and
if ( (~val) == 0 )
the unoptimized build in Visual Studio 2010 gives the following disassembly:
if ( (val ^ ~0U) == 0)
00AD1501 mov eax,dword ptr [val]
00AD1504 xor eax,0FFFFFFFFh
00AD1507 jne main+86h (0AD1536h)
if ( (~val) == 0)
00AD1561 mov eax,dword ptr [val]
00AD1564 not eax
00AD1566 test eax,eax
00AD1568 jne main+0E7h (0AD1597h)
My question regards optimisation. Is it better to write
if ( (val ^ ~0U) == 0)
or
if ( (~val) == 0)
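For reference, one minimal way to compare what an optimizing build does (e.g. /O2 in Visual Studio, -O2 with GCC or Clang) is to compile each condition as its own small function and inspect the generated disassembly; the function names here are just placeholders:

bool inverted_is_zero_xor(unsigned int val)
{
    return (val ^ ~0U) == 0;  // the XOR form from above
}

bool inverted_is_zero_not(unsigned int val)
{
    return (~val) == 0;       // the NOT form from above
}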
Answer 1:
This depends on a lot of things, but mostly what (if anything) you tell the compiler to optimize for.
If the compiler is set to optimize for size (smallest machine code), then it will sometimes use XOR in seemingly strange places. For instance, thanks to the variable-length instruction encoding x86 uses, a register can be set to 0 by XOR'ing it with itself in fewer bytes of code than the equivalent MOV instruction would require.
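As a rough illustration of that size difference, compiling a trivial function like the one below for x86 with optimization enabled typically produces the 2-byte xor eax,eax rather than the 5-byte mov eax,0 to materialize the zero return value (the exact output naturally depends on the compiler and its settings):

unsigned int zero()
{
    // optimizing x86 compilers usually emit "xor eax, eax" (2 bytes)
    // here instead of "mov eax, 0" (5 bytes)
    return 0u;
}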
Consider the code that uses XOR:
if ( (val ^ ~0U) == 0 ) /* 3 bytes to negate and test (x86) */
XOR eax,0FFFFFFFFh requires 3 bytes AND sets/clears the Zero Flag (ZF).
Now, consider the code that uses NOT:
if ( (~val) == 0 ) /* 4 bytes to negate and test (x86) */
NOT eax is encoded as a 2-byte instruction, but does not affect the CPU flags. TEST eax,eax adds an additional 2 bytes, and is necessary to set/clear the Zero Flag (ZF).
NOT is also a simple instruction, but since it does not affect any CPU flags, you must issue a TEST instruction afterwards to use it for branching, as seen in your code. This actually produces larger machine code, so a smart compiler set to optimize for size would probably try to avoid using NOT. How many cycles these two instructions take to complete together varies between CPU generations, and a smart compiler would also factor this into its decision making when told to optimize for speed.
If you are not writing hand-tuned assembly, it is best to use whatever is clearest to a human and hope that the compiler is smart enough to choose different instructions/scheduling/etc. to optimize for size/speed as requested at compile time. Compilers have a smart set of heuristics they use to choose and schedule instructions, and they know more about the target CPU architecture than the average coder does.
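For example, if clarity is the goal here, both of the conditions in the question amount to testing whether every bit of val is set, which can be written without either operator; this is just an equivalent formulation, not the only reasonable one:

#include <limits>

bool inverted_is_zero(unsigned int val)
{
    // ~val == 0 exactly when every bit of val is set,
    // i.e. val holds the largest unsigned int value
    return val == std::numeric_limits<unsigned int>::max();  // equivalently: val == ~0U
}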
If you find out later that this branch really is a bottleneck and there is no higher-level way around the problem, then you could do some low-level tuning. However, this is such a trivial thing to focus on these days unless you are targeting something like a low-power embedded CPU or memory limited device. The only places I have ever squeezed out enough performance by hand-tuning to make it worthwhile were in algorithms that benefited from data parallelism and where the compiler was not smart enough to effectively utilize specialized instruction sets like MMX/SSE.
Source: https://stackoverflow.com/questions/22367511/bitwise-operators-not-vs-xor-use-in-branching