I don't want to optimize anything, I swear, I just want to ask this question out of curiosity. I know that on most hardware there's an assembly instruction for bit-shift (e.g. shl/shr on x86), but does the amount you shift by affect how fast it runs?
Here's my favorite CPU, in which x<<2 takes twice as long as x<<1 :)
On ARM, this can be done as a side effect of another instruction. So potentially, there's no latency at all for either of them.
Some embedded processors only have a "shift-by-one" instruction. On such processors, the compiler would change x << 3 into ((x << 1) << 1) << 1.
I think the Motorola MC68HCxx was one of the more popular families with this limitation. Fortunately, such architectures are now quite rare; most now include a barrel shifter with a variable shift size.
The Intel 8051, which has many modern derivatives, also cannot shift an arbitrary number of bits.
That depends on both the CPU and the compiler. Even if the underlying CPU supports arbitrary shifts via a barrel shifter, that only helps if the compiler actually takes advantage of that resource.
Keep in mind that shifting by an amount greater than or equal to the width in bits of the operand is undefined behavior in C and C++. Right shift of a negative signed value is also implementation-defined. Rather than worrying too much about speed, be concerned that you are getting the same answer on different implementations.
Quoting from ANSI C section 3.3.7:
3.3.7 Bitwise shift operators
Syntax
shift-expression:
    additive-expression
    shift-expression << additive-expression
    shift-expression >> additive-expression
Constraints
Each of the operands shall have integral type.
Semantics
The integral promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width in bits of the promoted left operand, the behavior is undefined.
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 multiplied by the quantity, 2 raised to the power E2, reduced modulo ULONG_MAX+1 if E1 has type unsigned long, UINT_MAX+1 otherwise. (The constants ULONG_MAX and UINT_MAX are defined in the header <limits.h>.)
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 divided by the quantity, 2 raised to the power E2 . If E1 has a signed type and a negative value, the resulting value is implementation-defined.
So:
x = y << z;
"<<": y × 2z (undefined if an overflow occurs);
x = y >> z;
>": implementation-defined">
">>": implementation-defined for signed (most often the result of an arithmetic shift: y / 2^z).
On some generations of Intel CPUs (P2 or P3? Not AMD, though, if I remember right), the bit-shift operations are ridiculously slow. Shifting by 1 bit should always be fast, though, since the compiler can use an addition instead. Another question to consider is whether shifts by a constant number of bits are faster than variable-length shifts. Even if the opcodes are the same speed, on x86 the non-constant right-hand operand of a shift must occupy the CL register, which imposes additional constraints on register allocation and may slow the program down that way too.
Potentially depends on the CPU.
However, all modern CPUs (x86, ARM) use a "barrel shifter" -- a hardware module specifically designed to perform arbitrary shifts in constant time.
So the bottom line is... no. No difference.