I\'m reading C++ Primer and I\'m slightly confused by a few comments which talk about how Bitwise operators deal with signed types. I\'ll quote:
Quote #1
You're quite correct -- the expression ~'q' << 6
is undefined behavior according to the standard. Its even worse than you state, as the ~
operator is defined as computing "The one's complement" of the value, which is meaningless for a signed (2s-complement) integer -- the term "one's complement" only really means anything for an unsigned integer.
When doing bitwise operations, if you want strictly well-defined (according to the standard) results, you generally have to ensure that the values being operated on are unsigned. You can do that either with explicit casts, or by using explicitly unsigned constants (U
-suffix) in binary operations. Doing a binary operation with a signed and unsigned int is done as unsigned (the signed value is converted to unsigned).
C and C++ are subtley different with the integer promotions, so you need to be careful here -- C++ will convert a smaller-than-int unsigned value to int (signed) before comparing with the other operand to see what should be done, while C will compare operands first.
Of course, by editing the question, my answer is now partly answering a different question than the one posed, so here goes an attempt to answer the "new" question:
The promotion rules (what gets converted to what) are well defined in the standard. The type char
may be either signed
or unsigned
- in some compilers you can even give a flag to the compiler to say "I want unsigned char type" or "I want signed char type" - but most compilers just define char
as either signed
or unsigned
.
A constant, such as 6
is signed by default. When an operation, such as 'q' << 6
is written in the code, the compiler will convert any smaller type to any larger type [or if you do any arithmetic in general, char
is converted to int
], so 'q'
becomes the integer value of 'q'
. If you want to avoid that, you should use 6u
, or an explicit cast, such as static_cast<unsigned>('q') << 6
- that way, you are ensured that the operand is converted to unsigned, rather than signed.
The operations are undefined because different hardware behaves differently, and there are architectures with "strange" numbering systems, which means that the standards committee has to choose between "ruling out/making operations extremely inefficient" or "defining the standard in a way that isn't very clear". In a few architectures, overflowing integers may also be a trap, and if you shift such that you change the sign on the number, that typically counts as an overflow - and since trapping typically means "your code no longer runs", that would not be what your average programmer expects -> falls under the umbrella of "undefined behaviour". Most processors don't, and nothing really bad will happen if you do that.
Old answer:
So the solution to avoid this is to always cast your signed values (including char
) to unsigned before shifting them (or accept that your code may not work on another compiler, the same compiler with different options, or the next release of the same compiler).
It is also worth noting that the resulting value is "nearly always what you expect" (in that the compiler/processor will just perform the left or right shift on the value, on right shifts using the sign bit to shift down), it's just undefined or implementation defined because SOME machine architectures may not have hardware to "do this right", and C compilers still need to work on those systems.
The sign bit is the highest bit in a twos-complement, and you are not changing that by shifting that number:
11111111 11111111 11111111 10001110 << 6 =
111111 11111111 11111111 11100011 10000000
^^^^^^--- goes away.
result=11111111 11111111 11100011 10000000
Or as a hex number: 0xffffe380.
It might be simplest to read the exact text of the Standard, instead of a summary like in Primer Plus. (The summary has to leave out detail by virtue of being a summary!)
The relevant portions are:
[expr.shift]
The shift operators
<<
and>>
group left-to-right. The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.The value of
E1 << E2
isE1
left-shiftedE2
bit positions; vacated bits are zero-filled. IfE1
has an unsigned type, the value of the result isE1
×2
E2 , reduced modulo one more than the maximum value representable in the result type. Otherwise, ifE1
has a signed type and non-negative value, andE1
×2
E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.[expr.unary.op]/10
The operand of
˜
shall have integral or unscoped enumeration type; the result is the one’s complement of its operand. Integral promotions are performed. The type of the result is the type of the promoted operand.
Note that neither of these performs the usual arithmetic conversions (which is the conversion to a common type that is done by most of the binary operators).
The integral promotions:
[conv.prom]/1
A prvalue of an integer type other than
bool
,char16_t
,char32_t
, orwchar_t
whose integer conversion rank is less than the rank of int can be converted to a prvalue of typeint
ifint
can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of typeunsigned int
.
(There are other entries for the types in the "other than" list, I have omitted them here but you can look it up in a Standard draft).
The thing to remmeber about the integer promotions is that they are value-preserving , if you have a char
of value -30
, then after promotion it will be an int
of value -30
. You don't need to think about things like "sign extension".
Your initial analysis of ~'q'
is correct, and the result has type int
(because int
can represent all the values of char
on normal systems).
It turns out that any int
whose most significant bit is set represents a negative value (there are rules about this in another part of the standard that I haven't quoted here), so ~'q'
is a negative int
.
Looking at [expr.shift]/2 we see that this means left-shifting it causes undefined behaviour (it's not covered by any of the earlier cases in that paragraph).