Following the question titled Warning generated due wrong strcmp parameter handling, there seem to be some questions regarding what the Standard actually guarantees regardi
strcmp (buf1, reinterpret_cast<char const *> (buf2));
This looks fine,
It is. strcmp takes const char * parameters, but internally converts them to const unsigned char * (if required), so that even if char is signed and two distinct bytes can compare equal when viewing them as char, they will still compare different when viewing them with strcmp.
C99:
7.21 String handling
<string.h>
7.21.1 String function conventions
3 For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value).
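As a rough illustration of that convention, the sketch below compares two one-byte strings whose first bytes are 0x80 and 0x01. The helper name first_byte_0x80_sorts_after_0x01 is made up for this example. Because strcmp interprets each character as unsigned char, 0x80 (128) should sort after 0x01 whether or not plain char is signed.

```cpp
#include <cstring>

// Hypothetical helper (not from the question): checks how strcmp orders
// a string starting with byte 0x80 against one starting with byte 0x01.
bool first_byte_0x80_sorts_after_0x01() {
    // Build the strings from unsigned char so the byte values are explicit.
    const unsigned char high[] = {0x80, 0x00};
    const unsigned char low[]  = {0x01, 0x00};

    // strcmp takes const char *, so the buffers are viewed through a cast,
    // as in the question. The comparison itself is done as unsigned char.
    return std::strcmp(reinterpret_cast<const char*>(high),
                       reinterpret_cast<const char*>(low)) > 0;
}
```

If the bytes were compared as signed char instead, 0x80 would typically be -128 and sort before 0x01, which is exactly what the quoted paragraph rules out.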
That said,
but does the Standard guarantee that the (1) will always yield true?
char unsigned * p1 = ...;
char * p2 = reinterpret_cast<char *> (p1);
*p1 == *p2; // (1)
What you wrote is not guaranteed.
Take a common implementation, with signed char, 8-bit bytes using two's complement representation. If *p1 is UCHAR_MAX, then *p2 == -1, and *p1 == *p2 will be false because the promotion to int gives them different values.
If you meant either (char) *p1 == *p2, or *p1 == (unsigned char) *p2, then those are still not guaranteed, so you do need to make sure that if you copy from an array of char to an array of unsigned char, you don't include such a conversion.
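A minimal sketch of the difference, using made-up function names: the first function performs comparison (1) as written, the second reads both sides of the same byte as unsigned char, so no value conversion between character types is involved. The outcome of the first depends on whether plain char is signed on the implementation at hand.

```cpp
#include <climits>
#include <limits>

// Hypothetical demo: comparison (1) exactly as written in the question.
bool compare_as_written() {
    unsigned char byte = UCHAR_MAX;
    unsigned char* p1 = &byte;
    char* p2 = reinterpret_cast<char*>(p1);
    // Both operands are promoted before the comparison; if char is signed,
    // *p2 is negative here and the result is false.
    return *p1 == *p2;
}

// Hypothetical demo: the same byte, but both sides are read as unsigned char.
bool compare_as_unsigned_bytes() {
    unsigned char byte = UCHAR_MAX;
    unsigned char* p1 = &byte;
    char* p2 = reinterpret_cast<char*>(p1);
    // Viewing the char object as unsigned char again yields the stored byte.
    return *p1 == *reinterpret_cast<unsigned char*>(p2);
}
```

On an implementation where char is signed, compare_as_written() is expected to return false, while compare_as_unsigned_bytes() returns true either way.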
There is no such guarantee in the C++11 Standard (N3337), nor in the upcoming C++14 (N3797).
char unsigned * p1 = ...;
char * p2 = reinterpret_cast<char *> (p1);
*p1 == *p2; // (1), not guaranteed to be true
Note: it is implementation-defined whether char is signed or unsigned; [basic.fundamental]p1.
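The Standard guarantees that every character type shall occupy the same amount of storage, have the same alignment requirements, and have all bits of its object representation participate in the value representation.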
Sharing the same amount of storage, alignment requirement, and the guarantee about bit participation, means that casting an lvalue referring to one type (unsigned char) to another (char) is safe, as far as the actual cast is concerned.
3.9.1p1 Fundamental types [basic.fundamental]
It is implementation-defined whether a char can hold negative values. Characters can be explicitly declared signed or unsigned. A char, a signed char, and an unsigned char occupy the same amount of storage and have the same alignment requirements (3.11); that is, they have the same object representation. For character types, all bits of the object representation participate in the value representation. For unsigned character types, all possible bit patterns of the value representation represent numbers. These requirements do not hold for other types.
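The storage and alignment part of this paragraph can be checked at compile time. The following is only a sketch; the function name character_types_share_layout is invented for the example, and the checks say nothing about which values the stored bits represent.

```cpp
// Compile-time checks of the storage and alignment guarantees quoted above.
static_assert(sizeof(char) == sizeof(unsigned char), "same amount of storage");
static_assert(sizeof(char) == sizeof(signed char), "same amount of storage");
static_assert(alignof(char) == alignof(unsigned char), "same alignment");
static_assert(alignof(char) == alignof(signed char), "same alignment");

// Hypothetical runtime wrapper so the result can also be queried in code.
bool character_types_share_layout() {
    return sizeof(char) == sizeof(unsigned char)
        && sizeof(char) == sizeof(signed char)
        && alignof(char) == alignof(unsigned char)
        && alignof(char) == alignof(signed char);
}
```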
3.9p4 Types [basic.types]
The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object is the set of bits that hold the value of type T.
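If we assign the maximum value of an unsigned char (UCHAR_MAX) to *p1 and char is signed, *p2 won't be able to represent this value. Reading the same byte through *p2 will, most likely, yield -1 (as it does on a two's complement implementation).
Note: nothing is computed or stored through *p2 here, so this is not the undefined behavior of signed integer overflow; the same byte is simply read through a different character type.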
*p1 = UCHAR_MAX;
*p1 == *p2; // (1)
Both sides of operator== must have the same type before we can compare them, and currently one side is unsigned char and the other char.
The compiler will therefore resort to integral promotion to find a type that can represent all combined possible values of the two types; in this case the resulting type will be int.
After the integral promotion the statement is semantically equivalent to int (UCHAR_MAX) == int(-1), which of course is false.
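The promotion step can be made visible with decltype. This is a sketch that assumes a typical implementation where int can represent every unsigned char value (otherwise unsigned char would promote to unsigned int); the names below are invented for the example.

```cpp
#include <climits>
#include <type_traits>

// Unary plus applies integral promotion, so decltype reveals the promoted type.
// Assumption: int can hold all values of unsigned char on this implementation.
constexpr bool unsigned_char_promotes_to_int =
    std::is_same<decltype(+static_cast<unsigned char>(0)), int>::value;
constexpr bool char_promotes_to_int =
    std::is_same<decltype(+static_cast<char>(0)), int>::value;

// Hypothetical helper: the comparison from (1) after both operands have been
// promoted, for the case where *p2 reads as -1.
bool promoted_values_differ() {
    int lhs = UCHAR_MAX;  // promoted value of *p1
    int rhs = -1;         // promoted value of *p2 when char is signed
    return lhs != rhs;
}
```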