How do I convert a char to an int in C and C++?
(This answer addresses the C++ side of things, but the sign extension problem exists in C too.)
Handling all three char types (signed, unsigned, and char) is more delicate than it first appears. Values in the range 0 to SCHAR_MAX (which is 127 for an 8-bit char) are easy:
char c = somevalue;
signed char sc = c;
unsigned char uc = c;
int n = c;
But when somevalue is outside of that range, only going through unsigned char gives you consistent results for the "same" char values in all three types:
char c = somevalue;
signed char sc = c;
unsigned char uc = c;
// Might not be true: int(c) == int(sc) and int(c) == int(uc).
int nc = (unsigned char)c;
int nsc = (unsigned char)sc;
int nuc = (unsigned char)uc;
// Always true: nc == nsc and nc == nuc.
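The claim above can be checked with a small sketch. The names widen and consistent are hypothetical helpers introduced only for illustration; the sketch also assumes the usual two's-complement behavior when a value above CHAR_MAX is stored in a char (implementation-defined before C++20, guaranteed since).

```cpp
#include <climits>

// Hypothetical helpers: widen any of the three char types to int by way of
// unsigned char, so the result always lies in [0, UCHAR_MAX].
inline int widen(char c)          { return static_cast<unsigned char>(c); }
inline int widen(signed char c)   { return static_cast<unsigned char>(c); }
inline int widen(unsigned char c) { return c; }

// Returns true when all three char types yield the same int for one byte value.
inline bool consistent(unsigned char byte) {
    char c = static_cast<char>(byte);                // may become negative
    signed char sc = static_cast<signed char>(byte); // may become negative
    unsigned char uc = byte;
    return widen(c) == byte && widen(sc) == byte && widen(uc) == byte;
}
```

Calling consistent with a value such as 200 exercises the case where char and signed char hold a negative number, yet all three conversions still agree.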
This is important when using functions from ctype.h, such as isupper or toupper, because of sign extension:
char c = negative_char; // Assuming CHAR_MIN < 0.
int n = c;
bool b = isupper(n); // Undefined behavior.
Note the conversion through int is implicit; this has the same UB:
char c = negative_char;
bool b = isupper(c);
To fix this, go through unsigned char, which is easily done by wrapping ctype.h functions through safe_ctype:
template<int (&F)(int)>
int safe_ctype(unsigned char c) { return F(c); }
//...
char c = CHAR_MIN;
bool b = safe_ctype<isupper>(c); // No UB.
std::string s = "value that may contain negative chars; e.g. user input";
std::transform(s.begin(), s.end(), s.begin(), &safe_ctype<toupper>);
// Must wrap toupper to eliminate UB in this case; you can't cast
// to unsigned char because the function is called inside transform.
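As an alternative sketch of the same idea, assuming C++11 or later, a lambda that takes unsigned char can do the wrapping at the call site. The function name to_upper_copy is hypothetical and not part of the answer above.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns an uppercased copy of s without ever passing
// a negative value to std::toupper.
inline std::string to_upper_copy(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char ch) {
                       // Each char element is converted to unsigned char on
                       // entry, so std::toupper sees a value in [0, UCHAR_MAX].
                       return static_cast<char>(std::toupper(ch));
                   });
    return s;
}
```

Bytes that the current locale does not treat as lowercase letters pass through unchanged; only the undefined behavior is removed.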
This works because a function taking any one of the three char types also accepts the other two through implicit conversion. It leads to two functions which can handle any of the types:
int ord(char c) { return (unsigned char)c; }
char chr(int n) {
    assert(0 <= n); // Or other error-/sanity-checking.
    assert(n <= UCHAR_MAX);
    return (unsigned char)n;
}
// Ord and chr are named to match similar functions in other languages
// and libraries.
ord(c) always gives you a non-negative value, even when passed a negative char or negative signed char, and chr takes any value ord produces and gives back the exact same char.
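The round-trip property can be exercised with a short sketch. It restates ord and chr with static_cast so the block stands alone; round_trips is a hypothetical checking function, and the sketch assumes the usual modular conversion from unsigned char back to char (implementation-defined before C++20).

```cpp
#include <cassert>
#include <climits>

int ord(char c) { return static_cast<unsigned char>(c); }

char chr(int n) {
    assert(0 <= n && n <= UCHAR_MAX);
    return static_cast<char>(static_cast<unsigned char>(n));
}

// Hypothetical check: every char value maps to a non-negative int via ord
// and comes back unchanged via chr.
bool round_trips() {
    for (int i = CHAR_MIN; i <= CHAR_MAX; ++i) {
        char c = static_cast<char>(i);
        int n = ord(c);
        if (n < 0 || n > UCHAR_MAX || chr(n) != c) return false;
    }
    return true;
}
```

The loop covers negative char values when plain char is signed, which is exactly the case the casts are meant to handle.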
In practice, I would probably just cast through unsigned char instead of using these, but they do succinctly wrap the cast, provide a convenient place to add error checking for int-to-char, and would be shorter and clearer when you need to use them several times in close proximity.