The function std::isdigit is:
int isdigit(int ch);
It returns a non-zero value if the character is a numeric character, zero otherwise.
The reason the parameter is an int is to allow EOF
as input. And EOF
is (from here):
EOF: an integer constant expression of type int with a negative value
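A minimal sketch of what that buys you: because the parameter is an int, EOF can be passed straight through without a special case. The helper name is_decimal_digit is hypothetical, introduced only for illustration.

```cpp
#include <cctype>
#include <cstdio>

// EOF is an integer constant expression with a negative value,
// so this can be checked at compile time.
static_assert(EOF < 0, "EOF must be negative");

// Hypothetical helper: takes an int exactly as returned by getc().
// Passing EOF is well-defined and simply yields false.
bool is_decimal_digit(int ch) {
    return std::isdigit(ch) != 0;
}
```

Calling is_decimal_digit(EOF) is therefore fine, while any other negative argument would not be.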
The accepted answer is correct, but I believe the question deserves more detail.
A char
in C++ is either signed or unsigned depending on your implementation (and, yet, it's a distinct type from signed char
and unsigned char
).
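Both halves of that statement can be checked on your own implementation. This is a small sketch; plain_char_is_signed is a hypothetical name used only here.

```cpp
#include <limits>
#include <type_traits>

// char is a distinct type from both signed char and unsigned char,
// whichever of the two it happens to behave like.
static_assert(!std::is_same<char, signed char>::value,
              "char and signed char are distinct types");
static_assert(!std::is_same<char, unsigned char>::value,
              "char and unsigned char are distinct types");

// Hypothetical helper: reports the implementation's choice of signedness.
constexpr bool plain_char_is_signed() {
    return std::numeric_limits<char>::is_signed;
}
```

The static_asserts hold everywhere; the return value of plain_char_is_signed() depends on your compiler and target.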
Where C grew up, char
was typically unsigned and assumed to be an n-bit byte that could represent [0..2^n-1]. (Yes, there were some machines that had byte sizes other than 8 bits.) In fact, char
s were considered virtually indistinguishable from bytes, which is why functions like memcpy
originally took char * (void * since ANSI C)
rather than something like uint8_t *
, why sizeof(char)
is always 1, and why CHAR_BIT
isn't named BYTE_BITS
.
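The char-is-a-byte relationship is visible in the language itself. A short sketch, where bits_in is a hypothetical helper and not a standard function:

```cpp
#include <climits>
#include <cstddef>

// sizeof is measured in units of char, so this holds by definition.
static_assert(sizeof(char) == 1, "sizeof(char) is always 1");

// A byte must have at least 8 bits; CHAR_BIT reports the actual number.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical helper: number of bits occupied by an object of type T.
template <typename T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}
```

For example, bits_in<char>() is CHAR_BIT on every implementation, while bits_in<int>() varies by platform.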
But the C standard, which was the baseline for C++, only promised that char
could hold any value in the execution character set. They might hold additional values, but there was no guarantee. The source character set (basically 7-bit ASCII minus some control characters) required something like 97 values. For a while, the execution character set could be smaller, but in practice it almost never was. Eventually there was an explicit requirement that a char
be large enough to hold an 8-bit byte.
But the range was still uncertain. If unsigned, you could rely on [0..255]. Signed chars, however, could--in theory--use a sign+magnitude representation that would give you a range of [-127..127]. Note that's only 255 unique values, not 256 values ([-128..127]) like you'd get from two's complement. If you were language lawyerly enough, you could argue that you cannot store every possible value of an 8-bit byte in a char
even though that was a fundamental assumption throughout the design of the language and its run-time library. C++20 finally closed that apparent loophole by requiring two's complement for all signed integer types, signed char
included.
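The property at stake is whether every byte value survives a trip through a plain char. Here is a sketch of a run-time check; all_bytes_round_trip_through_char is a hypothetical name. Before C++20 the conversion to a signed char of an out-of-range value was implementation-defined, so treat the result as a statement about your implementation rather than a guarantee of older standards.

```cpp
#include <climits>

// Hypothetical check: convert each unsigned char value to plain char
// and back, and report whether all of them come back unchanged.
bool all_bytes_round_trip_through_char() {
    for (int i = 0; i <= UCHAR_MAX; ++i) {
        char c = static_cast<char>(static_cast<unsigned char>(i));
        if (static_cast<unsigned char>(c) != i) {
            return false;  // some byte value was lost
        }
    }
    return true;
}
```

On a two's complement implementation this returns true; a sign+magnitude signed char would have collapsed two byte values into one.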
When it came time to design fundamental input/output functions, they had to think about how to return a value or a signal that you've reached the end of the file. It was decided to use a special value rather than an out-of-band signaling mechanism. But what value to use? Since text was 7-bit ASCII, the Unix folks generally had [128..255] available and others had [-128..-1].
But that's only if you're working with text. The Unix/C folks thought of textual characters and binary byte values as the same thing. So getc()
was also for reading bytes from a binary file. All 256 possible values of a char
, regardless of its signedness, were already claimed.
K&R C (before the first ANSI standard) didn't require function prototypes. The compiler made assumptions about parameter and return types. This is why C and C++ have the "default promotions," even though they're less important now than they once were. In effect, you couldn't return anything smaller than an int
from a function. If you did, it would just be converted to int
anyway.
The natural solution was therefore to have getc()
return an int
containing either the character value or a special end-of-file value, imaginatively dubbed EOF
, a macro for a negative int (typically -1).
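In practice that gives the familiar read loop. A minimal sketch, assuming the caller has already opened the stream; count_bytes is a hypothetical helper. The point is that the result of getc() is kept in an int, so a 0xFF byte in the data is not mistaken for end of file.

```cpp
#include <cstdio>

// Hypothetical helper: counts the bytes remaining in an open stream.
long count_bytes(std::FILE* in) {
    long n = 0;
    int ch;  // int, not char: must hold every byte value plus EOF
    while ((ch = std::getc(in)) != EOF) {
        ++n;
    }
    return n;
}
```

Had ch been declared as a char, the loop could stop early at a 0xFF byte (signed char) or never stop at all (unsigned char).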
The default promotions not only mandated a function couldn't return an integral type smaller than an int
, they also made it difficult to pass in a small type. So int
was also the natural parameter type for functions that expected a character. And thus we ended up with function signatures like int isdigit(int ch)
.
If you're a POSIX fan, this is basically all you need.
For the rest of us, there's a remaining gotcha: If your char
s are signed, then -1 might represent a legitimate character in your execution character set. How can you distinguish that character from EOF?
The answer is that these functions don't really traffic in char
values at all. They're really using unsigned char
values dressed up as int
s.
int x = getc(source_file);
if (x == EOF) { /* reached end of file */ }
else if (0 <= x && x < 128) { /* plain 7-bit character */ }
else if (128 <= x && x < 256) {
    // Here it gets interesting.
    bool b1 = isdigit(x); // OK: x is already an unsigned char value held in an int
    bool b2 = isdigit(static_cast<char>(x)); // NOT PORTABLE: negative if char is signed
    bool b3 = isdigit(static_cast<unsigned char>(x)); // CORRECT!
}
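If your characters start out as plain char (from a std::string, say) rather than coming from getc(), the same rule applies: go through unsigned char first. A small sketch of a wrapper; the name is_digit is hypothetical.

```cpp
#include <cctype>

// Hypothetical wrapper: safe to call with any char value, whether
// plain char is signed or unsigned on this implementation.
bool is_digit(char c) {
    // Convert to unsigned char first so the int passed to std::isdigit
    // is never negative (other than EOF, which cannot arise here).
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}
```

With this in place, a call like is_digit(s[i]) is well-defined even when s contains bytes above 127.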