Why is the parameter to isdigit an integer?


The function std::isdigit is:

  int isdigit(int ch);

The return value is non-zero if the character is a numeric character, zero otherwise.

2 answers

栀梦

2021-01-15 04:16

The reason is to allow EOF as input. EOF is defined as (from here):

EOF: integer constant expression of type int and negative value
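
As a minimal sketch of what this allows, assuming a hosted C++ implementation: a value that came from a stdio read can be passed straight to std::isdigit, EOF included. The helper name is_digit_from_stream is hypothetical and only serves the illustration.

```cpp
#include <cctype>
#include <cstdio>

// Hypothetical helper: classifies a value that came from getc()/fgetc().
// Such a value is either EOF or an unsigned char value, and both are
// valid arguments to std::isdigit.
bool is_digit_from_stream(int ch) {
    return std::isdigit(ch) != 0;
}
```

Calling is_digit_from_stream(EOF) is well defined and yields false, which would not be expressible if the parameter were a char.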

无人共我

2021-01-15 04:22
The accepted answer is correct, but I believe the question deserves more detail.

A char in C++ is either signed or unsigned depending on your implementation (and yet it's a distinct type from both signed char and unsigned char).
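
Both halves of that statement can be checked at compile time. The following is only a sketch; the helper name plain_char_is_signed is made up for the illustration.

```cpp
#include <climits>
#include <type_traits>

// char is a third type, distinct from both signed char and unsigned char,
// whichever signedness the implementation gives it.
static_assert(!std::is_same<char, signed char>::value, "distinct types");
static_assert(!std::is_same<char, unsigned char>::value, "distinct types");

// Hypothetical helper: reports the implementation's choice at compile time.
// CHAR_MIN is 0 when plain char is unsigned and negative when it is signed.
constexpr bool plain_char_is_signed() {
    return CHAR_MIN < 0;
}
```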

Where C grew up, char was typically unsigned and assumed to be an n-bit byte that could represent [0..2^n-1]. (Yes, there were some machines that had byte sizes other than 8 bits.) In fact, chars were considered virtually indistinguishable from bytes, which is why functions like memcpy originally took char * (and take void * today) rather than something like uint8_t *, why sizeof(char) is always 1, and why CHAR_BIT isn't named BYTE_BIT.

But the C standard, which was the baseline for C++, only promised that a char could hold any value in the execution character set. It might hold additional values, but there was no guarantee. The source character set (basically 7-bit ASCII minus some control characters) required something like 97 values. For a while, the execution character set could be smaller, but in practice it almost never was. Eventually there was an explicit requirement that a char be large enough to hold an 8-bit byte.

But the range was still uncertain. If unsigned, you could rely on [0..255]. Signed chars, however, could (in theory) use a sign+magnitude representation that would give you a range of [-127..127]. Note that's only 255 unique values, not the 256 values ([-128..127]) you'd get from two's complement. If you were language lawyerly enough, you could argue that you cannot store every possible value of an 8-bit byte in a char, even though that was a fundamental assumption throughout the design of the language and its run-time library. C++ has since closed that apparent loophole: C++20 requires two's complement for every signed integer type, signed char included.
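
To see which ranges a given implementation actually provides, something like the following sketch can help. The helper distinct_values is hypothetical, and it assumes CHAR_BIT is small enough that the count fits in a long long.

```cpp
#include <climits>
#include <limits>

static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical helper: how many distinct values the range of T covers.
template <typename T>
constexpr long long distinct_values() {
    return static_cast<long long>(std::numeric_limits<T>::max())
         - static_cast<long long>(std::numeric_limits<T>::min()) + 1;
}
```

On a two's complement machine with 8-bit bytes, both distinct_values<signed char>() and distinct_values<unsigned char>() come out as 256. A sign+magnitude signed char would have given 255.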

When it came time to design fundamental input/output functions, they had to think about how to return a value or a signal that you've reached the end of the file. It was decided to use a special value rather than an out-of-band signaling mechanism. But what value to use? The Unix folks generally had [128..255] available and others had [-128..-1].

But that's only if you're working with text. The Unix/C folks thought of textual characters and binary byte values as the same thing. So getc() was also for reading bytes from a binary file. All 256 possible values of a char, regardless of its signedness, were already claimed.

K&R C (before the first ANSI standard) didn't require function prototypes. The compiler made assumptions about parameter and return types. This is why C and C++ have the "default promotions," even though they're less important now than they once were. In effect, you couldn't return anything smaller than an int from a function. If you did, it would just be converted to int anyway.

The natural solution was therefore to have getc() return an int containing either the character value or a special end-of-file value, imaginatively dubbed EOF, a macro for a negative int value (typically -1).
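
In practice this means the result of getc() should be kept in an int until it has been compared against EOF. A minimal sketch, with a hypothetical helper count_bytes:

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: counts the bytes remaining in a stream.
// The result of getc() is kept in an int so that EOF stays distinct
// from every byte value, including 0xFF.
std::size_t count_bytes(std::FILE* in) {
    std::size_t n = 0;
    int c;  // int, not char
    while ((c = std::getc(in)) != EOF) {
        ++n;
    }
    return n;
}
```

Had c been declared as a char, a 0xFF byte could compare equal to EOF on a signed-char implementation and end the loop early, while on an unsigned-char implementation the comparison could never succeed at all.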

The default promotions not only mandated a function couldn't return an integral type smaller than an int, they also made it difficult to pass in a small type. So int was also the natural parameter type for functions that expected a character. And thus we ended up with function signatures like int isdigit(int ch).
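
C++ has no unprototyped functions, so the sketch below uses unary plus to show the same integral promotion at work. It assumes the usual case where int can represent every char value; the helper as_int is hypothetical.

```cpp
#include <type_traits>

// Integral promotion: unary plus on a char yields an int on implementations
// where int can represent every char value (the usual case).
static_assert(std::is_same<decltype(+char{}), int>::value,
              "char promotes to int");
static_assert(std::is_same<decltype(+static_cast<unsigned char>(0)), int>::value,
              "unsigned char promotes to int");

// Hypothetical helper: the parameter is an int, so a char argument is
// converted to int at the call site, just as with isdigit.
constexpr int as_int(int ch) { return ch; }
```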

If you're a POSIX fan, this is basically all you need.

For the rest of us, there's a remaining gotcha: If your chars are signed, then -1 might represent a legitimate character in your execution character set. How can you distinguish between them?

The answer is that these functions don't really traffic in char values at all. They're really using unsigned char values dressed up as ints.
```
    // Needs <cstdio> (getc, EOF) and <cctype> (isdigit).
    int x = getc(source_file);
    if (x == EOF) { /* reached end of file */ }
    else if (0 <= x && x < 128) { /* plain 7-bit character */ }
    else if (128 <= x && x < 256) {
      // Here it gets interesting.
      bool b1 = isdigit(x);  // OK: x already holds an unsigned char value
      bool b2 = isdigit(static_cast<char>(x));  // NOT PORTABLE: negative if char is signed
      bool b3 = isdigit(static_cast<unsigned char>(x));  // CORRECT!
    }
```
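
When the character starts out as a plain char (from a std::string, say) rather than as the int result of getc(), the same cast applies. One way to package it, as a sketch with a hypothetical wrapper name:

```cpp
#include <cctype>

// Hypothetical wrapper: takes a plain char and converts it to
// unsigned char before calling std::isdigit, so a negative char value
// never reaches the library function.
bool is_digit(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}
```

With this wrapper, is_digit('7') is true, and a byte such as static_cast<char>(0xE9) is classified safely whether or not plain char is signed.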