Question
It is not clear how to write portable C code using the wide-character API. Consider this example:
#include <locale.h>
#include <wchar.h>
#include <wctype.h>

int main(void)
{
  setlocale(LC_CTYPE, "C.UTF-8");
  wchar_t wc = L'ÿ';
  if (iswlower(wc)) return 0;
  return 1;
}
Compiling it with gcc-6.3.0 using -Wconversion option gives this warning:
test.c: In function 'main':
test.c:9:16: warning: conversion to 'wint_t {aka unsigned int}' from 'wchar_t {aka int}' may change the sign of the result [-Wsign-conversion]
   if (iswlower(wc)) return 0;
                ^
To get rid of this warning, we can cast to (wint_t), as in iswlower((wint_t)wc), but this is unportable.
The following example demonstrates why it is unportable.
#include <stdio.h>

/* this is our hypothetical implementation */
typedef signed int wint_t;
typedef signed short wchar_t;
#define WEOF ((wint_t)0xffffffff)

void f(wint_t wc)
{
  if (wc==WEOF)
    printf("BUG. Valid character recognized as WEOF. This is due to integer promotion. How to avoid it?\n");
}

int main(void)
{
  wchar_t wc = (wchar_t)0xffff;
  f((wint_t)wc);
  return 0;
}
My question is: how to make this example portable, and at the same time avoid the gcc warning.
Answer 1:
To keep things simple, I'm going to assume that the platform/implementation I'm discussing has the following characteristics:
- two's complement integer types
- int is 32 bits
- short is 16 bits
I'm also going to use C99 as a reference just because it's what I have open.
The standard says the following must be true about these types/macros:
- wint_t must be able to have at least one value that does not correspond to any member of the extended character set (7.24.1/2)
- WEOF has a value that does not correspond to any member of the extended character set (7.24.1/3)
- wchar_t can represent all values of the largest extended character set (7.17/2)
Keep in mind that by the C standard's definition of "value", the value of (short int) 0xffff is the same as the value of (int) 0xffffffff; that is, they both have the value -1 (given the assumptions stated at the beginning of this answer). This is made clear by the standard's description of the integer promotions (6.3.1.1):
If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
The integer promotions preserve value including sign.
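The value-preserving behaviour can be seen in a small sketch. It assumes that int16_t and int32_t stand in for the 16-bit short and 32-bit int of this answer; both helper names are hypothetical. Note that (short int) 0xffff itself is an implementation-defined conversion, which typically yields -1 on two's complement machines, so the sketch starts from -1 directly.

```c
#include <stdint.h>

/* Hypothetical helper: widen a signed 16-bit value to 32 bits.
   The conversion keeps the value, including its sign, so -1 stays -1. */
int32_t widen_signed(int16_t narrow)
{
    return narrow;
}

/* Hypothetical helper: convert to the unsigned 16-bit type first.
   -1 becomes 65535 (modulo 2^16), and that value is then widened. */
int32_t widen_as_unsigned(int16_t narrow)
{
    return (int32_t)(uint16_t)narrow;
}
```

Called with (int16_t)-1, the first helper returns -1 and the second returns 65535, which is why a signed 16-bit wchar_t holding the bit pattern 0xffff compares equal to a WEOF of -1 after widening.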
Combining these elements, it seems that if WEOF has the value -1, then no item in an extended character set can have the value -1. I think this means that in your example implementation, either wchar_t would have to be unsigned (if it remained a 16-bit type) or (wchar_t) 0xffff could not be a valid character.
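If that reading is right, the explicit cast from the question should be safe on a conforming implementation for any wchar_t that holds a valid member of the extended character set. A minimal sketch, where to_wint and is_lower_wide are hypothetical helper names and not part of any library:

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: the explicit cast silences -Wsign-conversion.
   Assumption: wc holds a valid member of the extended character set,
   so its value cannot coincide with WEOF. */
wint_t to_wint(wchar_t wc)
{
    return (wint_t)wc;
}

/* Returns 1 if wc is a lowercase letter in the current locale, else 0. */
int is_lower_wide(wchar_t wc)
{
    return iswlower(to_wint(wc)) != 0;
}
```

Keeping the cast in one helper means the warning is handled in a single place instead of at every call to a wctype function.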
But there's another alternative that I originally forgot, and it is probably the best solution for your example implementation: the standard states in a footnote that the "value of the macro WEOF may differ from that of EOF and need not be negative". So your implementation's problem can be fixed by making WEOF == INT_MAX, for example. That way it cannot have the same value as any wchar_t.
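A sketch of that fix for the hypothetical implementation follows. The types are renamed (my_wint_t, my_wchar_t, MY_WEOF) so that they do not clash with the real definitions in the system headers. The sketch assumes int is wider than short, and it uses a C11 _Static_assert to state that assumption.

```c
#include <limits.h>

/* Hypothetical implementation types, renamed to avoid clashing with
   the real wint_t / wchar_t / WEOF. */
typedef signed int   my_wint_t;
typedef signed short my_wchar_t;

/* Assumption: int is wider than short, so INT_MAX is out of short's range. */
_Static_assert(INT_MAX > SHRT_MAX, "int must be wider than short");
#define MY_WEOF ((my_wint_t)INT_MAX)

/* Returns 1 if the widened character is mistaken for MY_WEOF, else 0. */
int collides_with_weof(my_wchar_t wc)
{
    return (my_wint_t)wc == MY_WEOF;
}

/* Tries every value of my_wchar_t and returns the number of collisions. */
long count_weof_collisions(void)
{
    long collisions = 0;
    for (long v = SHRT_MIN; v <= SHRT_MAX; ++v)
        collisions += collides_with_weof((my_wchar_t)v);
    return collisions;
}
```

With MY_WEOF defined this way, count_weof_collisions() returns 0, because no value of the 16-bit type can reach INT_MAX after widening.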
A WEOF value that overlaps with a valid character value is something I suppose might occur in real implementations (even if the standard seems to prohibit it), and it's similar to issues that have been raised about EOF possibly having the same value as some valid signed char value.
It might be of interest that for most (all?) functions that can return WEOF to indicate some sort of problem, the standard requires that the function set some additional indication of the error or condition (for example, setting errno to a particular value, or setting the end-of-file indicator on a stream).
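As an illustration of relying on those additional indications, here is a minimal sketch of a read loop that does not trust WEOF on its own. The helper name count_wide_chars is hypothetical; fgetwc, feof and ferror are the standard functions.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: counts the wide characters left in a stream.
   Returns the count, or -1 if reading stopped for a reason other than
   end-of-file (for example an encoding error reported through errno). */
long count_wide_chars(FILE *stream)
{
    long count = 0;
    while (fgetwc(stream) != WEOF)
        ++count;
    /* WEOF alone is ambiguous; the stream indicators say what happened. */
    if (ferror(stream) || !feof(stream))
        return -1;
    return count;
}
```

Checking feof after the loop is what separates a real end of input from any other reason for the WEOF return.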
Another thing to note is that it's my understanding that 0xffff is a non-character in UCS-2 or UTF-16 (no idea about any other 16-bit encodings that might exist).
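For reference, Unicode reserves 66 code points as noncharacters: U+FDD0 through U+FDEF, plus the last two code points of each of the 17 planes (U+FFFE, U+FFFF, U+1FFFE, and so on). A minimal sketch of a check, with is_unicode_noncharacter as a hypothetical helper name:

```c
#include <stdint.h>

/* Returns 1 if cp is one of Unicode's 66 noncharacters, else 0. */
int is_unicode_noncharacter(uint32_t cp)
{
    if (cp > 0x10FFFF)
        return 0;                    /* beyond the Unicode code space */
    if (cp >= 0xFDD0 && cp <= 0xFDEF)
        return 1;                    /* contiguous block of 32 */
    return (cp & 0xFFFE) == 0xFFFE;  /* U+nFFFE and U+nFFFF in each plane */
}
```

For 0xFFFF this returns 1, while for 0x00FF (the ÿ from the question) it returns 0.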
Source: https://stackoverflow.com/questions/43061489/how-to-avoid-integer-promotion-in-c