NSCharacterSet uses ints but I need unsigned short?

Submitted by 末鹿安然 on 2019-12-05 08:27:05

This is a total mess

This is a total mess because you are running into both a compiler bug and an arbitrary limitation in the C spec.

Scroll to the bottom for the fix.

Compiler warning

Format specifies type 'unsigned short' but the argument has type 'int'

My conclusion is that this is a compiler bug in Clang. It is definitely safe to ignore this warning, because unsigned short arguments are always promoted to int before they are passed to variadic functions anyway. This is all in the C standard (the default argument promotions), and it applies to Objective-C, too.

printf("%hd", 1); // Clang generates warning. GCC does not.
                  // Clang is wrong, GCC is right.

printf("%hd", 1 << 16); // Clang generates warning.  GCC does not.
                        // Clang is right, GCC is wrong.

The problem here is that neither compiler looks deep enough.

Remember, it is actually impossible to pass a short to printf(), because it must get promoted to int. GCC never gives a warning for constants, even when the value does not fit in a short. Clang ignores the fact that you are passing a constant and always gives a warning because the type is wrong. Both behaviors are wrong.
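To make the promotion rule concrete, here is a minimal sketch of a variadic function that receives unsigned short values. The name sum_ushorts is hypothetical, and the sketch assumes a platform where int is wider than short (as on Apple platforms), so that unsigned short promotes to int.

```c
#include <stdarg.h>

/* Hypothetical helper: sums `count` values that the caller passes as
   unsigned short. The default argument promotions convert each one to
   int before the call, so va_arg has to read an int. */
unsigned long sum_ushorts(int count, ...)
{
    va_list ap;
    unsigned long total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        /* va_arg(ap, unsigned short) would be undefined behavior here,
           because no argument ever arrives with that type. */
        unsigned short value = (unsigned short)va_arg(ap, int);
        total += value;
    }
    va_end(ap);
    return total;
}
```

Calling sum_ushorts(2, (unsigned short)0x0085, (unsigned short)0x2028) and sum_ushorts(2, 0x0085, 0x2028) pass exactly the same argument types, which is why the cast in the caller cannot matter to printf() either.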

I suspect nobody has noticed because -- why would you be passing a constant expression to printf() anyway?

In the short term, you can use the following hack:

#pragma GCC diagnostic ignored "-Wformat"
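If you would rather not silence the warning for the whole file, one option is to scope the pragma with push and pop, which both GCC and Clang accept. This is only a sketch; format_short is a hypothetical helper, not something from the question.

```c
#include <stdio.h>

/* Suppress -Wformat only between push and pop. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wformat"

/* Hypothetical helper: prints an int with %hd. printf converts the
   promoted int back to short before formatting it. */
int format_short(char *buf, size_t size, int value)
{
    return snprintf(buf, size, "%hd", value);
}

#pragma GCC diagnostic pop
/* From here on, -Wformat diagnostics are reported again. */
```

The trade-off is that any genuine format mismatch inside the push/pop region goes unreported as well, so keep the region as small as possible.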

Universal character names

You can use \uXXXX notation. Except you can't, because the compiler won't let you use U+0085 this way. Why? See § 6.4.3 of C99:

A universal character name shall not specify a character whose short identifier is less than 00A0 other than 0024 ($), 0040 (@), or 0060 (`), nor one in the range D800 through DFFF inclusive.

This rules out \u0085.
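The quoted constraint is easy to express as a predicate. The sketch below encodes only the rule quoted above, under a hypothetical name (ucn_allowed_c99); it does not model anything else a compiler may check.

```c
#include <stdbool.h>

/* Returns true if the C99 6.4.3 constraint quoted above permits a
   universal character name for this code point. */
bool ucn_allowed_c99(unsigned long cp)
{
    /* Surrogates D800 through DFFF are never allowed. */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;

    /* Below 00A0 only $, @ and ` are allowed. */
    if (cp < 0x00A0)
        return cp == 0x0024 || cp == 0x0040 || cp == 0x0060;

    return true;
}
```

With this predicate, 0x0085 is rejected while 0x2028 and 0x2029 are accepted, which matches what the compiler lets you write.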

There is a proposal to fix this part of the spec.

The fix

You really want a constant string, don't you? Use this:

[NSCharacterSet characterSetWithCharactersInString:
  @"\t\n\r\xc2\x85\x0c\u2028\u2029"]

This relies on the fact that the source encoding is UTF-8. Don't worry, that's not going to change any time soon.

The \xc2\x85 in the string is the UTF-8 encoding of U+0085. The appearance of 85 in both is a coincidence.
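If you want to check the bytes yourself, here is a minimal sketch of a UTF-8 encoder for code points up to U+FFFF. The name utf8_encode_bmp is hypothetical, and the sketch assumes the caller passes a value no larger than 0xFFFF that is not a surrogate. It also shows where the second 85 comes from: the low six bits of 0x85 are 0x05, and every continuation byte starts with the bits 10, so 0x80 | 0x05 gives 0x85 again.

```c
#include <stddef.h>

/* Hypothetical helper: encodes a code point in U+0000..U+FFFF as UTF-8.
   Writes up to 3 bytes into `out` and returns how many were written.
   Assumes cp <= 0xFFFF and that cp is not a surrogate. */
size_t utf8_encode_bmp(unsigned int cp, unsigned char out[3])
{
    if (cp < 0x80) {
        /* One byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        /* Two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* Three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

For 0x0085 this produces the two bytes C2 85 used in the string above, and for 0x2028 it produces E2 80 A8.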

The problem is that 0x0085 and the other constants are int literals. So they don't match the %C format specifier, which expects a unichar, which is an unsigned short.

There's no direct way to specify a literal short in C and I'm not aware of any Objective-C extension. But you can use a brute-force approach:

NSCharacterSet *stopCharacters =
         [NSCharacterSet characterSetWithCharactersInString:
                  [NSString stringWithFormat:@"< \t\n\r%C%C%C%C", 
                               (unichar)0x0085, (unichar)0x000C,
                               (unichar)0x2028, (unichar)0x2029]];

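You don't need stringWithFormat; you can embed Unicode characters directly into a string using the \u escape, for example \u2028. Note that, per the C99 rule quoted above, the compiler may reject \u0085 specifically, so that one character still needs a different spelling.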
