Non-ASCII characters in C

Submitted by ↘锁芯ラ on 2019-12-01 17:47:06

C90 doesn't allow additional characters in identifiers (beyond those in the basic character set); C99 does, both through the universal character name syntax (\uXXXX and \UXXXXXXXX) and through an implementation-defined set of other characters.

6.4.2.1/1 in C99:

identifier:
    identifier-nondigit
    identifier identifier-nondigit
    identifier digit
identifier-nondigit:
    nondigit
    universal-character-name
    other implementation-defined characters
nondigit: one of
    _ a b c d e f g h i j k l m
    n o p q r s t u v w x y z
    A B C D E F G H I J K L M
    N O P Q R S T U V W X Y Z
digit: one of
    0 1 2 3 4 5 6 7 8 9
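
As a minimal sketch of the universal-character-name branch of that grammar, the function below declares two local variables whose names contain \uXXXX escapes. The function name sum_with_ucn_names and both variable names are made up for illustration. It assumes a compiler running in C99 or later mode that accepts universal character names in identifiers (recent GCC and Clang releases should); a C90-mode compile is expected to reject it.

```c
/* Adds two numbers, storing them in locals whose identifiers use
   universal character names (C99 6.4.2.1). Hypothetical example. */
int sum_with_ucn_names(int a, int b)
{
    int caf\u00e9 = a;        /* \u00e9 is "e" with an acute accent */
    int \u03c0_offset = b;    /* \u03c0 is the Greek small letter pi */

    /* The escapes are simply part of the spelling of each name. */
    return caf\u00e9 + \u03c0_offset;
}
```

Calling sum_with_ucn_names(2, 3) returns 5. Whether the same names may also be typed as raw non-ASCII characters in the source file falls under the "other implementation-defined characters" branch, so that part depends on the compiler.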

I don't know how well this is supported by C implementations, but I do know that the Plan 9 C compiler could handle other characters before this was standardized.

Do you mean the dot? It's character code 183 (0xB7) in ISO 8859-1 (ISO Latin-1), an extended ASCII code corresponding (apparently) to the Georgian comma, a.k.a. the "middle dot". It is actually a legal character.
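
To make the "code 183" remark concrete, here is a small sketch that converts one ISO 8859-1 byte to UTF-8, which is how the same middle dot usually appears in a modern source file. The helper name latin1_to_utf8 is hypothetical; the only assumption is that Latin-1 values coincide with the first 256 Unicode code points, so two output bytes are always enough.

```c
#include <stddef.h>

/* Encode one ISO 8859-1 byte as UTF-8.
   Writes 1 or 2 bytes to out and returns how many were written. */
size_t latin1_to_utf8(unsigned char c, unsigned char out[2])
{
    if (c < 0x80) {
        /* Plain ASCII is unchanged in UTF-8. */
        out[0] = c;
        return 1;
    }
    /* Two-byte form: 110xxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xC0 | (c >> 6));
    out[1] = (unsigned char)(0x80 | (c & 0x3F));
    return 2;
}
```

For the middle dot, latin1_to_utf8(183, buf) returns 2 and leaves the bytes 0xC2 0xB7 in buf, so a compiler reading the file as UTF-8 sees a two-byte sequence where a Latin-1 reader sees the single byte 0xB7.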

The C99 Standard "allows" (for sufficiently small values of "allow") 'strange characters':

5.1.1.2 Translation phases

1 The precedence among the syntax rules of translation is specified by the following phases.

  1. Physical source file multibyte characters are mapped, in an implementation-defined manner, to the source character set (introducing new-line characters for end-of-line indicators) if necessary. Trigraph sequences are replaced by corresponding single-character internal representations.
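
The multibyte mapping in that phase is implementation-defined, so it can't be shown portably, but the trigraph half of the sentence is mechanical. The following is only an illustrative sketch of that replacement step, not how any particular compiler implements it; replace_trigraphs is a hypothetical name, and the caller is assumed to supply an output buffer at least as large as the input.

```c
#include <stddef.h>
#include <string.h>

/* Replace the nine C trigraph sequences found in src, writing the
   result to dst. dst must have room for strlen(src) + 1 bytes. */
void replace_trigraphs(const char *src, char *dst)
{
    /* Third character of each trigraph and its replacement, aligned by
       index. Splitting the table this way keeps literal trigraphs out
       of this source file. */
    static const char third[]  = "=(/)'<!>-";
    static const char single[] = "#[\\]^{|}~";

    while (*src != '\0') {
        if (src[0] == '?' && src[1] == '?' && src[2] != '\0') {
            const char *hit = strchr(third, src[2]);
            if (hit != NULL) {
                *dst++ = single[hit - third];
                src += 3;          /* consume the whole trigraph */
                continue;
            }
        }
        *dst++ = *src++;           /* ordinary character: copy as-is */
    }
    *dst = '\0';
}
```

Given the text ??=define X a??(0??), the function produces #define X a[0]. When writing such input as a C string literal, spell the question marks as \? so the compiler's own phase 1 does not replace them first.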

Using that middle dot is discussed here:

http://code.google.com/p/go/issues/detail?id=793

Basically, using that dot is not part of the spec, but there are some cases where it is necessary: bootstrapping, the runtime, or assembly.
