Is '\u0B95' a multicharacter literal?

悲&欢浪女 2021-01-11 19:23

In a previous answer I gave, I responded to the following warning being caused by the fact that '\u0B95' requires three bytes and so is a multicharacter

4 answers
  •  心在旅途
    2021-01-11 19:49

    Somebody posted an answer that correctly answered the second part of my question (what value will the char have?) but has since deleted their post. Since that part was correct, I'll reproduce it here together with my answer for the first part (is it a multicharacter literal?).


    '\u0B95' is not a multicharacter literal and gcc is mistaken here. As stated in the question, a multicharacter literal is defined by (§2.14.3/1):

    An ordinary character literal that contains more than one c-char is a multicharacter literal.

    Since a universal-character-name is one expansion of a c-char, the literal '\u0B95' contains only one c-char. It would make sense for \u0B95 to be treated as six separate characters (\, u, 0, etc.) if ordinary literals could not contain a universal-character-name, but I cannot find such a restriction anywhere. Therefore, it is a single character and the literal is not a multicharacter literal.

    To further support this, why would it be considered to be multiple characters? At this point we haven't even given it an encoding so we don't know how many bytes it would take up. In UTF-16 it would take 2 bytes, in UTF-8 it would take 3 bytes and in some imagined encoding it could take just 1 byte.

    So what value will the character literal have? First, the universal-character-name is mapped to the corresponding encoding in the execution character set, unless it has no mapping, in which case it gets an implementation-defined encoding (§2.14.3/5):

    A universal-character-name is translated to the encoding, in the appropriate execution character set, of the character named. If there is no such encoding, the universal-character-name is translated to an implementation-defined encoding.

    Either way, the char literal gets the value equal to the numerical value of the encoding (§2.14.3/1):

    An ordinary character literal that contains a single c-char has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set.

    Now the important part, inconveniently tucked away in a different paragraph further on in the section. If the value cannot be represented in a char, the literal gets an implementation-defined value (§2.14.3/4):

    The value of a character literal is implementation-defined if it falls outside of the implementation-defined range defined for char (for literals with no prefix) ...
