Why was the space character not chosen for C++14 digit separators?

前端 未结 7 1660
心在旅途
心在旅途 2020-12-02 19:40

As of C++14, thanks to n3781 (which in itself does not answer this question) we may write code like the following:

const int x = 1\'234; // one thousand two          


        
相关标签:
7条回答
  • 2020-12-02 20:29

    The obvious reason for not using white space is that a new line is also white space, and that C++ treats all white space identically. And off hand, I don't know of any language which accepts arbitrary white space as a separator.

    Presumably, Unicode 0xA0 (non-breaking space) could be used—it is the most widely used solution when typesetting. I see two problems with that, however: first, it's not in the basic character set, and second, it's not visually distinctive; you can't see that it isn't a space by just looking at the text in a normal editor.

    Beyond that, there aren't many choices. You can't use the comma, since that is already a legal token (and something like 1,234 is currently legal C++, with the meaning 234). And in a context where it could occur in legal code, e.g. a[1,234]. While I can't quite imagine any real code actually using this, there is a basic rule that no legal program, regardless how absurd, should silently change semantics.

    Similar considerations mean that _ can't be used either; if there is a #define _234 * 2, then a[1_234] would silently change the meaning of the code.

    I can't say that I'm particularly pleased with the choice of ', but it does have the advantage of being used in continental Europe, at least in some types of texts. (I seem to remember having seen it in German, for example, although in typical running text, German, like most other languages, will use a point or a non breaking space. But maybe it was Swiss German.) The problem with ' is parsing; the sequence '1' is already legal, as is '123'. So something like 1'234 could be a 1, followed by the start of a character constant; I'm not sure how far you have to look-ahead to make the decision. There is no sequence of legal C++ in which an integral constant can be followed by a character constant, so there's no problem with breaking legal code, but it means that lexical scanning suddenly becomes very context dependent.

    (With regards to your comment: there is no logic in the choice of a decimal or a thousands separator. A decimal separator, for example, is certainly not a full stop. They are just arbitrary conventions.)

    0 讨论(0)
提交回复
热议问题