As of C++14, thanks to n3781 (which in itself does not answer this question) we may write code like the following:
const int x = 1'234; // one thousand two hundred thirty-four
The obvious reason for not using white space is that a new line is also white space, and that C++ treats all white space identically. And offhand, I don't know of any language that accepts arbitrary white space as a separator.
Presumably, U+00A0 (the non-breaking space) could be used; it is the most widely used solution when typesetting. I see two problems with that, however: first, it's not in the basic character set, and second, it's not visually distinctive; you can't see that it isn't an ordinary space just by looking at the text in a normal editor.
Beyond that, there aren't many choices. You can't use the comma, since that is already a legal token: something like `1,234` is currently legal C++ (the comma operator), with the value 234, and it can appear in a context where it is legal code, e.g. `a[1,234]`. While I can't quite imagine any real code actually using this, there is a basic rule that no legal program, regardless of how absurd, should silently change semantics.
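To make the comma case concrete, here is a minimal sketch of what `1,234` means as an expression today. The name `comma_result` is made up for illustration, and a compiler may warn that the left operand has no effect.

```cpp
// Sketch: in an expression, the comma in `1,234` is the built-in comma
// operator, not a digit separator.
// `comma_result` is a hypothetical name used only for this example.
constexpr int comma_result = (1, 234);  // evaluates 1, discards it, yields 234

static_assert(comma_result == 234, "the comma operator yields its right operand");
```

If the comma became a digit separator, the same characters would have to mean 1234 instead, which is exactly the kind of silent change the rule above forbids.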
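Similar considerations mean that `_` can't be used either: since C++11, `1_234` already lexes as a single user-defined literal with the suffix `_234`, so if a matching `operator""_234` is in scope, making `_` a separator in `a[1_234]` would silently change the meaning of the code.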
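One way `1_234` can already carry a meaning, since C++11, is as a user-defined literal. The sketch below assumes a made-up literal operator with the suffix `_234`; nothing like it is likely to exist in real code, but it is legal.

```cpp
// Hypothetical literal operator: the suffix `_234` and the doubling are
// invented for this example only.
constexpr unsigned long long operator""_234(unsigned long long n) {
    return n * 2;
}

// `1_234` is the literal 1 with the suffix _234, so this calls the operator
// above with n == 1.
constexpr unsigned long long doubled = 1_234;

static_assert(doubled == 2, "1_234 is not the number 1234 here");
```

With `_` as a digit separator, `1_234` would have to become 1234, and code like this would change value without any diagnostic.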
I can't say that I'm particularly pleased with the choice of `'`, but it does have the advantage of being used in continental Europe, at least in some types of texts. (I seem to remember having seen it in German, for example, although in typical running text, German, like most other languages, will use a point or a non-breaking space. But maybe it was Swiss German.) The problem with `'` is parsing; the sequence `'1'` is already legal, as is `'123'`. So something like `1'234` could be a `1`, followed by the start of a character constant; I'm not sure how far you have to look ahead to make the decision. There is no sequence of legal C++ in which an integral constant can be followed by a character constant, so there's no problem with breaking legal code, but it means that lexical scanning suddenly becomes very context-dependent.
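For reference, a short sketch of how the separator behaves in C++14 and later. The variable names are illustrative only. The separators are ignored when the value of the literal is determined, while `'1'` on its own remains an ordinary character literal.

```cpp
// Requires C++14 or later. All names below are illustrative only.
constexpr int thousand   = 1'234;        // same value as 1234
constexpr int hex_value  = 0x12'34;      // same value as 0x1234
constexpr int bin_value  = 0b1010'1010;  // binary literals are also C++14
constexpr char one_char  = '1';          // still a character literal

static_assert(thousand == 1234, "separators do not affect the value");
static_assert(hex_value == 0x1234, "works in hexadecimal literals too");
static_assert(bin_value == 0xAA, "and in binary literals");
static_assert(sizeof(one_char) == 1, "a char literal has type char");
```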
(With regard to your comment: there is no logic in the choice of a decimal or a thousands separator. A decimal separator, for example, is certainly not a full stop. They are just arbitrary conventions.)