Why in Java (I don't know any other programming languages) can an identifier not start with a number, and why are the following declarations also not allowed?
For example, are there not numerous times we wish to have objects with these names?
2ndInning
3rdBase
4thDim
7thDay
But imagine when someone might try to have a variable with the name 666:
int 666 = 777;
float 666F = 777F;
char 0xFF = 0xFF;
int a = 666; // is it 666 the variable or the literal value?
float b = 666F; // is it 666F the variable or the literal value?
Perhaps one rule we might consider is that variables that begin with a numeral must end with a letter - so long as the name does not start with 0x and end with a letter used as a hexadecimal digit, and does not end with characters such as L or F, etc.
But such rules would make things really difficult for programmers. As Yogi Berra quipped, how can you think and hit at the same time? You are trying to write a computer programme as quickly and error-free as possible, and then you would have to bother with all these little bits and pieces of rules. I would rather, as a programmer, have a simple rule on how variables should be named.
In my efforts using lexers and regexps to parse data logs and data streams for insertion into databases, I have not found that a keyword or variable beginning with a numeral makes parsing any more difficult - so long as there is as short a path as possible to resolving the ambiguity.
Therefore, the rule is not so much about making things easier for the compiler as about making them easier for the programmer.
Every language needs to define what is a valid character for an identifier and what is not. Part of the consideration is going to be ease of parsing, part is going to be to avoid ambiguity (in other words even a perfect parsing algorithm couldn't be sure all the time), part is going to be the preference of the language design (in Java's case similarity with C, C++) and some is just going to be arbitrary.
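To see where Java itself draws the line, here is a minimal sketch that uses the standard Character.isJavaIdentifierStart and Character.isJavaIdentifierPart methods. The class name IdentifierCheck and the helper looksLikeIdentifier are made up for illustration, and the check only covers the character rules (it does not reject reserved words like int).

```java
public class IdentifierCheck {
    // Hypothetical helper: true if s follows Java's character rules for
    // identifiers. It does NOT reject reserved words such as "int".
    public static boolean looksLikeIdentifier(String s) {
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            return false; // e.g. a leading digit or '-' fails here
        }
        for (int i = 1; i < s.length(); i++) {
            if (!Character.isJavaIdentifierPart(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] candidates = {"2ndInning", "secondInning", "inning2", "-d", "_d", "$d"};
        for (String c : candidates) {
            System.out.println(c + " -> " + looksLikeIdentifier(c));
        }
    }
}
```

Digits are accepted everywhere except the first position, which is why inning2 passes and 2ndInning does not.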
The point is it has to be something, so this is what it is.
Languages could allow some of these things, but this simplifying assumption makes things easier for the compiler writer, and makes the program easier for you, the programmer, to read.
Parsers are (usually) written to break the source text into "tokens" first. An identifier that starts with a number looks like a number. Besides, 5e3 is a valid number (5000.0) in some languages.
Meanwhile : and . are tokenized as operators. In some contexts an identifier that starts with one of these would lead to ambiguous code. And so forth.
I don't know exactly, but I think that's because numbers are used to represent literal values, so when the compiler finds a token that starts with a number, it knows it is dealing with a literal. If an identifier could start with a number, the compiler would need to look ahead at the following characters in the token to find out whether it is an identifier or a literal.
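The "decide from the first character" idea can be sketched roughly like this. It is a toy tokenizer, not how javac actually works: the class FirstCharLexer and the token labels are invented for illustration, keywords are simply reported as IDENT, and the number branch ignores decimal points.

```java
import java.util.ArrayList;
import java.util.List;

public class FirstCharLexer {
    // Toy lexer: the FIRST character of each token alone decides its kind.
    public static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // skip whitespace between tokens
            } else if (Character.isDigit(c)) {
                // Starts with a digit: it must be a numeric literal.
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                tokens.add("NUMBER(" + src.substring(start, i) + ")");
            } else if (Character.isJavaIdentifierStart(c)) {
                // Starts with a letter, '_' or '$': identifier (or keyword).
                int start = i;
                while (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) i++;
                tokens.add("IDENT(" + src.substring(start, i) + ")");
            } else {
                // Anything else is treated as a one-character symbol.
                tokens.add("SYMBOL(" + c + ")");
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [IDENT(long), IDENT(x1), SYMBOL(=), NUMBER(8357L), SYMBOL(;)]
        System.out.println(tokenize("long x1 = 8357L;"));
    }
}
```

With this rule no look-ahead is needed to tell 8357L from x1. If identifiers could start with digits, the digit branch would have to read the whole token before deciding what it was.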
Such things aren't allowed in just about any language (I can't think of one right now), mostly to prevent confusion.
Your example -d is an excellent example. How does the compiler know if you meant "the variable named -d" or "the negative of the number in the variable d"? Since it can't tell (or worse yet, it could, so you couldn't be sure what would happen when you typed that without reading the rest of the file), it's not allowed.
The example 7g is the same thing. You can specify the base or type of a number by adding letters to it: a prefix such as 0x for the base, or a suffix such as L for the type. The number 8357 is an int in Java, whereas 8357L is a long (since there is an 'L' on the end). If variables could start with numbers, there would be cases where you couldn't tell if it was supposed to be a variable name or just a literal.
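A quick way to check how Java reads these literals is to box them and ask for the wrapper class. This is just a small sketch; the class LiteralTypes and the helper typeOf are names I made up.

```java
public class LiteralTypes {
    // Hypothetical helper: the argument is auto-boxed, so the wrapper class
    // reveals which primitive type the compiler gave the literal.
    public static String typeOf(Object boxed) {
        return boxed.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(typeOf(8357));   // Integer
        System.out.println(typeOf(8357L));  // Long   (suffix L)
        System.out.println(typeOf(666F));   // Float  (suffix F)
        System.out.println(typeOf(5e3));    // Double (exponent form)
        System.out.println(typeOf(0xFF));   // Integer (hex prefix 0x)
        System.out.println(0xFF);           // 255
        System.out.println(5e3);            // 5000.0
    }
}
```

So 666F, 0xFF and 5e3 are already complete literals, and each of them would also look like a perfectly reasonable name if names could start with a digit.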
I would assume the others you listed have similar reasons behind them, some of which may be historical (e.g. C couldn't do it for reason X, and Java is designed to look like C so they kept the rule).
In practice, they are almost never a problem. It's very rare to find a situation where such things are annoying. The one you'll run into the most is variables starting with numbers, but you can always just spell them out (e.g. oneThing, twoThing, threeThing, etc.).
Generally you put that kind of limitation in for two reasons:
Consider the following code snippet:
int d, -d;
d = 3;
-d = 2;
d = -d;
If -d is a legal identifier, then which value does d have at the end? -3 or 2? It's ambiguous.
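For comparison, here is what real Java does with the only legal part of that snippet. Since -d cannot be a name, it is always the unary minus operator applied to d. The class name UnaryMinus and the method negate are just illustrative.

```java
public class UnaryMinus {
    // "-d" can only mean: the operator "-" applied to the variable d.
    public static int negate(int d) {
        return -d;
    }

    public static void main(String[] args) {
        int d = 3;
        d = -d;                 // no ambiguity: negate the current value of d
        System.out.println(d);  // prints -3
    }
}
```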
Also consider:
int 2e10f, f;
2e10f = 20;
f = 2e10f;
What value does f have at the end? This is also ambiguous.
Also, it's a pain to read either way. If someone declares 2ex10, is that a typo for the number 2e10 (twenty billion) or a variable name?
Making sure that identifiers start with letters means that the only language items they can conflict with are reserved keywords.