One thing I find quite confusing is knowing which characters and combinations I can use in method and variable names. For instance
val #^ = 1 // legal
val #
Working from the EBNF syntax in the spec:
upper ::= ‘A’ | ... | ‘Z’ | ‘$’ | ‘_’ and Unicode category Lu
lower ::= ‘a’ | ... | ‘z’ and Unicode category Ll
letter ::= upper | lower and Unicode categories Lo, Lt, Nl
digit ::= ‘0’ | ... | ‘9’
opchar ::= “all other characters in \u0020-007F and Unicode
categories Sm, So except parentheses ([]) and periods”
But also taking into account the very beginning on Lexical Syntax that defines:
Parentheses ‘(’ | ‘)’ | ‘[’ | ‘]’ | ‘{’ | ‘}’.
Delimiter characters ‘‘’ | ‘’’ | ‘"’ | ‘.’ | ‘;’ | ‘,’
Here is what I come up with. Working by elimination in the range \u0020-007F
, eliminating letters, digits, parentheses and delimiters, we have for opchar
... (drumroll):
! # % & * + - / : < = > ? @ \ ^ | ~
and also Sm and So - except for parentheses and periods.
(Edit: adding valid examples here:). In summary, here are some valid examples that highlights all cases - watch out for \
in the REPL, I had to escape as \\
:
val !#%&*+-/:<=>?@\^|~ = 1 // all simple opchars
val simpleName = 1
val withDigitsAndUnderscores_ab_12_ab12 = 1
val wordEndingInOpChars_!#%&*+-/:<=>?@\^|~ = 1
val !^©® = 1 // opchars ans symbols
val abcαβγ_!^©® = 1 // mixing unicode letters and symbols
Note 1:
I found this Unicode category index to figure out Lu, Ll, Lo, Lt, Nl
:
Note 2:
val #^ = 1 // legal - two opchars
val # = 1 // illegal - reserved word like class or => or @
val + = 1 // legal - opchar
val &+ = 1 // legal - two opchars
val &2 = 1 // illegal - opchar and letter do not mix arbitrarily
val £2 = 1 // working - £ is part of Sc (Symbol currency) - undefined by spec
val ¬ = 1 // legal - part of Sm
Note 3:
Other operator-looking things that are reserved words: _ : = => <- <: <% >: # @
and also \u21D2
⇒ and \u2190
←
The language specification. gives the rule in Chapter 1, lexical syntax (on page 3):
- Operator characters. These consist of all printable ASCII characters \u0020-\u007F. which are in none of the sets above, mathematical sym- bols(Sm) and other symbols(So).
This is basically the same as your extract of Programming in Programming in Scala. +
is not an Unicode mathematical symbol, but it is definitely an ASCII printable character not listed above (not a letter, including _ or $, a digit, a paranthesis, a delimiter).
In your list:
$
is not one either (it is considered a letter).