Valid identifier characters in Scala

太阳男子 2020-11-28 04:28

One thing I find quite confusing is knowing which characters and combinations of characters I can use in method and variable names. For instance:

val #^ = 1 // legal
val #          


        
2 answers
  • 2020-11-28 04:39

    Working from the EBNF syntax in the spec:

    upper ::= ‘A’ | ... | ‘Z’ | ‘$’ | ‘_’ and Unicode category Lu
    lower ::= ‘a’ | ... | ‘z’ and Unicode category Ll
    letter ::= upper | lower and Unicode categories Lo, Lt, Nl
    digit ::= ‘0’ | ... | ‘9’
    opchar ::= “all other characters in \u0020-007F and Unicode
                categories Sm, So except parentheses ([]) and periods”
    

    But we also have to take into account the very beginning of the Lexical Syntax chapter, which defines:

    Parentheses ‘(’ | ‘)’ | ‘[’ | ‘]’ | ‘{’ | ‘}’.
    Delimiter characters ‘`’ | ‘'’ | ‘"’ | ‘.’ | ‘;’ | ‘,’
    

    Here is what I come up with. Working by elimination in the range \u0020-007F, eliminating letters, digits, parentheses and delimiters, we have for opchar... (drumroll):

    ! # % & * + - / : < = > ? @ \ ^ | ~ and also Sm and So - except for parentheses and periods.
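    The elimination above can be reproduced mechanically. The following is a minimal sketch, not the compiler's own logic: the object name AsciiOpChars and its helpers are made up for illustration, and the excluded sets are copied from the grammar fragments quoted in this answer.

```scala
object AsciiOpChars {
  // Parentheses and delimiter characters as quoted from the Lexical Syntax chapter.
  val parentheses: Set[Char] = "()[]{}".toSet
  val delimiters: Set[Char] = "`'\".;,".toSet

  // '$' and '_' count as letters in the grammar quoted above.
  def isAsciiLetter(c: Char): Boolean =
    (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '$' || c == '_'

  def isAsciiDigit(c: Char): Boolean = c >= '0' && c <= '9'

  // Printable, non-space ASCII (0x21 to 0x7E) minus everything excluded above.
  val opChars: String =
    (0x21 to 0x7E)
      .map(_.toChar)
      .filterNot(c => isAsciiLetter(c) || isAsciiDigit(c) || parentheses(c) || delimiters(c))
      .mkString
}
```

    AsciiOpChars.opChars should contain the same 18 characters as the list above, in ASCII order.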

    (Edit: adding valid examples here.) In summary, here are some valid examples that cover all the cases. Watch out for \ in the REPL: I had to escape it as \\:

    val !#%&*+-/:<=>?@\^|~ = 1 // all simple opchars
    val simpleName = 1 
    val withDigitsAndUnderscores_ab_12_ab12 = 1 
    val wordEndingInOpChars_!#%&*+-/:<=>?@\^|~ = 1
    val !^©® = 1 // opchars and symbols
    val abcαβγ_!^©® = 1 // mixing unicode letters and symbols
    

    Note 1:

    I found this Unicode category index to figure out Lu, Ll, Lo, Lt, Nl:

    • Lu (uppercase letters)
    • Ll (lowercase letters)
    • Lo (other letters)
    • Lt (titlecase)
    • Nl (letter numbers like roman numerals)
    • Sm (symbol math)
    • So (symbol other)

    Note 2:

    val #^ = 1 // legal   - two opchars
    val #  = 1 // illegal - reserved word like class or => or @
    val +  = 1 // legal   - opchar
    val &+ = 1 // legal   - two opchars
    val &2 = 1 // illegal - opchars and alphanumerics do not mix arbitrarily
    val £2 = 1 // working - £ is part of Sc (Symbol currency) - undefined by spec
    val ¬  = 1 // legal   - part of Sm
    

    Note 3:

    Other operator-looking things that are reserved words: _ : = => <- <: <% >: # @ and also \u21D2 ⇒ and \u2190 ←
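    Putting Notes 2 and 3 together, a purely symbolic name is accepted when every character is an opchar and the whole name is not one of the reserved words. Here is a rough sketch of that rule under the assumptions of this answer; OperatorNames and isLegalOperatorName are made-up names, the reserved set is copied from the list above, and the check ignores the fact that sequences such as // or /* are read as comment starts.

```scala
object OperatorNames {
  // Reserved operator-like words, as listed in Note 3.
  val reserved: Set[String] =
    Set("_", ":", "=", "=>", "<-", "<:", "<%", ">:", "#", "@", "\u21D2", "\u2190")

  // The ASCII opchars obtained by elimination earlier in this answer.
  private val asciiOpChars: Set[Char] = "!#%&*+-/:<=>?@\\^|~".toSet

  // An opchar is one of the ASCII characters above or any Sm / So character.
  def isOpChar(c: Char): Boolean = {
    val t = Character.getType(c)
    asciiOpChars(c) ||
      t == Character.MATH_SYMBOL.toInt ||
      t == Character.OTHER_SYMBOL.toInt
  }

  // Simplified check: only opchars, and not a reserved word.
  def isLegalOperatorName(s: String): Boolean =
    s.nonEmpty && s.forall(isOpChar) && !reserved(s)
}
```

    With this sketch, "#^", "+" and "&+" pass, while "#" (reserved) and "&2" (contains a digit) are rejected, in line with Note 2.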

  • 2020-11-28 04:42

    The language specification gives the rule in Chapter 1, Lexical Syntax (on page 3):

    1. Operator characters. These consist of all printable ASCII characters \u0020-\u007F which are in none of the sets above, mathematical symbols (Sm) and other symbols (So).

    This is basically the same as your extract of Programming in Scala. + is definitely a printable ASCII character not listed above (not a letter, which includes _ and $, nor a digit, a parenthesis, or a delimiter), so it is an operator character whatever its Unicode category.

    In your list:

    1. # is illegal not because the character is not an operator character (#^ is legal), but because # on its own is a reserved word (on page 4), used for type projection.
    2. &2 is illegal because it mixes an operator character, &, with a non-operator character, the digit 2.
    3. £2 is legal because £ is not an operator character: it is not 7-bit ASCII but 8-bit extended ASCII. This is not nice, as $ is not an operator character either (it is considered a letter).