FParsec identifiers vs keywords

孤城傲影 2020-12-18 06:58

For languages with keywords, some special trickery is needed to prevent, for example, "if" from being interpreted as an identifier and "ifSomeVariableName" from beco…

2 answers
  • 2020-12-18 07:41

    I think this problem is quite simple. You have to:

    1. Parse out an entire word ([a-z]+), lower case only;
    2. Check whether it belongs to a dictionary of keywords; if so, return a keyword; otherwise fail, so that the parser falls back to the next alternative;
    3. Parse identifiers separately.

    E.g. (hypothetical code, not tested):

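    let keyWordSet =
        System.Collections.Generic.HashSet<_>(
            [|"while"; "begin"; "end"; "do"; "if"; "then"; "else"; "print"|]
        )
    // Succeeds, consuming nothing, only if the next character cannot continue a word.
    let nonAlphaNumeric = notFollowedBy (satisfy (fun c -> isLetter c || isDigit c))
    // attempt rewinds the input when the word is not a keyword,
    // so that <|> can still try pIdentifier on the same word.
    let pKeyword =
        attempt (
            (many1Satisfy isLower .>> nonAlphaNumeric) // [a-z]+
            >>= (fun s -> if keyWordSet.Contains(s) then preturn s else fail "not a keyword")
        )

    // pLineComment, pOperator, pNumeral and pIdentifier are assumed to be defined elsewhere.
    let pContent =
        pLineComment <|> pOperator <|> pNumeral <|> pKeyword <|> pIdentifier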
    

    The code above may scan the same word twice: once in pKeyword and, if it turns out not to be a keyword, again in pIdentifier. To avoid that, you may instead:

    1. Parse out an entire word ([a-zA-Z0-9]+), i.e. everything alphanumeric;
    2. Check whether it is a keyword (lower case and present in the dictionary) or an identifier, and accordingly either
      1. Return a keyword, or
      2. Return an identifier.
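
    The single-pass steps above can be sketched roughly as follows. This is only an illustration: the Token type and the names pWord, pKeywordOrIdentifier and parseWord are hypothetical, and the sketch assumes that a word starts with a letter, which the answer does not specify.

    ```fsharp
    open FParsec

    // Hypothetical token type for this sketch (not from the original answer).
    type Token =
        | Keyword of string
        | Identifier of string

    let keywords =
        set [ "while"; "begin"; "end"; "do"; "if"; "then"; "else"; "print" ]

    // Read one whole word: a letter followed by any letters or digits.
    // (Assumption: words start with a letter.)
    let pWord : Parser<string, unit> =
        many1Satisfy2L isLetter (fun c -> isLetter c || isDigit c) "word"

    // Classify the word after reading it, so each word is scanned only once.
    let pKeywordOrIdentifier : Parser<Token, unit> =
        pWord |>> fun w ->
            if keywords.Contains w then Keyword w else Identifier w

    // Small helper that hides FParsec's result type.
    let parseWord input =
        match run pKeywordOrIdentifier input with
        | Success (token, _, _) -> Some token
        | Failure _ -> None
    ```

    With these definitions, parseWord "if" should give Some (Keyword "if"), while parseWord "ifSomeVariableName" should give Some (Identifier "ifSomeVariableName"), because the whole word is read before the dictionary lookup.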

    P.S. Don't forget to order "cheaper" parsers first, if it does not ruin the logic.

  • 2020-12-18 07:57

    You can define a parser for whitespace and check that each keyword or identifier is followed by it. For example, a generic whitespace parser could look like this:

    let pWhiteSpace = pLineComment <|> pMultilineComment <|> pSpaces
    

    This one requires at least one whitespace element:

    let ws1 = skipMany1 pWhiteSpace
    

    Then the parser for "if" will look like this:

    let pIf = pstring "if" .>> ws1
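
    A self-contained sketch of how such a keyword parser might be combined with an identifier parser. It rests on assumptions: spaces1 stands in for the comment-aware pWhiteSpace, pIdentifier and parseToken are hypothetical names, and attempt is added here because in FParsec <|> does not try its second alternative once the first has consumed input.

    ```fsharp
    open FParsec

    // Assumption: plain whitespace only; the pWhiteSpace above also accepts comments.
    let ws1 : Parser<unit, unit> = spaces1

    // attempt restores the input position if ws1 fails after "if" has matched,
    // so that an alternative parser can still run on the same input.
    let pIf : Parser<string, unit> = attempt (pstring "if" .>> ws1)

    // Hypothetical identifier parser: a letter followed by letters or digits.
    let pIdentifier : Parser<string, unit> =
        many1Satisfy2L isLetter (fun c -> isLetter c || isDigit c) "identifier"

    let pIfOrIdentifier = pIf <|> pIdentifier

    // Small helper that hides FParsec's result type.
    let parseToken input =
        match run pIfOrIdentifier input with
        | Success (s, _, _) -> Some s
        | Failure _ -> None
    ```

    Here parseToken "if x" should match the keyword, while parseToken "ifSomeVariableName" should fall through to pIdentifier and return the whole name. Note that in this sketch an "if" directly followed by "(" or by the end of input is not followed by whitespace, so it is also read by pIdentifier; a notFollowedBy check on identifier characters may be a better fit for such grammars.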
    