For languages with keywords, some special trickery needs to happen to prevent for example \"if\" from being interpreted as an identifier and \"ifSomeVariableName\" from beco
I think, this problem is very simple. The answer is that you have to:
[a-z]+
), lower case only;keyword
; otherwise, the parser will fall back;identifier
separately;E.g. (just a hypothetical code, not tested):
let keyWordSet =
System.Collections.Generic.HashSet<_>(
[|"while"; "begin"; "end"; "do"; "if"; "then"; "else"; "print"|]
)
let pKeyword =
(many1Satisfy isLower .>> nonAlphaNumeric) // [a-z]+
>>= (fun s -> if keyWordSet.Contains(s) then (preturn x) else fail "not a keyword")
let pContent =
pLineComment <|> pOperator <|> pNumeral <|> pKeyword <|> pIdentifier
The code above will parse a keyword or an identifier twice. To fix it, alternatively, you may:
[a-z][A-Z]+[a-z][A-Z][0-9]+
), e.g. everything alphanumeric;P.S. Don't forget to order "cheaper" parsers first, if it does not ruin the logic.
You can define a parser for whitespace and check if keyword or identifier is followed by it. For example some generic whitespace parser will look like
let pWhiteSpace = pLineComment <|> pMultilineComment <|> pSpaces
this will require at least one whitespace
let ws1 = skipMany1 pWhiteSpace
then if will look like
let pIf = pstring "if" .>> ws1