Question
I am writing a lexer for Haskell in JavaScript using a Parsing Expression Grammar (PEG); the implementation I use is PEG.js.
I have a problem getting it to handle reserved words, as demonstrated in this simplified grammar:
program = ( word / " " )+
word = ( reserved / id )
id = ( "a" / "b" )+
reserved = ( "aa" )
The goal is a series of space-separated tokens, each of which is either the sequence "aa" (reserved) or any other sequence of a's and/or b's (id).
What I actually get is either that every token that is not a space is recognized as id, or that a token that should be recognized as id has all of its leading pairs of a's eaten up as reserved; e.g. "aab" is recognized as reserved "aa" followed by id "b".
The Haskell lexical specification resolves this ambiguity by specifying id like this:
id = ( "a" / "b" )+[BUT NOT reserved]
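Read as a set difference, the "BUT NOT reserved" rule says: take the longest run of identifier characters, and if that whole run is a reserved word, it is not an id. The sketch below shows that reading outside PEG, as a hand-written lexer loop for the toy grammar above. It is a swapped-in maximal-munch technique, not a PEG.js grammar, and the names `RESERVED`, `isIdChar` and `lexMaximalMunch` are hypothetical.

```javascript
// Hypothetical reserved-word set for the toy grammar (only "aa").
const RESERVED = new Set(["aa"]);

// Identifier characters in the toy grammar: "a" or "b".
const isIdChar = (c) => c === "a" || c === "b";

// Maximal munch followed by exclusion: read the longest run of identifier
// characters, then classify the whole run as reserved or id.
function lexMaximalMunch(input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    if (input[pos] === " ") {
      pos++; // spaces only separate tokens
      continue;
    }
    let end = pos;
    while (end < input.length && isIdChar(input[end])) end++;
    if (end === pos) throw new Error(`unexpected character at ${pos}`);
    const text = input.slice(pos, end);
    // ( "a" / "b" )+ BUT NOT reserved: a run equal to a reserved word is not an id.
    tokens.push(`${RESERVED.has(text) ? "reserved" : "id"}:${text}`);
    pos = end;
  }
  return tokens;
}

console.log(lexMaximalMunch("aab aa b aaaa"));
// → ["id:aab", "reserved:aa", "id:b", "id:aaaa"]
```

Because the classification happens only after the whole run has been read, "aa" can never be split off the front of "aab" or "aaaa".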
I have tried replicating this exclusion using various combinations of the PEG ! and & operators to achieve the same effect, but have not found a way to get it to work properly.
The following solution, which I've seen suggested in several places, does not work:
id = !reserved ( "a" / "b" )+
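To see why, the sketch below encodes the grammar with a few hand-rolled PEG combinators. The helpers `lit`, `seq`, `alt`, `plus`, `not` and the `tokenize` loop (standing in for `program`) are hypothetical illustrations, not the PEG.js API. With `word = reserved / id`, `reserved` is tried first and already succeeds on the "aa" prefix of "aab", so `id` is never consulted there. Adding `!reserved` only makes `id` refuse input that starts with "aa", which `reserved` has claimed anyway, so both versions of `id` produce the same split. The problem lies in `reserved`, which accepts "aa" even when it is only the prefix of a longer token.

```javascript
// Hand-rolled PEG combinators (hypothetical, not the PEG.js API).
// A parser takes (input, pos) and returns the new position, or -1 on failure.
const lit = (s) => (inp, pos) => (inp.startsWith(s, pos) ? pos + s.length : -1);
const seq = (...ps) => (inp, pos) => {
  for (const p of ps) {
    pos = p(inp, pos);
    if (pos < 0) return -1;
  }
  return pos;
};
// Ordered choice: the first alternative that succeeds wins.
const alt = (...ps) => (inp, pos) => {
  for (const p of ps) {
    const end = p(inp, pos);
    if (end >= 0) return end;
  }
  return -1;
};
// One or more repetitions, greedy and without backtracking, as in PEG.
const plus = (p) => (inp, pos) => {
  let end = p(inp, pos);
  if (end < 0) return -1;
  while (end >= 0) {
    pos = end;
    end = p(inp, pos);
  }
  return pos;
};
// Negative lookahead (!p): succeeds without consuming input iff p fails.
const not = (p) => (inp, pos) => (p(inp, pos) < 0 ? pos : -1);

// program = ( word / " " )+ with word = reserved / id, recording token kinds.
function tokenize(input, reserved, id) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    // word cannot match at a space, so testing " " first is equivalent.
    if (input[pos] === " ") {
      pos++;
      continue;
    }
    let kind = "reserved"; // reserved is tried first
    let end = reserved(input, pos);
    if (end < 0) {
      kind = "id";
      end = id(input, pos);
    }
    if (end < 0) throw new Error(`no token at position ${pos}`);
    tokens.push(`${kind}:${input.slice(pos, end)}`);
    pos = end;
  }
  return tokens;
}

const ab = alt(lit("a"), lit("b"));
const reserved = lit("aa");                     // reserved = ( "aa" )
const idPlain = plus(ab);                       // id = ( "a" / "b" )+
const idGuarded = seq(not(reserved), plus(ab)); // id = !reserved ( "a" / "b" )+

console.log(tokenize("aab ba", reserved, idPlain));
console.log(tokenize("aab ba", reserved, idGuarded));
// both → ["reserved:aa", "id:b", "id:ba"]
```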
Is this a limitation of this particular PEG implementation, of PEG itself, or (hopefully) of my approach?
Thanks in advance!
Answer 1:
!reserved ident is a perfectly acceptable technique in any PEG implementation, and PEG.js seems to support it as well. By the way, you should also add !id after the body of reserved (i.e. reserved = ( "aa" ) !id), so that reserved does not match when an identifier follows it.
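A minimal sketch of this combination, again with hand-rolled PEG combinators (hypothetical helpers, not the PEG.js API): `id` keeps its `!reserved` guard, and `reserved` gets a negative lookahead after "aa". One assumption deserves a flag: the sketch looks ahead at a single identifier character (`reserved = "aa" !( "a" / "b" )`, named `idChar` here) rather than at the whole `id` rule. Because `id` itself starts with `!reserved`, the literal `reserved = "aa" !id` lets "aaaa" split into two reserved tokens: at the second "aa", `id` refuses because that "aa" is itself reserved, so `!id` succeeds. The last lines compare both variants.

```javascript
// Hand-rolled PEG combinators (hypothetical, not the PEG.js API).
// A parser takes (input, pos) and returns the new position, or -1 on failure.
const lit = (s) => (inp, pos) => (inp.startsWith(s, pos) ? pos + s.length : -1);
const seq = (...ps) => (inp, pos) => {
  for (const p of ps) {
    pos = p(inp, pos);
    if (pos < 0) return -1;
  }
  return pos;
};
const alt = (...ps) => (inp, pos) => {
  for (const p of ps) {
    const end = p(inp, pos);
    if (end >= 0) return end; // ordered choice: first success wins
  }
  return -1;
};
const plus = (p) => (inp, pos) => {
  let end = p(inp, pos);
  if (end < 0) return -1;
  while (end >= 0) {
    pos = end;
    end = p(inp, pos);
  }
  return pos;
};
// Negative lookahead (!p): succeeds without consuming input iff p fails.
const not = (p) => (inp, pos) => (p(inp, pos) < 0 ? pos : -1);

// program = ( word / " " )+ with word = reserved / id, recording token kinds.
function tokenize(input, reserved, id) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    if (input[pos] === " ") {
      pos++;
      continue;
    }
    let kind = "reserved";
    let end = reserved(input, pos);
    if (end < 0) {
      kind = "id";
      end = id(input, pos);
    }
    if (end < 0) throw new Error(`no token at position ${pos}`);
    tokens.push(`${kind}:${input.slice(pos, end)}`);
    pos = end;
  }
  return tokens;
}

const idChar = alt(lit("a"), lit("b"));
// reserved = "aa" !idChar : "aa" must not be followed by another identifier character.
const reserved = seq(lit("aa"), not(idChar));
// id = !reserved ( "a" / "b" )+
const id = seq(not(reserved), plus(idChar));

console.log(tokenize("aab aa aaaa ba", reserved, id));
// → ["id:aab", "reserved:aa", "id:aaaa", "id:ba"]

// Literal variant reserved = "aa" !id, for comparison. Function declarations
// are hoisted, so the two rules can refer to each other.
function reservedBangId(inp, pos) {
  return seq(lit("aa"), not(idBangId))(inp, pos);
}
function idBangId(inp, pos) {
  return seq(not(reservedBangId), plus(idChar))(inp, pos);
}
console.log(tokenize("aaaa", reservedBangId, idBangId));
// → ["reserved:aa", "reserved:aa"]
```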
Answer 2:
As far as I know, PEG choice is ordered: the alternatives of a choice are tried deterministically from the first to the last, and the first one that matches wins. That said, you just need to put the "reserved" alternative before the "identifier" one.
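The effect of ordering can be checked with a small sketch of PEG ordered choice. The helpers `lit`, `plus` and `orderedChoice` are hypothetical, not the PEG.js API. Alternatives are tried left to right and the first one that succeeds is taken, even if a later one would consume more input. Putting `reserved` first is what lets "aa" come out as reserved at all; with `id` first, every token is an id. On its own, though, the ordering also produces the "aab" split described in the question (whose `word` rule already lists `reserved` first), so in practice it is combined with a lookahead such as the one in Answer 1.

```javascript
// Hypothetical helpers, not the PEG.js API: a parser takes (input, pos)
// and returns the new position, or -1 on failure.
const lit = (s) => (inp, pos) => (inp.startsWith(s, pos) ? pos + s.length : -1);
const plus = (p) => (inp, pos) => {
  let end = p(inp, pos);
  if (end < 0) return -1;
  while (end >= 0) {
    pos = end;
    end = p(inp, pos);
  }
  return pos;
};

// PEG ordered choice over named alternatives: return the first one that matches.
function orderedChoice(alternatives, input, pos = 0) {
  for (const [name, parser] of alternatives) {
    const end = parser(input, pos);
    if (end >= 0) return [name, input.slice(pos, end)];
  }
  return null;
}

const idChar = (inp, pos) => (inp[pos] === "a" || inp[pos] === "b" ? pos + 1 : -1);
const reserved = lit("aa"); // reserved = ( "aa" )
const id = plus(idChar);    // id = ( "a" / "b" )+

const reservedFirst = [["reserved", reserved], ["id", id]];
const idFirst = [["id", id], ["reserved", reserved]];

console.log(orderedChoice(reservedFirst, "aa"));  // → ["reserved", "aa"]
console.log(orderedChoice(idFirst, "aa"));        // → ["id", "aa"]
console.log(orderedChoice(reservedFirst, "aab")); // → ["reserved", "aa"]
```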
Source: https://stackoverflow.com/questions/4933788/excluding-certain-elements-from-a-specified-set-in-parsing-expressive-grammar-p