Question
I am teaching myself to use JavaCC in a hobby project, and have a simple grammar to write a parser for. Part of the parser includes the following:
TOKEN : { < DIGIT : (["0"-"9"]) > }
TOKEN : { < INTEGER : (<DIGIT>)+ > }
TOKEN : { < INTEGER_PAIR : (<INTEGER>){2} > }
TOKEN : { < FLOAT : (<NEGATE>)? <INTEGER> | (<NEGATE>)? <INTEGER> "." <INTEGER> | (<NEGATE>)? <INTEGER> "." | (<NEGATE>)? "." <INTEGER> > }
TOKEN : { < FLOAT_PAIR : (<FLOAT>){2} > }
TOKEN : { < NUMBER_PAIR : <FLOAT_PAIR> | <INTEGER_PAIR> > }
TOKEN : { < NEGATE : "-" > }
When compiling with JavaCC I get the output:
Warning: Regular Expression choice : FLOAT_PAIR can never be matched as : NUMBER_PAIR
Warning: Regular Expression choice : INTEGER_PAIR can never be matched as : NUMBER_PAIR
I'm sure this is a simple concept but I don't understand the warning, being a novice in both parser generation and regular expressions.
What does this warning mean (in as-novice-as-you-can-get terms)?
Answer 1:
I don't know JavaCC, but I am a compiler engineer.
The FLOAT_PAIR rule is ambiguous. Consider the following text:
0.0
This could be FLOAT 0 followed by FLOAT .0; or it could be FLOAT 0. followed by FLOAT 0; both resulting in FLOAT_PAIR. Or it could be a single FLOAT 0.0.
More importantly, though, you are using lexical analysis with composition in a way that is never likely to work. Consider this number:
12345
This could be parsed as INTEGER 12, INTEGER 345, resulting in an INTEGER_PAIR. Or it could be parsed as INTEGER 123, INTEGER 45, another INTEGER_PAIR. Or it could be INTEGER 12345, a different token. The problem exists because you are not requiring white space between the lexical elements of the INTEGER_PAIR (or FLOAT_PAIR).
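To make the ambiguity concrete, here is a small Java sketch (not JavaCC, and not part of the original grammar) that lists every way a run of digits can be cut into two non-empty INTEGER lexemes. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: enumerates the candidate INTEGER_PAIR readings of a digit string.
class PairSplits {
    // Every way to cut the string into two non-empty pieces.
    static List<String[]> splits(String digits) {
        List<String[]> out = new ArrayList<>();
        for (int i = 1; i < digits.length(); i++) {
            out.add(new String[] { digits.substring(0, i), digits.substring(i) });
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] pair : splits("12345")) {
            System.out.println("INTEGER " + pair[0] + ", INTEGER " + pair[1]);
        }
    }
}
```

For 12345 this yields four different pair readings, on top of the reading as one plain INTEGER, and nothing in the token definition says which one is meant.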
You should almost never try to handle pairs like this in the lexer. Instead, you should handle plain numbers (INTEGER and FLOAT) as tokens, and handle things like negation and pairing in the parser, where whitespace has been dealt with and stripped.
(For example, how are you going to process "----42"? This is a valid expression in most programming languages, which will correctly calculate multiple negations, but would not be handled by your lexer.)
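As a rough illustration of that split of responsibilities, the following hand-written Java sketch keeps the lexer down to two kinds of token (a minus sign and a plain unsigned number) and lets the parser deal with repeated negation and pairing. It is an assumption-laden stand-in for what a JavaCC-generated parser would do, and all names in it are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the lexer emits "-" and unsigned numbers; the parser handles the rest.
class NumberPairParser {
    // Lexer: skips spaces and tabs, never tries to recognise negation or pairs.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t') { i++; continue; }
            if (c == '-') { tokens.add("-"); i++; continue; }
            int start = i;
            boolean seenDot = false;
            while (i < input.length()) {
                char d = input.charAt(i);
                if (d >= '0' && d <= '9') { i++; }
                else if (d == '.' && !seenDot) { seenDot = true; i++; }
                else { break; }
            }
            if (i == start) {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
            tokens.add(input.substring(start, i));
        }
        return tokens;
    }

    private final List<String> tokens;
    private int pos = 0;

    NumberPairParser(List<String> tokens) { this.tokens = tokens; }

    // number := ("-")* NUMBER
    double number() {
        boolean negative = false;
        while (pos < tokens.size() && tokens.get(pos).equals("-")) {
            negative = !negative; // each minus sign flips the sign
            pos++;
        }
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Expected a number");
        }
        double value = Double.parseDouble(tokens.get(pos++));
        return negative ? -value : value;
    }

    // pair := number number
    static double[] parsePair(String input) {
        NumberPairParser parser = new NumberPairParser(tokenize(input));
        double first = parser.number();
        double second = parser.number();
        if (parser.pos != parser.tokens.size()) {
            throw new IllegalArgumentException("Trailing input after pair");
        }
        return new double[] { first, second };
    }
}
```

With this arrangement, NumberPairParser.parsePair("----42 -.5") treats the four minus signs as four negations of 42 and the second number as a negated .5, without the lexer needing any notion of a pair.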
Also, be aware that single-digit integers in your lexer will not be matched as INTEGER; they will come out as DIGIT. I don't know the correct syntax for JavaCC to fix that for you, though. What you want is to define DIGIT not as a token, but simply as something you can use in the definitions of other tokens; alternatively, embed the definition of DIGIT ([0-9]) directly wherever you are using DIGIT in your rules.
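The reason is the way a JavaCC token manager chooses between rules: it takes the longest match, and when two rules match the same length, the one declared first wins. The Java sketch below imitates that choice with java.util.regex for the two rules from the question. It is only a simulation under that assumption, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical simulation of "longest match, earliest declaration wins ties".
class FirstTokenDemo {
    // Returns the kind of the token that would be produced at the start of the input.
    static String firstTokenKind(Map<String, Pattern> rules, String input) {
        String bestKind = null;
        int bestLength = 0;
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // Only a strictly longer match replaces the current best,
            // so an earlier rule keeps the win on a tie.
            if (m.lookingAt() && m.end() > bestLength) {
                bestKind = rule.getKey();
                bestLength = m.end();
            }
        }
        return bestKind;
    }

    // DIGIT and INTEGER in the same order as in the question's grammar.
    static Map<String, Pattern> questionRules() {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("DIGIT", Pattern.compile("[0-9]"));
        rules.put("INTEGER", Pattern.compile("[0-9]+"));
        return rules;
    }
}
```

Here firstTokenKind(questionRules(), "7") gives DIGIT because both rules match one character and DIGIT is declared first, while "42" gives INTEGER because that match is longer.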
Answer 2:
I haven't used JavaCC, but it is possible that NUMBER_PAIR is ambiguous.
I think the problem comes down to the fact that the same exact thing can be matched as both FLOAT_PAIR and INTEGER_PAIR since FLOAT can match an INTEGER.
But this is just a guess having never seen the JavaCC syntax :)
Answer 3:
It probably means that for every FLOAT_PAIR you'll just get a FLOAT_PAIR token, never a NUMBER_PAIR token. The FLOAT_PAIR rule already matches all the input and JavaCC will not try to find further matching rules. That would be my interpretation, but I don't know JavaCC, so take it with a grain of salt.
Maybe you can specify somehow that NUMBER_PAIR is the main production and that you don't want to get any other tokens as results.
Answer 4:
Thanks to Barry Kelly's answer, the solution I've come up with is:
SKIP : { < #TO_SKIP : " " | "\t" > }
TOKEN : { < #DIGIT : (["0"-"9"]) > }
TOKEN : { < #DIGITS : (<DIGIT>)+ > }
TOKEN : { < INTEGER : <DIGITS> > }
TOKEN : { < INTEGER_PAIR : (<INTEGER>) (<TO_SKIP>)+ (<INTEGER>) > }
TOKEN : { < FLOAT : (<NEGATE>)?<DIGITS>"."<DIGITS> | (<NEGATE>)?"."<DIGITS> > }
TOKEN : { < FLOAT_PAIR : (<FLOAT>) (<TO_SKIP>)+ (<FLOAT>) > }
TOKEN : { < #NUMBER : <FLOAT> | <INTEGER> > }
TOKEN : { < NUMBER_PAIR : (<NUMBER>) (<TO_SKIP>)+ (<NUMBER>) >}
TOKEN : { < NEGATE : "-" > }
I had completely forgotten to include the space that separates the two tokens. I've also used the '#' symbol, which stops a token from being matched on its own, so that it can only be used in the definitions of other tokens. The above is compiled by JavaCC without warning or error.
However, as noted by Barry, there are reasons against doing this.
Source: https://stackoverflow.com/questions/791591/explanation-and-solution-for-javaccs-warning-regular-expression-choice-foo-c