Question
I'm continuing to work on my JavaCC grammar for ECMAScript 5.1. It's actually going quite well; I think I've covered most of the expressions now.
I now have two questions, both related to automatic semicolon insertion (§7.9.1). This is one of them.
The specification defines the following production:
PostfixExpression :
    LeftHandSideExpression
    LeftHandSideExpression [no LineTerminator here] ++
    LeftHandSideExpression [no LineTerminator here] --
How can I implement a reliable "no LineTerminator here" check?
For the record, my LINE_TERMINATOR is at the moment something like:
SPECIAL_TOKEN :
{
    <LINE_TERMINATOR: <LF> | <CR> | <LS> | <PS> >
|   < #LF: "\n" >     /* Line Feed */
|   < #CR: "\r" >     /* Carriage Return */
|   < #LS: "\u2028" > /* Line separator */
|   < #PS: "\u2029" > /* Paragraph separator */
}
I have read about lexical states, but I am not sure whether this is the right direction. I've checked a few other JavaScript grammars I have found, but did not find any similar rules there. (I actually feel like a total cargo culter when I try to borrow something from these grammars.)
I'd be grateful for a pointer, a hint or just a keyword for the right search direction.
Answer 1:
I think for the "restricted productions" you can do this:
void PostfixExpression() :
{} {
    LeftHandSideExpression()
    (
        LOOKAHEAD( "++", {getToken(0).beginLine == getToken(1).beginLine})
        "++"
    |
        LOOKAHEAD( "--", {getToken(0).beginLine == getToken(1).beginLine})
        "--"
    |
        {}
    )
}
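One detail of this check may be worth a second look: it compares the line on which the previous token begins. If that token can span several lines, comparing the line on which it ends is the more defensive choice. The following is only a minimal sketch of that variant, not part of the answer above; Tok is a hypothetical stand-in that carries just the beginLine and endLine fields found on JavaCC's generated Token class.

```java
// Hypothetical stand-in for JavaCC's generated Token class (line fields only).
final class Tok {
    final int beginLine;
    final int endLine;

    Tok(int beginLine, int endLine) {
        this.beginLine = beginLine;
        this.endLine = endLine;
    }
}

final class SameLineCheck {
    // True when `next` starts on the same line on which `prev` ends.
    static boolean onSameLine(Tok prev, Tok next) {
        return prev.endLine == next.beginLine;
    }
}
```

In the grammar, the corresponding semantic lookahead would read { getToken(0).endLine == getToken(1).beginLine }.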
Answer 2:
Update: As Gunther pointed out, my original solution was not correct because of this paragraph in §7.4 of the spec:
Comments behave like white space and are discarded except that, if a MultiLineComment contains a line terminator character, then the entire comment is considered to be a LineTerminator for purposes of parsing by the syntactic grammar.
I'm posting a correction but leaving my original solution at the end of this answer.
Corrected solution
The core idea, as proposed by Theodore Norvell, is to use semantic lookahead. However, I have decided to implement a safer check:
// Walks the chain of special tokens that precede the given token and reports
// whether any of them counts as a LineTerminator (see §7.4 for comments).
public static boolean precededByLineTerminator(Token token) {
    for (Token specialToken = token.specialToken; specialToken != null; specialToken = specialToken.specialToken) {
        if (specialToken.kind == EcmaScriptParserConstants.LINE_TERMINATOR) {
            return true;
        } else if (specialToken.kind == EcmaScriptParserConstants.MULTI_LINE_COMMENT) {
            // A multi-line comment containing a line terminator acts as one.
            final String image = specialToken.image;
            if (StringUtils.containsAny(image, (char)0x000A, (char)0x000D, (char)0x2028,
                    (char)0x2029)) {
                return true;
            }
        }
    }
    return false;
}
And the grammar is:
expression = LeftHandSideExpression()
(
    LOOKAHEAD ( <INCR>, { !TokenUtils.precededByLineTerminator(getToken(1))} )
    <INCR>
    {
        return expression.postIncr();
    }
|   LOOKAHEAD ( <DECR>, { !TokenUtils.precededByLineTerminator(getToken(1))} )
    <DECR>
    {
        return expression.postDecr();
    }
) ?
{
    return expression;
}
So ++ or -- is considered here iff it is not preceded by a line terminator.
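To see how the check behaves in isolation, here is a small self-contained sketch. It is an illustration under stated assumptions, not the generated parser: SketchToken is a hypothetical stand-in with the kind, image and specialToken fields of JavaCC's Token class, the kind constants are made-up values, and a plain loop replaces StringUtils.containsAny so that no library is needed.

```java
// Hypothetical stand-in for JavaCC's generated Token class.
final class SketchToken {
    // Made-up kind values; the real ones come from the generated constants.
    static final int LINE_TERMINATOR = 1;
    static final int MULTI_LINE_COMMENT = 2;
    static final int WHITE_SPACE = 3;
    static final int INCR = 4;

    final int kind;
    final String image;
    // The special token immediately preceding this one, or null.
    final SketchToken specialToken;

    SketchToken(int kind, String image, SketchToken specialToken) {
        this.kind = kind;
        this.image = image;
        this.specialToken = specialToken;
    }
}

final class LineTerminatorCheck {
    // True if the text contains LF, CR, LS or PS.
    static boolean containsLineTerminator(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\n' || c == '\r' || c == '\u2028' || c == '\u2029') {
                return true;
            }
        }
        return false;
    }

    // Walks the special-token chain backwards from the given token.
    static boolean precededByLineTerminator(SketchToken token) {
        for (SketchToken st = token.specialToken; st != null; st = st.specialToken) {
            if (st.kind == SketchToken.LINE_TERMINATOR) {
                return true;
            }
            if (st.kind == SketchToken.MULTI_LINE_COMMENT && containsLineTerminator(st.image)) {
                return true;
            }
        }
        return false;
    }
}
```

Note that this approach relies on line terminators and comments being declared as SPECIAL_TOKEN rather than SKIP, since skipped input is discarded and never appears in the specialToken chain.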
Original solution
This is not how I finally solved it.
The core idea, as proposed by Theodore Norvell, is to use semantic lookahead. However, I have decided to implement a safer check:
public static boolean precededBySpecialTokenOfKind(Token token, int kind) {
    for (Token specialToken = token.specialToken; specialToken != null; specialToken = specialToken.specialToken) {
        if (specialToken.kind == kind) {
            return true;
        }
    }
    return false;
}
And the grammar is:
expression = LeftHandSideExpression()
(
    LOOKAHEAD ( <INCR>, { !TokenUtils.precededBySpecialTokenOfKind(getToken(1), LINE_TERMINATOR)} )
    <INCR>
    {
        return expression.postIncr();
    }
|   LOOKAHEAD ( <DECR>, { !TokenUtils.precededBySpecialTokenOfKind(getToken(1), LINE_TERMINATOR)} )
    <DECR>
    {
        return expression.postDecr();
    }
) ?
{
    return expression;
}
So ++ or -- is considered here iff it is not preceded by a line terminator.
Source: https://stackoverflow.com/questions/26782747/how-to-implement-javascript-ecmascript-no-lineterminator-here-rule-in-javacc