I'm continuing to work on my JavaCC grammar for ECMAScript 5.1. It is actually going quite well; I think I've covered most of the expressions now.
I now have two questions, bot
Update: As Gunther pointed out, my original solution was not correct because of this paragraph in section 7.4 of the spec:
> Comments behave like white space and are discarded except that, if a MultiLineComment contains a line terminator character, then the entire comment is considered to be a LineTerminator for purposes of parsing by the syntactic grammar.
I'm posting a correction but leaving my original solution at the end of the question.
The core idea, as proposed by Theodore Norvell, is to use semantic lookahead. However, I have decided to implement a safer check:
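```java
import org.apache.commons.lang3.StringUtils;

// Token and EcmaScriptParserConstants are generated by JavaCC.
public final class TokenUtils {

    // True if the token is preceded by a LineTerminator, or by a MultiLineComment
    // that contains a line terminator (ES5.1 section 7.4).
    public static boolean precededByLineTerminator(Token token) {
        // JavaCC chains the special tokens in front of a token backwards via specialToken.
        for (Token specialToken = token.specialToken; specialToken != null; specialToken = specialToken.specialToken) {
            if (specialToken.kind == EcmaScriptParserConstants.LINE_TERMINATOR) {
                return true;
            } else if (specialToken.kind == EcmaScriptParserConstants.MULTI_LINE_COMMENT) {
                final String image = specialToken.image;
                // LF, CR, LS and PS are the line terminators of ES5.1 section 7.3.
                if (StringUtils.containsAny(image, (char) 0x000A, (char) 0x000D, (char) 0x2028,
                        (char) 0x2029)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```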
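To see the comment rule in action without a JavaCC build, here is a minimal self-contained sketch of the same check. It assumes a hypothetical stand-in `Token` class that models only the `kind`, `image` and `specialToken` fields of the JavaCC-generated one, hypothetical kind constants in place of `EcmaScriptParserConstants`, and a plain loop in place of `StringUtils.containsAny`:

```java
public class LineTerminatorCheck {

    // Hypothetical stand-in for the JavaCC-generated Token class; only the
    // fields the check reads (kind, image, specialToken) are modelled.
    static final class Token {
        final int kind;
        final String image;
        final Token specialToken; // the special token immediately before this one

        Token(int kind, String image, Token specialToken) {
            this.kind = kind;
            this.image = image;
            this.specialToken = specialToken;
        }
    }

    // Hypothetical token kinds; a real parser would use the constants
    // generated into EcmaScriptParserConstants.
    static final int LINE_TERMINATOR = 1;
    static final int MULTI_LINE_COMMENT = 2;
    static final int WHITE_SPACE = 3;
    static final int INCR = 4;

    // ES5.1 section 7.3 line terminators: LF, CR, LS (U+2028), PS (U+2029).
    static boolean containsLineTerminator(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == 0x000A || c == 0x000D || c == 0x2028 || c == 0x2029) {
                return true;
            }
        }
        return false;
    }

    // Walks the special-token chain (white space, comments, line terminators)
    // in front of a regular token, applying the section 7.4 rule for comments.
    static boolean precededByLineTerminator(Token token) {
        for (Token t = token.specialToken; t != null; t = t.specialToken) {
            if (t.kind == LINE_TERMINATOR) {
                return true;
            }
            if (t.kind == MULTI_LINE_COMMENT && containsLineTerminator(t.image)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // a /* spans\nlines */ ++b : the comment contains LF, so it counts as a LineTerminator
        Token multiLine = new Token(MULTI_LINE_COMMENT, "/* spans\nlines */", null);
        Token incr1 = new Token(INCR, "++", new Token(WHITE_SPACE, " ", multiLine));

        // a /* one line */ ++b : no line terminator anywhere before '++'
        Token oneLine = new Token(MULTI_LINE_COMMENT, "/* one line */", null);
        Token incr2 = new Token(INCR, "++", new Token(WHITE_SPACE, " ", oneLine));

        System.out.println(precededByLineTerminator(incr1)); // true
        System.out.println(precededByLineTerminator(incr2)); // false
    }
}
```

Running `main` prints `true` and then `false`: the first `++` sits behind a comment that spans two lines, the second behind a comment that stays on one line, so only the first must not be treated as a postfix operator.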
And the grammar is:
```
expression = LeftHandSideExpression()
(
    // Restricted production: no LineTerminator may precede the postfix operator.
    LOOKAHEAD("++", { !TokenUtils.precededByLineTerminator(getToken(1)) })
    "++"
    {
        return expression.postIncr();
    }
|
    LOOKAHEAD("--", { !TokenUtils.precededByLineTerminator(getToken(1)) })
    "--"
    {
        return expression.postDecr();
    }
)?
{
    return expression;
}
```
So a postfix `++` or `--` is accepted here if and only if it is not preceded by a line terminator.
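A different technique, not the one used above, is to inspect the raw source text between the previous token and the `++`/`--`. That gap can only hold white space, line terminators and comments, so finding any line terminator character in it is equivalent to the section 7.4 rule: a single-line comment is always followed by its terminating line terminator, and a multi-line comment counts only if it contains one. The sketch assumes the lexer can report character offsets for each token; JavaCC's default `Token` records lines and columns rather than offsets, so `prevEnd` and `nextStart` are hypothetical inputs here:

```java
public class GapCheck {

    // ES5.1 section 7.3 line terminators: LF, CR, LS (U+2028), PS (U+2029).
    static boolean isLineTerminator(char c) {
        return c == 0x000A || c == 0x000D || c == 0x2028 || c == 0x2029;
    }

    // True if the source between the end of the previous token (prevEnd,
    // exclusive) and the start of the next token (nextStart) contains a line
    // terminator, whether in plain white space or inside a comment.
    static boolean lineTerminatorInGap(String source, int prevEnd, int nextStart) {
        for (int i = prevEnd; i < nextStart; i++) {
            if (isLineTerminator(source.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String s1 = "a /* one line */ ++b";
        String s2 = "a /* two\nlines */ ++b";
        // 'a' ends at offset 1; '++' starts at indexOf("++").
        System.out.println(lineTerminatorInGap(s1, 1, s1.indexOf("++"))); // false
        System.out.println(lineTerminatorInGap(s2, 1, s2.indexOf("++"))); // true
    }
}
```

This avoids enumerating special-token kinds at all, at the cost of needing offsets that a stock JavaCC token does not provide.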
Original solution (this is not how I finally solved it):
The core idea, as proposed by Theodore Norvell, is to use semantic lookahead. However, I have decided to implement a safer check:
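```java
// Token is generated by JavaCC.
// Note: this misses a MultiLineComment that contains a line terminator (see the Update).
public static boolean precededBySpecialTokenOfKind(Token token, int kind) {
    // Walk the chain of special tokens (white space, comments, ...) in front of the token.
    for (Token specialToken = token.specialToken; specialToken != null; specialToken = specialToken.specialToken) {
        if (specialToken.kind == kind) {
            return true;
        }
    }
    return false;
}
```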
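The flaw Gunther pointed out can be reproduced with a small sketch of this original check. Again the `Token` class and the kind constants are hypothetical stand-ins for the JavaCC-generated ones, and the sketch assumes the lexer matches a whole multi-line comment as one special token, so the newline inside it never becomes a separate `LINE_TERMINATOR` token for the check to find:

```java
public class OriginalCheckPitfall {

    // Hypothetical stand-in for the JavaCC-generated Token class.
    static final class Token {
        final int kind;
        final String image;
        final Token specialToken; // the special token immediately before this one

        Token(int kind, String image, Token specialToken) {
            this.kind = kind;
            this.image = image;
            this.specialToken = specialToken;
        }
    }

    // Hypothetical token kinds standing in for EcmaScriptParserConstants.
    static final int LINE_TERMINATOR = 1;
    static final int MULTI_LINE_COMMENT = 2;
    static final int INCR = 3;

    // Same logic as precededBySpecialTokenOfKind above.
    static boolean precededBySpecialTokenOfKind(Token token, int kind) {
        for (Token t = token.specialToken; t != null; t = t.specialToken) {
            if (t.kind == kind) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // "a" newline "++b": a real LINE_TERMINATOR special token is found.
        Token lf = new Token(LINE_TERMINATOR, "\n", null);
        Token incr1 = new Token(INCR, "++", lf);

        // a /* two\nlines */ ++b : the newline lives inside the comment token,
        // so the check misses it even though section 7.4 treats the comment
        // as a LineTerminator.
        Token comment = new Token(MULTI_LINE_COMMENT, "/* two\nlines */", null);
        Token incr2 = new Token(INCR, "++", comment);

        System.out.println(precededBySpecialTokenOfKind(incr1, LINE_TERMINATOR)); // true
        System.out.println(precededBySpecialTokenOfKind(incr2, LINE_TERMINATOR)); // false
    }
}
```

The second result is the bug: the grammar would wrongly accept `++` as a postfix operator on `a`.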
And the grammar is:
```
expression = LeftHandSideExpression()
(
    // Restricted production: no LINE_TERMINATOR special token may precede the postfix operator.
    LOOKAHEAD("++", { !TokenUtils.precededBySpecialTokenOfKind(getToken(1), LINE_TERMINATOR) })
    "++"
    {
        return expression.postIncr();
    }
|
    LOOKAHEAD("--", { !TokenUtils.precededBySpecialTokenOfKind(getToken(1), LINE_TERMINATOR) })
    "--"
    {
        return expression.postDecr();
    }
)?
{
    return expression;
}
```
So a postfix `++` or `--` is accepted here if and only if it is not preceded by a line terminator.