Question
What is the usual way of tokenizing quoted strings that can contain an escape character? Here are some examples:
1) "this is good"
2) "this is\"good\""
3) "this \is good"
4) "this is bad\"
5) "this is \\"bad"
6) "this is bad
7) this is bad"
8) this is bad
Below is a sample parser that doesn't work quite right. It gives the expected result for every example except 4 and 5, which parse successfully even though they should fail.
options
{
LOOKAHEAD = 3;
CHOICE_AMBIGUITY_CHECK = 2;
OTHER_AMBIGUITY_CHECK = 1;
STATIC = false;
DEBUG_PARSER = false;
DEBUG_LOOKAHEAD = false;
DEBUG_TOKEN_MANAGER = true;
ERROR_REPORTING = true;
JAVA_UNICODE_ESCAPE = false;
UNICODE_INPUT = false;
IGNORE_CASE = false;
USER_TOKEN_MANAGER = false;
USER_CHAR_STREAM = false;
BUILD_PARSER = true;
BUILD_TOKEN_MANAGER = true;
SANITY_CHECK = true;
FORCE_LA_CHECK = true;
}
PARSER_BEGIN(MyParser)
import java.io.ByteArrayInputStream;
import java.io.UnsupportedEncodingException;

public class MyParser {
    public static void main(String[] args) throws UnsupportedEncodingException, ParseException {
        // Note that this conversion to an input stream is only good for small strings.
        MyParser parser = new MyParser(new ByteArrayInputStream(args[0].getBytes("UTF-8")));
        parser.enable_tracing();
        parser.myProduction();
        System.out.println("Must have worked!");
    }
}
PARSER_END(MyParser)
TOKEN:
{
    <QUOTED:
        "\""
        (
            "\\" ~[]    // any escaped character
        |               // or
            ~["\""]     // any non-quote character
        )*
        "\""
    >
}

void myProduction() :
{}
{
    <QUOTED>
    <EOF>
}
You can run MyParser from the command line with the input to parse as its first argument. It will print "Must have worked!" if the parse succeeds, or throw an error if it doesn't.
How do I change this parser to correctly fail on examples 4 and 5?
Answer 1:
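To fix your regular expression, exclude the backslash from the set of ordinary characters. In your version, ~["\""] also matches a backslash, so the tokenizer can consume a backslash as an ordinary character and then treat the quote that follows it as the closing quote. Make it: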
TOKEN: {
    <QUOTED:
        "\""
        (
            "\\" ~[]         // any escaped character
        |                    // or
            ~["\"","\\"]     // any character except quote or backslash
        )*
        "\""
    >
}
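If you want to check the difference without regenerating the parser, here is a rough sketch that mirrors both token definitions with java.util.regex. The class name QuotedCheck and its methods are made up for illustration, and a backtracking regex is only an approximation of what the JavaCC token manager does; matching the whole input stands in for <QUOTED> followed by <EOF>.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the generated parser.
final class QuotedCheck {
    // Approximates the question's token: escaped character OR any non-quote character.
    private static final Pattern ORIGINAL =
        Pattern.compile("\"(?:\\\\.|[^\"])*\"", Pattern.DOTALL);

    // Approximates the corrected token: escaped character OR any character
    // that is neither a quote nor a backslash.
    private static final Pattern FIXED =
        Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"", Pattern.DOTALL);

    static boolean acceptsOriginal(String input) {
        return ORIGINAL.matcher(input).matches(); // whole input must match
    }

    static boolean acceptsFixed(String input) {
        return FIXED.matcher(input).matches();
    }

    public static void main(String[] args) {
        // The eight examples from the question, written as Java string literals.
        String[] examples = {
            "\"this is good\"",
            "\"this is\\\"good\\\"\"",
            "\"this \\is good\"",
            "\"this is bad\\\"",
            "\"this is \\\\\"bad\"",
            "\"this is bad",
            "this is bad\"",
            "this is bad"
        };
        for (int i = 0; i < examples.length; i++) {
            System.out.println((i + 1) + ") " + examples[i]
                + "  original=" + acceptsOriginal(examples[i])
                + "  fixed=" + acceptsFixed(examples[i]));
        }
    }
}
```

With these patterns, examples 1 to 3 are accepted by both versions, examples 4 and 5 are accepted only by the original, and examples 6 to 8 are rejected by both. The two alternatives in the fixed pattern can never match the same character, so a backslash always has to pair with the character after it.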
Source: https://stackoverflow.com/questions/24156948/javacc-quote-with-escape-character