Question
As I described in "Antlr greedy-option", I have a problem with a language that can contain string literals inside a string literal, for example:
START: "img src="test.jpg""
Mr. Bart Kiers mentioned in that thread that it is not possible to write a grammar that solves this problem. Therefore I decided to preprocess the input into:
START: "img src='test.jpg'"
before starting the lexer (and parser). In other words, every double quote inside the literal is replaced by a single quote.
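The preprocessing step itself is small. It keeps the outer pair of double quotes and turns every double quote between them into a single quote. Below is a minimal sketch of that step for one literal that has already been isolated. InnerQuotes and replaceInnerQuotes are hypothetical names, and the sketch assumes the literal starts and ends with a double quote:

```java
public class InnerQuotes {

    // Hypothetical helper: keeps the outermost double quotes of a literal
    // and replaces every double quote between them with a single quote.
    static String replaceInnerQuotes(String literal) {
        if (literal.length() < 2
                || literal.charAt(0) != '"'
                || literal.charAt(literal.length() - 1) != '"') {
            throw new IllegalArgumentException("not a double-quoted literal: " + literal);
        }
        String body = literal.substring(1, literal.length() - 1);
        return "\"" + body.replace('"', '\'') + "\"";
    }

    public static void main(String[] args) {
        // the literal "img src="test.jpg"" from the question
        String input = "\"img src=\"test.jpg\"\"";
        System.out.println(replaceInnerQuotes(input)); // prints "img src='test.jpg'"
    }
}
```

The hard part is finding where each literal ends inside the whole file. That is what the regular expression below has to do.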
The input file could look like this:
START: "aaa"aaa" "aaa"aaaaa"
:END_START
START: "aaa"aaa" "aaa"aa a aa"
:END_START
START: "aaab"bbaaaa"
:END_START
So I have a solution, but it is not correct. I have two questions about it, listed below the code. My code is:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;

public static void main(String[] args) {
    try {
        FileInputStream fis = new FileInputStream("src/file.txt");
        // replace the inner double quotes before the lexer sees the input
        String preparedCode = preparingCode(fis);
        ANTLRStringStream in = new ANTLRStringStream(preparedCode);
        TestLexer lex = new TestLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lex);
        TestParser parser = new TestParser(tokens);
        parser.rule();
    } catch (IOException ex) {
        ex.printStackTrace();
    } catch (RecognitionException e) {
        System.out.println(e.getMessage());
        System.exit(1); // non-zero exit status signals the parse error
    }
}

static String preparingCode(InputStream input) {
    StringBuilder oldCode = new StringBuilder();
    Pattern pattern = Pattern.compile("(START:\\s\")(.+)(\"\\n:END_START)");
    // BufferedReader replaces the deprecated DataInputStream.readLine();
    // try-with-resources closes the stream once reading is done
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(input))) {
        String strLine;
        while ((strLine = reader.readLine()) != null) {
            oldCode.append(strLine).append('\n');
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    }
    StringBuffer newCode = new StringBuffer(oldCode.length());
    Matcher matcher = pattern.matcher(oldCode);
    while (matcher.find()) {
        // eliminate quotes inside a string literal
        String stringLiteral = matcher.group(2).replace('"', '\'');
        String replace = matcher.group(1) + stringLiteral + matcher.group(3);
        matcher.appendReplacement(newCode, Matcher.quoteReplacement(replace));
    }
    matcher.appendTail(newCode);
    System.out.println(newCode);
    return newCode.toString();
}
My questions are:
- Which pattern is the correct one? The string literal can span more than one line, e.g. "aaaa"\n"bbb". It is always closed by "\n:END_START, that is, a double quote followed by a line that starts with :END_START. I would like to get the following result:
START: "aaa'aaa' 'aaa'aaaaa"
:END_START
START: "aaa'aaa' 'aa'aa a aa"
:END_START
START: "aaab'bbaaaa"
:END_START
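I experimented with the Pattern.DOTALL flag:
Pattern pattern = Pattern.compile("(START:\\s\")(.+)(\"\\n:END_START)", Pattern.DOTALL);
But this is not the solution. With DOTALL, the greedy .+ runs on to the last "\n:END_START in the input, so everything from the first START: to the last :END_START ends up in a single match.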
- Assuming I use the correct pattern, is there a more efficient way to fix this?
Answer 1:
Fix for the first question
I have to use a non-greedy (lazy) quantifier, .+? instead of .+, together with the Pattern.DOTALL flag:
Pattern pattern = Pattern.compile("(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);
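The sketch below shows why the greedy DOTALL pattern "matches everything" while the lazy one works. It counts the matches of the three patterns from the question on a two-block input. The input is made up for illustration, in particular the line break inside the second literal. GreedyVsLazy and countMatches are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {

    // Counts how many non-overlapping matches find() reports.
    static int countMatches(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Two blocks; the second literal spans two lines (illustrative input).
        String input = "START: \"aaa\"aaa\" \"aaa\"aaaaa\"\n:END_START\n"
                + "START: \"aaa\"aaa\" \"aaa\"aa\na aa\"\n:END_START\n";

        // Greedy, no DOTALL: '.' stops at '\n', so the two-line literal is missed.
        Pattern greedy = Pattern.compile("(START:\\s\")(.+)(\"\\n:END_START)");
        // Greedy with DOTALL: '.+' backs off only to the LAST closing delimiter.
        Pattern greedyDotall = Pattern.compile("(START:\\s\")(.+)(\"\\n:END_START)", Pattern.DOTALL);
        // Lazy with DOTALL: '.+?' stops at the FIRST closing delimiter.
        Pattern lazyDotall = Pattern.compile("(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);

        System.out.println(countMatches(greedy, input));       // 1
        System.out.println(countMatches(greedyDotall, input)); // 1 (covers both blocks)
        System.out.println(countMatches(lazyDotall, input));   // 2
    }
}
```

The greedy .+ first consumes the rest of the input and then backtracks only as far as the last "\n:END_START. The lazy .+? grows one character at a time and stops at the first one, so each START block becomes its own match.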
The code:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;

public static void main(String[] args) {
    try {
        FileInputStream fis = new FileInputStream("src/file.txt");
        String preparedCode = preparingCode(fis);
        ANTLRStringStream in = new ANTLRStringStream(preparedCode);
        TestLexer lex = new TestLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lex);
        TestParser parser = new TestParser(tokens);
        parser.rule();
    } catch (IOException ex) {
        ex.printStackTrace();
    } catch (RecognitionException e) {
        System.out.println(e.getMessage());
        System.exit(1); // non-zero exit status signals the parse error
    }
}

static String preparingCode(InputStream input) {
    StringBuilder oldCode = new StringBuilder();
    // lazy .+? stops at the first closing "\n:END_START; DOTALL lets it cross line breaks
    Pattern pattern = Pattern.compile("(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);
    // readLine() strips "\n" or "\r\n", so every line is re-terminated with "\n" for the pattern
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(input))) {
        String strLine;
        while ((strLine = reader.readLine()) != null) {
            oldCode.append(strLine).append('\n');
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    }
    StringBuffer newCode = new StringBuffer(oldCode.length());
    Matcher matcher = pattern.matcher(oldCode);
    while (matcher.find()) {
        System.out.println("++++" + matcher.group(2)); // debug output
        // eliminate quotes inside a string literal
        String stringLiteral = matcher.group(2).replace('"', '\'');
        String replace = matcher.group(1) + stringLiteral + matcher.group(3);
        matcher.appendReplacement(newCode, Matcher.quoteReplacement(replace));
    }
    matcher.appendTail(newCode);
    System.out.println(newCode);
    return newCode.toString();
}
So, is there any other way to fix this problem?
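One possible answer, offered as a sketch rather than a definitive implementation, avoids regular expressions entirely. It locates the delimiters with String.indexOf and copies the input once into a StringBuilder. Like the lazy pattern, it assumes a literal opens with START:, one whitespace character and a double quote, and closes at the first double quote that is directly followed by a newline and :END_START. QuotePreprocessor and replaceInnerQuotes are hypothetical names. Note that Character.isWhitespace accepts a few more characters than the regex \s does:

```java
public class QuotePreprocessor {

    private static final String OPEN = "START:";
    // closing delimiter: double quote, line break, :END_START
    private static final String CLOSE = "\"\n:END_START";

    // Hypothetical regex-free counterpart of preparingCode(): copies the input
    // once and turns every " inside a START: ... :END_START literal into '.
    static String replaceInnerQuotes(String code) {
        StringBuilder out = new StringBuilder(code.length());
        int copied = 0; // code[0, copied) has already been appended to out
        int search = 0; // where to look for the next "START:"
        while (true) {
            int start = code.indexOf(OPEN, search);
            if (start < 0) {
                break;
            }
            int quote = start + OPEN.length() + 1; // expected opening quote
            if (quote >= code.length()
                    || !Character.isWhitespace(code.charAt(quote - 1))
                    || code.charAt(quote) != '"') {
                search = start + 1; // "START:" not followed by whitespace + quote
                continue;
            }
            // first closing delimiter after a non-empty body, like the lazy .+?
            int end = code.indexOf(CLOSE, quote + 2);
            if (end < 0) {
                break; // no closing delimiter after this opener
            }
            out.append(code, copied, quote + 1); // unchanged text up to the opening quote
            for (int i = quote + 1; i < end; i++) {
                char c = code.charAt(i);
                out.append(c == '"' ? '\'' : c);
            }
            copied = end;
            search = end + CLOSE.length();
        }
        out.append(code, copied, code.length()); // unchanged tail
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "START: \"aaa\"aaa\" \"aaa\"aaaaa\"\n:END_START\n"
                + "START: \"aaab\"bbaaaa\"\n:END_START\n";
        System.out.print(replaceInnerQuotes(input));
    }
}
```

This is not necessarily faster than the lazy regex. The difference is mainly that the delimiter logic is written out explicitly instead of being encoded in the pattern.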
Source: https://stackoverflow.com/questions/10013170/efficiently-replacing-a-string-or-character-from-file-input-for-the-antlrinputst