Efficiently replacing a string or character from file-input for the ANTLRInputStream (ANTLRStringStream)

Submitted by 喜你入骨 on 2019-12-24 10:47:54

Question


As I described in "Antlr greedy-option", I have a problem with a language that can contain string literals inside a string literal, such as:

START: "img src="test.jpg""

Bart Kiers mentioned in that thread that it is not possible to write a grammar that solves this problem. I therefore decided to change the language to:

START: "img src='test.jpg'"

before starting the lexer (and parser).

The file input could be:

START: "aaa"aaa"
 "aaa"aaaaa"
:END_START

START: "aaa"aaa"
 "aaa"aa
 a
 aa"
:END_START

START: "aaab"bbaaaa"
:END_START

I have a solution, but it is not correct yet. I have two questions about it (listed below the code). My code is:

public static void main(String[] args) {

    try{
        FileInputStream fis = new FileInputStream("src/file.txt");
        String preparedCode = preparingCode(fis);

        ANTLRStringStream in = new ANTLRStringStream(preparedCode);

        TestLexer lex = new TestLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lex);
        TestParser parser = new TestParser(tokens);

        parser.rule();
    }catch(IOException ex){
        ex.printStackTrace();
    } catch (RecognitionException e) {
        System.out.println(e.getMessage());
        System.exit(0);
    }
}

static String preparingCode(FileInputStream input){
    DataInputStream data = new DataInputStream(input);
    StringBuilder oldCode = new StringBuilder();
    StringBuffer newCode = new StringBuffer(oldCode.length());

    Pattern pattern = Pattern.compile("(START:\\s\")(.+)(\"\\n:END_START)");
    String strLine;
    try{
      while ((strLine = data.readLine()) != null)   
          oldCode.append(strLine + "\n");
    }
    catch(IOException ex){
      ex.printStackTrace();
    }

    Matcher matcher = pattern.matcher(oldCode);

    while (matcher.find()) {
      //eliminate quotes inside a string literal
      String stringLiteral = matcher.group(2).replaceAll("\"", "'");

      String replace = matcher.group(1) + stringLiteral + matcher.group(3);
      matcher.appendReplacement(newCode, Matcher.quoteReplacement(replace));
    }
    matcher.appendTail(newCode);

    System.out.println(newCode);

    return newCode.toString();
}


My questions are:

  • Which pattern is the correct one? It is important that the string literal can span more than one line, e.g. "aaaa"\n"bbb", but it always closes with a "\n:END_START" line. The result I want is:
START: "aaa'aaa'
 'aaa'aaaaa"
:END_START

START: "aaa'aaa'
 'aaa'aa
 a
 aa"
:END_START

START: "aaab'bbaaaa"
:END_START

I experimented with the Pattern.DOTALL flag:

Pattern pattern = Pattern.compile("(START:\\s\")(.+)(\"\\n:END_START)", Pattern.DOTALL);

But this is not the solution, because with DOTALL the greedy .+ matches everything from the first START: up to the last :END_START.




  • Once the pattern is correct, is there a more efficient way to do the replacement?
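To make the over-matching concrete, here is a minimal sketch of the greedy pattern with Pattern.DOTALL applied to two blocks. The class name GreedyDotallDemo and the two-block sample string are assumptions made for illustration; only the pattern itself is taken from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original code.
final class GreedyDotallDemo {
    // Greedy quantifier (.+) combined with DOTALL, as tried in the question
    static final Pattern GREEDY =
        Pattern.compile("(START:\\s\")(.+)(\"\\n:END_START)", Pattern.DOTALL);

    // Counts how many separate matches the greedy pattern finds
    static int countMatches(String text) {
        Matcher m = GREEDY.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Returns group(2) of the first match, or null if nothing matches
    static String firstBody(String text) {
        Matcher m = GREEDY.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // Illustrative input with two START blocks
        String input = "START: \"aaa\"aaa\"\n:END_START\n\n"
                     + "START: \"aaab\"bbaaaa\"\n:END_START\n";
        System.out.println("matches: " + countMatches(input));
        System.out.println("body of first match: " + firstBody(input));
    }
}
```

Because . also matches line breaks under DOTALL, the greedy .+ runs to the end of the input and only backtracks to the last "\n:END_START, so both blocks end up inside a single match.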



Fix for the first question
The pattern has to use a reluctant (non-greedy) quantifier together with the Pattern.DOTALL flag:
Pattern pattern = Pattern.compile("(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);

Answer 1:


Fix for the first question
The pattern has to use a reluctant (non-greedy) quantifier together with the Pattern.DOTALL flag:

Pattern pattern = Pattern.compile("(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);
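As a quick check of what the reluctant pattern captures, the following sketch collects group(2) for each block separately. The class name ReluctantBlocks, the method literalBodies and the sample input are hypothetical; the pattern is the one shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
final class ReluctantBlocks {
    // Reluctant quantifier (.+?) stops at the first "\n:END_START it can reach
    static final Pattern BLOCK =
        Pattern.compile("(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);

    // Returns the body of every string literal, one entry per START block
    static List<String> literalBodies(String text) {
        List<String> bodies = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            bodies.add(m.group(2));
        }
        return bodies;
    }

    public static void main(String[] args) {
        // Illustrative input: one multi-line literal and one single-line literal
        String input = "START: \"aaa\"aaa\"\n \"aaa\"aaaaa\"\n:END_START\n\n"
                     + "START: \"aaab\"bbaaaa\"\n:END_START\n";
        for (String body : literalBodies(input)) {
            System.out.println("[" + body + "]");
        }
    }
}
```

Each block now yields its own match, and a literal that spans several lines is still captured whole because DOTALL lets . cross the line breaks.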

The code:

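// ANTLR 3 runtime classes; TestLexer and TestParser are generated from the grammar
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {

    // try-with-resources closes the file even if preparing or parsing fails
    try (FileInputStream fis = new FileInputStream("src/file.txt")) {
        String preparedCode = preparingCode(fis);

        // feed the rewritten text, not the raw file, to the lexer
        ANTLRStringStream in = new ANTLRStringStream(preparedCode);

        TestLexer lex = new TestLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lex);
        TestParser parser = new TestParser(tokens);

        parser.rule();
    } catch (IOException ex) {
        ex.printStackTrace();
    } catch (RecognitionException e) {
        System.out.println(e.getMessage());
        System.exit(1); // non-zero exit code signals the parse error
    }
}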

static String preparingCode(FileInputStream input){
    // Reluctant .+? stops at the first "\n:END_START; DOTALL lets . cross line breaks
    Pattern pattern = Pattern.compile("(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);

    // Read the whole file, normalizing line ends to \n.
    // DataInputStream.readLine() is deprecated, so a BufferedReader is used instead;
    // UTF-8 is an assumption about the file's encoding.
    StringBuilder oldCode = new StringBuilder();
    BufferedReader reader = new BufferedReader(new InputStreamReader(input, StandardCharsets.UTF_8));
    String strLine;
    try{
      while ((strLine = reader.readLine()) != null)
          oldCode.append(strLine).append('\n');
    }
    catch(IOException ex){
      ex.printStackTrace();
    }

    // Allocate the output buffer after reading, so the capacity hint is meaningful
    StringBuffer newCode = new StringBuffer(oldCode.length());
    Matcher matcher = pattern.matcher(oldCode);

    while (matcher.find()) {
      // replace the quotes inside the string literal with apostrophes
      String stringLiteral = matcher.group(2).replace('"', '\'');

      String replace = matcher.group(1) + stringLiteral + matcher.group(3);
      // quoteReplacement keeps any $ or \ in the literal from being read as a group reference
      matcher.appendReplacement(newCode, Matcher.quoteReplacement(replace));
    }
    matcher.appendTail(newCode);

    return newCode.toString();
}

So, is there any other way to solve this problem?
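One possible alternative, sketched here rather than offered as a definitive answer, is to read the file in one step and let Matcher.replaceAll with a replacement function (available from Java 9) do the bookkeeping that appendReplacement and appendTail do by hand. The class name QuoteNormalizer, the methods normalize and load, and the UTF-8 encoding are assumptions; the pattern is unchanged.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; assumes Java 9 or later.
final class QuoteNormalizer {
    private static final Pattern BLOCK =
        Pattern.compile("(START:\\s\")(.+?)(\"\\n:END_START)", Pattern.DOTALL);

    // Replaces the inner double quotes of every START block with apostrophes
    static String normalize(String source) {
        return BLOCK.matcher(source).replaceAll(m ->
            // quoteReplacement stops $ and \ in the text from being treated as group references
            Matcher.quoteReplacement(
                m.group(1) + m.group(2).replace('"', '\'') + m.group(3)));
    }

    // Reads the whole file at once; UTF-8 is an assumed encoding.
    // Windows line ends are converted to \n so the pattern's \n still matches.
    static String load(Path path) throws IOException {
        String raw = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        return raw.replace("\r\n", "\n");
    }
}
```

The result of QuoteNormalizer.normalize(QuoteNormalizer.load(path)) could then be passed to the ANTLRStringStream constructor exactly as preparedCode is above. This mainly shortens the code; whether it is measurably faster would have to be checked on real input.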



Source: https://stackoverflow.com/questions/10013170/efficiently-replacing-a-string-or-character-from-file-input-for-the-antlrinputst
