Regular Expression backtracks until overflow in Java

筅森魡賤 提交于 2019-12-31 05:37:07

问题


The following expression:

^(#ifdef FEATURE)+?\s*$((\r\n.*?)*^(#endif)+\s*[\/\/]*\s*(end of)*\s*FEATURE)+?$

Overrides the matching buffer when running my compiled .Jar file.

The matching string can be similar to:

this is a junk line

#ifdef FEATURE
#endif // end of FEATURE

this is a junk line

#ifdef FEATURE

this is a junk line that should be matched: HOLasduiqwhei & // FEATURE fjfefj #endif // h

#endif FEATURE

this is a junk line

So, the bold strings should match. The error is as follows:

   at java.util.regex.Pattern$GroupHead.match(Unknown Source)
   at java.util.regex.Pattern$Loop.match(Unknown Source)
   at java.util.regex.Pattern$GroupTail.match(Unknown Source)
   at java.util.regex.Pattern$Curly.match1(Unknown Source)
   at java.util.regex.Pattern$Curly.match(Unknown Source)
   at java.util.regex.Pattern$Slice.match(Unknown Source)
   at java.util.regex.Pattern$GroupHead.match(Unknown Source)
   at java.util.regex.Pattern$Loop.match(Unknown Source)
   at java.util.regex.Pattern$GroupTail.match(Unknown Source)
   at java.util.regex.Pattern$Curly.match1(Unknown Source)
   at java.util.regex.Pattern$Curly.match(Unknown Source)
   at java.util.regex.Pattern$Slice.match(Unknown Source)
   at java.util.regex.Pattern$GroupHead.match(Unknown Source)
   at java.util.regex.Pattern$Loop.match(Unknown Source)
   at java.util.regex.Pattern$GroupTail.match(Unknown Source)
   at java.util.regex.Pattern$Curly.match1(Unknown Source)
   at java.util.regex.Pattern$Curly.match(Unknown Source)
   at java.util.regex.Pattern$Slice.match(Unknown Source)
   at java.util.regex.Pattern$GroupHead.match(Unknown Source)
   at java.util.regex.Pattern$Loop.match(Unknown Source)
   at java.util.regex.Pattern$GroupTail.match(Unknown Source)
   at java.util.regex.Pattern$Curly.match1(Unknown Source)
   at java.util.regex.Pattern$Curly.match(Unknown Source)
   at java.util.regex.Pattern$Slice.match(Unknown Source)
   at java.util.regex.Pattern$GroupHead.match(Unknown Source)
   at java.util.regex.Pattern$Loop.match(Unknown Source)
   at java.util.regex.Pattern$GroupTail.match(Unknown Source)
   at java.util.regex.Pattern$Curly.match1(Unknown Source)
   at java.util.regex.Pattern$Curly.match(Unknown Source)
   at java.util.regex.Pattern$Slice.match(Unknown Source)

Any backtracking avoiding strategy/improvement of the expression is welcome. I have tried the atomic groups (?>) but doesn't simplify, for some reason.

The code is the following:

public String strip(String text) {

    ArrayList<String> patterns=new ArrayList<String>();
    patterns=readFile("Disabled_Features.txt");
    for(int i = 0; i < patterns.size(); ++i)
    {

      Pattern todoPattern = Pattern.compile("^#ifdef "+patterns.get(i)+"((?:\\r?\\n(?!#endif (?:// end of )?"+patterns.get(i)+"$).*)*)\\r?\\n#endif (?:// end of )?"+patterns.get(i)+"$",Pattern.MULTILINE); 

      Matcher m = todoPattern.matcher(text);
      text = m.replaceAll("");
    }
    return text;        
}

回答1:


I have tried the code written by @Wiktor and works pretty well

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestRegex {
  public static void main(String[] args) {
    String text = "this is a junk line\n" + 
        "\n" + 
        "#ifdef FEATURE \n" + 
        "#endif // end of FEATURE\n" + 
        "\n" + 
        "this is a junk line\n" + 
        "\n" + 
        "#ifdef FEATURE\n" + 
        "\n" + 
        "this is a junk line that should be matched: HOLasduiqwhei & // FEATURE fjfefj #endif // h\n" + 
        "\n" + 
        "#endif FEATURE\n" + 
        "\n" + 
        "this is a junk line";

    // this version does not use Pattern.MULTILINE, this should reduce the backtraking
    Matcher matcher2 = Pattern.compile("\\n#ifdef FEATURE((?:\\r?\\n(?!#endif (?:// end of )?FEATURE).*)*)\\r?\\n#endif (?:// end of )?FEATURE").matcher(text);
    while (matcher2.find()) {
      System.out.println(matcher2.group());
    }

  }
}

This let me to think that your problem is due to the size of input file.

So, in case your file is too big, you could implement the input as a CharSequence, so you could wrap your large text files. Why? Because building a Matcher from a Pattern takes a CharSequence as an argument.

https://github.com/fge/largetext




回答2:


UPDATE:

I tried implementing the solution of Wiktor:

"^#ifdef "+patterns.get(i)+"((?:\\r?\\n(?!#endif (?:// end of )?"+patterns.get(i)+"$).*)*)\\r?\\n#endif (?:// end of )?"+patterns.get(i)+"$"

and it captures only the second block, but not the following one:

#ifdef FEATURE

Junk captured text

#endif // end of FEATURE

Anyway, when I run the jar still getting overflow.



来源:https://stackoverflow.com/questions/42063553/regular-expression-backtracks-until-overflow-in-java

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!