问题
The following expression:
^(#ifdef FEATURE)+?\s*$((\r\n.*?)*^(#endif)+\s*[\/\/]*\s*(end of)*\s*FEATURE)+?$
Overrides the matching buffer when running my compiled .Jar file.
The matching string can be similar to:
this is a junk line
#ifdef FEATURE
#endif // end of FEATUREthis is a junk line
#ifdef FEATURE
this is a junk line that should be matched: HOLasduiqwhei & // FEATURE fjfefj #endif // h
#endif FEATURE
this is a junk line
So, the bold strings should match. The error is as follows:
at java.util.regex.Pattern$GroupHead.match(Unknown Source)
at java.util.regex.Pattern$Loop.match(Unknown Source)
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$Curly.match1(Unknown Source)
at java.util.regex.Pattern$Curly.match(Unknown Source)
at java.util.regex.Pattern$Slice.match(Unknown Source)
at java.util.regex.Pattern$GroupHead.match(Unknown Source)
at java.util.regex.Pattern$Loop.match(Unknown Source)
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$Curly.match1(Unknown Source)
at java.util.regex.Pattern$Curly.match(Unknown Source)
at java.util.regex.Pattern$Slice.match(Unknown Source)
at java.util.regex.Pattern$GroupHead.match(Unknown Source)
at java.util.regex.Pattern$Loop.match(Unknown Source)
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$Curly.match1(Unknown Source)
at java.util.regex.Pattern$Curly.match(Unknown Source)
at java.util.regex.Pattern$Slice.match(Unknown Source)
at java.util.regex.Pattern$GroupHead.match(Unknown Source)
at java.util.regex.Pattern$Loop.match(Unknown Source)
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$Curly.match1(Unknown Source)
at java.util.regex.Pattern$Curly.match(Unknown Source)
at java.util.regex.Pattern$Slice.match(Unknown Source)
at java.util.regex.Pattern$GroupHead.match(Unknown Source)
at java.util.regex.Pattern$Loop.match(Unknown Source)
at java.util.regex.Pattern$GroupTail.match(Unknown Source)
at java.util.regex.Pattern$Curly.match1(Unknown Source)
at java.util.regex.Pattern$Curly.match(Unknown Source)
at java.util.regex.Pattern$Slice.match(Unknown Source)
Any backtracking avoiding strategy/improvement of the expression is welcome. I have tried the atomic groups (?>)
but doesn't simplify, for some reason.
The code is the following:
public String strip(String text) {
ArrayList<String> patterns=new ArrayList<String>();
patterns=readFile("Disabled_Features.txt");
for(int i = 0; i < patterns.size(); ++i)
{
Pattern todoPattern = Pattern.compile("^#ifdef "+patterns.get(i)+"((?:\\r?\\n(?!#endif (?:// end of )?"+patterns.get(i)+"$).*)*)\\r?\\n#endif (?:// end of )?"+patterns.get(i)+"$",Pattern.MULTILINE);
Matcher m = todoPattern.matcher(text);
text = m.replaceAll("");
}
return text;
}
回答1:
I have tried the code written by @Wiktor and works pretty well
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TestRegex {
public static void main(String[] args) {
String text = "this is a junk line\n" +
"\n" +
"#ifdef FEATURE \n" +
"#endif // end of FEATURE\n" +
"\n" +
"this is a junk line\n" +
"\n" +
"#ifdef FEATURE\n" +
"\n" +
"this is a junk line that should be matched: HOLasduiqwhei & // FEATURE fjfefj #endif // h\n" +
"\n" +
"#endif FEATURE\n" +
"\n" +
"this is a junk line";
// this version does not use Pattern.MULTILINE, this should reduce the backtraking
Matcher matcher2 = Pattern.compile("\\n#ifdef FEATURE((?:\\r?\\n(?!#endif (?:// end of )?FEATURE).*)*)\\r?\\n#endif (?:// end of )?FEATURE").matcher(text);
while (matcher2.find()) {
System.out.println(matcher2.group());
}
}
}
This let me to think that your problem is due to the size of input file.
So, in case your file is too big, you could implement the input as a CharSequence
, so you could wrap your large text files. Why? Because building a Matcher
from a Pattern
takes a CharSequence
as an argument.
https://github.com/fge/largetext
回答2:
UPDATE:
I tried implementing the solution of Wiktor:
"^#ifdef "+patterns.get(i)+"((?:\\r?\\n(?!#endif (?:// end of )?"+patterns.get(i)+"$).*)*)\\r?\\n#endif (?:// end of )?"+patterns.get(i)+"$"
and it captures only the second block, but not the following one:
#ifdef FEATURE
Junk captured text
#endif // end of FEATURE
Anyway, when I run the jar still getting overflow.
来源:https://stackoverflow.com/questions/42063553/regular-expression-backtracks-until-overflow-in-java