I am attempting to use a regular expression with Scanner to match a string from a file. The regex works with all of the contents of the file except for this line:
As the others have said, your regex is much less efficient than it should be. I'd take it a step further and use possessive quantifiers:
"^([a-zA-Z]++) *+= *+\"([^\"]++)\"$"
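To illustrate what the possessive quantifiers change, here is a minimal sketch. The class name, helper method, and the sample input `name = "Lion"` are made up for the demonstration and are not from the question. A possessive quantifier never gives back what it has consumed, so the engine has no backtracking positions to retry when a later part of the pattern fails.

```java
import java.util.regex.Pattern;

class PossessiveDemo {
    // Returns true if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Greedy: a+ gives back one 'a' so the trailing 'a' can match.
        System.out.println(fullMatch("a+a", "aaa"));   // true
        // Possessive: a++ keeps all three, so the trailing 'a' can never match.
        System.out.println(fullMatch("a++a", "aaa"));  // false

        // The regex from this answer, applied to a hypothetical input line.
        String regex = "^([a-zA-Z]++) *+= *+\"([^\"]++)\"$";
        System.out.println(fullMatch(regex, "name = \"Lion\""));  // true
    }
}
```

In the answer's regex each quantified piece is followed by something it cannot itself match (letters, then spaces, then `=`, and so on), so making the quantifiers possessive does not change what matches; it only stops the engine from retrying shorter matches on lines that fail.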
But the way you're using the Scanner doesn't make much sense, either. There's no need to use findInLine(".*") to read the line; that's what nextLine() does. And you don't need to create another Scanner to apply your regex; just use a Matcher.
    static final Pattern ANIMAL_INFO_PATTERN =
        Pattern.compile("^([a-zA-Z]++) *+= *+\"([^\"]++)\"$");
    ...
    // Create one Matcher up front and reset() it for each line.
    Matcher lineMatcher = ANIMAL_INFO_PATTERN.matcher("");
    while (scanFile.hasNextLine()) {
        String currentLine = scanFile.nextLine();
        if (lineMatcher.reset(currentLine).matches()) {
            matches.put(lineMatcher.group(1), lineMatcher.group(2));
        }
    }
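For anyone who wants to try the snippet above in isolation, here is a self-contained sketch. The class name, the readPairs method, the choice of LinkedHashMap for matches, and the sample text are assumptions added for illustration; the Scanner reads from a String instead of the original file.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnimalInfoReader {
    static final Pattern ANIMAL_INFO_PATTERN =
        Pattern.compile("^([a-zA-Z]++) *+= *+\"([^\"]++)\"$");

    // Reads every line from the scanner and collects key = "value" pairs.
    // Lines that do not match the pattern are skipped.
    static Map<String, String> readPairs(Scanner scanFile) {
        Map<String, String> matches = new LinkedHashMap<>();
        Matcher lineMatcher = ANIMAL_INFO_PATTERN.matcher("");
        while (scanFile.hasNextLine()) {
            String currentLine = scanFile.nextLine();
            if (lineMatcher.reset(currentLine).matches()) {
                matches.put(lineMatcher.group(1), lineMatcher.group(2));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical input standing in for the file contents.
        String sample = "name = \"Lion\"\nnot a pair\nsound=\"Roar\"\n";
        System.out.println(readPairs(new Scanner(sample)));
    }
}
```

To read from a real file, pass `new Scanner(new File(path))` instead; that constructor throws FileNotFoundException, which the caller has to handle.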
This looks like bug 5050507. I agree with Asaph that removing the alternation should help; the bug report specifically says "Avoid alternation whenever possible". I think you can probably go even simpler:
"^([a-zA-Z]+) *= *\"([^\"]+)"
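One thing to watch with this shorter pattern: it stops before the closing quote, so it works with find() (or Scanner's findInLine()) but not with Matcher.matches(), which requires the whole line to be consumed. A minimal sketch of the difference, using a made-up class name, helper names, and input line:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVersusMatches {
    static final Pattern SIMPLER =
        Pattern.compile("^([a-zA-Z]+) *= *\"([^\"]+)");

    // Returns the quoted value if the pattern is found in the line, else null.
    static String valueViaFind(String line) {
        Matcher m = SIMPLER.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    // Returns true only if the pattern consumes the entire line.
    static boolean wholeLineMatches(String line) {
        return SIMPLER.matcher(line).matches();
    }

    public static void main(String[] args) {
        String line = "name = \"Lion\"";  // hypothetical input
        System.out.println(valueViaFind(line));      // Lion
        System.out.println(wholeLineMatches(line));  // false: closing quote is left over
    }
}
```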
Try this simplified version of your regex, which removes some unnecessary | operators (which might have been causing the regex engine to do a lot of branching) and includes beginning-of-line and end-of-line anchors.
static final String ANIMAL_INFO_REGEX = "^([a-zA-Z]+) *= *\"([a-zA-Z_. ]+)\"$";
Read this to understand the problem: http://www.regular-expressions.info/catastrophic.html ... and then use one of the other suggestions.