High performance simple Java regular expressions


Part of the code I'm working on uses a bunch of regular expressions to search for some simple string patterns (e.g., patterns like "foo[0-9]{3,4} bar"). Currently, we use sta


4 Answers

北海茫月

2021-02-08 04:25
If you want to avoid creating a new Matcher for each Pattern, use the usePattern() method, like so:
```
Pattern[] pats = {
  Pattern.compile("123"),
  Pattern.compile("abc"),
  Pattern.compile("foo")
};
String s = "123 abc";
Matcher m = Pattern.compile("dummy").matcher(s);
for (Pattern p : pats)
{
  System.out.printf("%s : %b%n", p.pattern(), m.reset().usePattern(p).find());
}
```
see the demo on Ideone

You have to use matcher's reset() method too, or find() will only search from the point where the previous match ended (assuming the match was successful).
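
To see why the reset() call matters, here is a minimal sketch (the class and method names are hypothetical, not from the answer) that searches for "abc" first and then switches to "123" on the same input. Since usePattern() keeps the matcher's position in the input, the second find() without a reset should resume after the first match and miss "123":

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only reset(), usePattern() and find() come from java.util.regex.
class ResetDemo {
    private static final Pattern ABC = Pattern.compile("abc");
    private static final Pattern DIGITS = Pattern.compile("123");

    // Switches pattern without resetting: the search resumes after the "abc" match.
    static boolean secondFindWithoutReset(String input) {
        Matcher m = ABC.matcher(input);
        m.find();
        return m.usePattern(DIGITS).find();
    }

    // Resets first, so the search starts again from the beginning of the input.
    static boolean secondFindWithReset(String input) {
        Matcher m = ABC.matcher(input);
        m.find();
        return m.reset().usePattern(DIGITS).find();
    }

    public static void main(String[] args) {
        String s = "123 abc";
        System.out.println("without reset: " + secondFindWithoutReset(s));
        System.out.println("with reset:    " + secondFindWithReset(s));
    }
}
```

The demo in the answer happens to try the patterns in the order they occur in the input, so the effect only shows up when the order is reversed, as here.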
刺人心

2021-02-08 04:32
If you expect fewer than 50% of the lines to match your regex, you can first test for a literal subsequence with String.indexOf(), which is about 3 to 20 times faster than a regex matcher for simple sequences:
```
if (line.indexOf("foo") > -1 && pattern.matcher(line).matches()) {
    ...
}
```
If you add such heuristics to your code, remember to document them well, and verify with a profiler that the code is indeed faster than the simple version.
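
A self-contained version of this pre-filter might look like the sketch below. The class and method names are hypothetical, the pattern is the one from the question, and it uses find() rather than matches() on the assumption that the goal is to search within a line, not to match the whole line:

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the indexOf() pre-filter.
class PrefilterExample {
    // Compiled once; every match of this pattern must contain the literal "foo".
    private static final Pattern FOO = Pattern.compile("foo[0-9]{3,4} bar");

    static boolean containsFoo(String line) {
        // Cheap literal check first; the regex only runs on lines that pass it.
        return line.indexOf("foo") >= 0 && FOO.matcher(line).find();
    }

    public static void main(String[] args) {
        System.out.println(containsFoo("a foo123 bar b"));
        System.out.println(containsFoo("no match here"));
    }
}
```

The literal passed to indexOf() has to be a substring that every possible match contains; otherwise the pre-filter would reject lines the regex would have accepted.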
日久生厌

2021-02-08 04:40

Try the matcher.reset("newinputtext") method to avoid creating a new Matcher each time you call Pattern.matcher().
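
As a rough sketch of that idea (class and method names are hypothetical; the pattern is the one from the question), a single Matcher can be created up front and pointed at each new line with reset(CharSequence):

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example of reusing one Matcher across many inputs.
class ReusedMatcher {
    private static final Pattern FOO = Pattern.compile("foo[0-9]{3,4} bar");

    // Counts the lines that contain a match, allocating only one Matcher.
    static int countMatchingLines(List<String> lines) {
        Matcher m = FOO.matcher(""); // created once, with placeholder input
        int count = 0;
        for (String line : lines) {
            // reset(CharSequence) swaps in the new input and clears the match state.
            if (m.reset(line).find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("xx foo123 bar", "foo12 bar", "foo1234 bar!");
        System.out.println(countMatchingLines(lines));
    }
}
```

A Matcher is not safe for use by multiple threads, so a reused instance like this should stay local to one thread.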

既然无缘

2021-02-08 04:48

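You could try the static Pattern.matches() method, which just returns a boolean. It doesn't hand a Matcher object back to your code, so it might help with the memory allocation issues.

That being said, the regex would not be precompiled: the call behaves like Pattern.compile(regex).matcher(input).matches(), so the pattern is compiled again (and a Matcher is still created internally) on every call. It would be a performance vs. resources trade-off at that point.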
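
The two variants can be put side by side in a small sketch (class and method names are hypothetical; the pattern is the one from the question). Both give the same answer, and both require the whole input to match, unlike find():

```java
import java.util.regex.Pattern;

// Hypothetical comparison of the static helper and a precompiled Pattern.
class StaticMatchesDemo {
    private static final String REGEX = "foo[0-9]{3,4} bar";
    private static final Pattern PRECOMPILED = Pattern.compile(REGEX);

    // Compiles REGEX on every call.
    static boolean viaStatic(String input) {
        return Pattern.matches(REGEX, input);
    }

    // Reuses the compiled Pattern; only a Matcher is created per call.
    static boolean viaPrecompiled(String input) {
        return PRECOMPILED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(viaStatic("foo123 bar"));
        System.out.println(viaPrecompiled("foo123 bar"));
    }
}
```

Whether the convenience is worth the repeated compilation is something to measure with a profiler rather than assume.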
