High performance simple Java regular expressions

北海茫月 2021-02-08 03:44

Part of the code I'm working on uses a bunch of regular expressions to search for some simple string patterns (e.g., patterns like "foo[0-9]{3,4} bar"). Currently, we use sta

4 Answers
  • 2021-02-08 04:25

    If you want to avoid creating a new Matcher for each Pattern, use the usePattern() method, like so:

    Pattern[] pats = {
      Pattern.compile("123"),
      Pattern.compile("abc"),
      Pattern.compile("foo")
    };
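    String s = "123 abc";
    // The initial pattern is only a placeholder; usePattern() swaps in the real one.
    Matcher m = Pattern.compile("dummy").matcher(s);
    for (Pattern p : pats)
    {
      // reset() rewinds the matcher to the start of s before each search.
      System.out.printf("%s : %b%n", p.pattern(), m.reset().usePattern(p).find());
    }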
    

    see the demo on Ideone

    You have to call the matcher's reset() method too; otherwise find() will only search from the point where the previous match ended (assuming that match was successful).

  • 2021-02-08 04:32

    If you expect fewer than 50% of lines to match your regex, you can first test for a required substring with String.indexOf(), which is about 3 to 20 times faster than a regex matcher for simple sequences:

    if (line.indexOf("foo") > -1 && pattern.matcher(line).matches()) {
        ...
    }
    

    If you add such heuristics to your code, remember to document them well, and verify with a profiler that the code is indeed faster than the simple version.
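
    A self-contained sketch of this prefilter idea, under some assumptions: the class name, method name, and sample lines are hypothetical, and the pattern is the one quoted in the question. It uses find() to search inside a line; swap in matches() if the whole line must match.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; none of these names come from the original post.
final class PrefilterSketch {
    // Compile once and reuse; Pattern instances are immutable.
    private static final Pattern FOO = Pattern.compile("foo[0-9]{3,4} bar");

    // Every match of FOO must contain this literal, so its absence rules a line out.
    private static final String REQUIRED_LITERAL = "foo";

    static boolean containsFoo(String line) {
        // Cheap literal scan first; the regex engine only runs on candidate lines.
        return line.indexOf(REQUIRED_LITERAL) >= 0 && FOO.matcher(line).find();
    }

    public static void main(String[] args) {
        String[] lines = { "xx foo123 bar yy", "no match here", "foo12 bar" };
        for (String line : lines) {
            System.out.printf("%s : %b%n", line, containsFoo(line));
        }
    }
}
```

    The literal must be a substring of every possible match, or the prefilter will reject lines the regex would have accepted.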

  • 2021-02-08 04:40

    Try the matcher.reset("newinputtext") method to avoid creating a new Matcher each time you would otherwise call Pattern.matcher().
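
    A minimal sketch of that approach, assuming the input arrives as a list of lines; the class and method names are hypothetical, and the pattern is the one quoted in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; none of these names come from the original post.
final class ReusedMatcherSketch {
    private static final Pattern FOO = Pattern.compile("foo[0-9]{3,4} bar");

    static int countMatchingLines(List<String> lines) {
        // One Matcher for the whole loop. Matcher is not thread-safe, so keep it local.
        Matcher m = FOO.matcher("");
        int count = 0;
        for (String line : lines) {
            // reset(CharSequence) swaps in the new input and clears the previous match state.
            if (m.reset(line).find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("foo123 bar", "nothing", "a foo4567 bar b");
        System.out.println(countMatchingLines(lines)); // prints 2
    }
}
```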

  • 2021-02-08 04:48

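    You could try the static Pattern.matches() method, which just returns a boolean. It does not hand a Matcher object back to your code, which could help with the memory allocation issues, although the Javadoc describes it as behaving exactly like Pattern.compile(regex).matcher(input).matches(), so a Pattern and a Matcher are still created internally on each call.

    That said, the regex would not be precompiled, so at that point it becomes a trade-off between performance and resources.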
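
    To make the trade-off concrete, here is a small sketch that puts the two call styles side by side. The class and method names are hypothetical; both methods require the entire input to match, since matches() is not a substring search.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; none of these names come from the original post.
final class StaticMatchesSketch {
    private static final String REGEX = "foo[0-9]{3,4} bar";

    // Compiled once when the class is loaded.
    private static final Pattern COMPILED = Pattern.compile(REGEX);

    static boolean viaStaticCall(String input) {
        // Convenient, but the regex is compiled again on every call.
        return Pattern.matches(REGEX, input);
    }

    static boolean viaPrecompiled(String input) {
        // Reuses the compiled Pattern; only a Matcher is created per call.
        return COMPILED.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = { "foo123 bar", "xx foo123 bar" };
        for (String input : inputs) {
            System.out.printf("%s : %b %b%n", input, viaStaticCall(input), viaPrecompiled(input));
        }
    }
}
```

    Both methods return the same result for any input; only the amount of work done per call differs, which is what a profiler run should confirm for your workload.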
