String.replaceAll is considerably slower than doing the job yourself

余生分开走 2020-12-03 04:55

I have an old piece of code that performs find and replace of tokens within a string.

It receives a map of from and to pairs, iterates over

4 Answers
  • 2020-12-03 05:02

    When it comes to replaceAll("[,. ]*", "") it's not that big of a surprise since it relies on regular expressions. The regex engine creates an automaton which it runs over the input. Some overhead is expected.

    The second approach (replace(",", "")...) also uses regular expressions internally. Here, however, the pattern is compiled with Pattern.LITERAL, so the regular-expression overhead should be negligible. In this case the slowdown is probably because Strings are immutable: however small the change, each call creates a new String, which is less efficient than a StringBuffer that modifies its contents in place.
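
To illustrate the point about immutability, here is a minimal sketch of a single-pass removal that appends to one StringBuilder instead of creating a new String per replace call. The class and method names (CharStripper, stripSeparators) are made up for this example and do not come from the question's code.

```java
final class CharStripper {
    // Hypothetical helper: drops ',', '.' and ' ' in one pass, without regex
    // and without building an intermediate String for each character class.
    static String stripSeparators(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c != ',' && c != '.' && c != ' ') {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "Hello, World. Bye";
        // Single pass over the input
        System.out.println(stripSeparators(input));
        // Chained replace calls: same result, one new String per call that finds a match
        System.out.println(input.replace(",", "").replace(".", "").replace(" ", ""));
    }
}
```

Both lines print HelloWorldBye; only the number of intermediate objects differs.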

  • 2020-12-03 05:03

    As I put in a comment, [,. ]* also matches the empty String "". So every gap between two characters matches the pattern. The performance hit comes from replacing a large number of "" matches with "".

    Try doing this:

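    Pattern p = Pattern.compile("[,. ]*");
    // "$" is special in replacement strings, so quote it to insert it literally
    String replacement = Matcher.quoteReplacement("$$$");
    System.out.println(p.matcher("Hello World!").replaceAll(replacement));
    

    It returns:

    $$$H$$$e$$$l$$$l$$$o$$$$$$W$$$o$$$r$$$l$$$d$$$!$$$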

    No wonder it is slower than doing it "by hand"! You should try with [,. ]+
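
One way to see the difference between the two quantifiers is to count how many matches each pattern produces on the same input. The sketch below does that; MatchCounter and countMatches are names invented for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchCounter {
    // Hypothetical helper: counts every match of the regex in the input,
    // including zero-length matches.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "Hello World!";
        // "*" also matches the empty string at every position
        System.out.println(countMatches("[,. ]*", input)); // prints 13
        // "+" only matches the single space
        System.out.println(countMatches("[,. ]+", input)); // prints 1
    }
}
```

Each match triggers a replacement, so the * version does thirteen replacements here where the + version does one.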

  • 2020-12-03 05:13

    Regular expressions do carry some performance cost, but it should not be this bad.

    Note that using String.replaceAll() will compile the regular expression each time you call it.

    You can avoid that by explicitly using a Pattern object:

    Pattern p = Pattern.compile("[,. ]+");
    
    // repeat only the following part:
    String output = p.matcher(input).replaceAll("");
    

    Note also that using + instead of * avoids replacing empty strings and therefore might also speed up the process.

  • 2020-12-03 05:13

    replace and replaceAll both use regex internally, which in most cases has a serious performance impact compared to, e.g., StringUtils.replace(..).

    String.replaceAll():

    public String replaceAll(String regex, String replacement) {
        return Pattern.compile(regex).matcher(this).replaceAll(replacement);
    }
    

    String.replace() also uses Pattern.compile underneath:

    public String replace(CharSequence target, CharSequence replacement) {
        return Pattern.compile(target.toString(), Pattern.LITERAL)
                .matcher(this)
                .replaceAll(Matcher.quoteReplacement(replacement.toString()));
    }
    

    Also see Replace all occurrences of substring in a string - which is more efficient in Java?
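
For comparison, a literal replacement needs no regex at all. The sketch below shows the general indexOf-plus-StringBuilder idea that non-regex helpers build on; it is not the source of StringUtils.replace, and LiteralReplacer, replaceLiteral, replaceTokens and the ${...} token format are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LiteralReplacer {
    // Hypothetical helper: replaces every occurrence of a literal target
    // using indexOf, with no Pattern compilation involved.
    static String replaceLiteral(String text, String target, String replacement) {
        if (target.isEmpty()) {
            return text; // nothing sensible to search for
        }
        int start = text.indexOf(target);
        if (start < 0) {
            return text; // no occurrence, so the original is returned as-is
        }
        StringBuilder out = new StringBuilder(text.length());
        int from = 0;
        while (start >= 0) {
            // copy the text between the previous match and this one, then the replacement
            out.append(text, from, start).append(replacement);
            from = start + target.length();
            start = text.indexOf(target, from);
        }
        return out.append(text, from, text.length()).toString();
    }

    // Applies each from/to pair in the map's iteration order.
    static String replaceTokens(String text, Map<String, String> tokens) {
        String result = text;
        for (Map.Entry<String, String> e : tokens.entrySet()) {
            result = replaceLiteral(result, e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> tokens = new LinkedHashMap<>();
        tokens.put("${greeting}", "Hello");
        tokens.put("${name}", "World");
        System.out.println(replaceTokens("${greeting}, ${name}!", tokens)); // prints Hello, World!
    }
}
```

Because the pairs are applied one after another, a replacement value that itself contains a later token will be rewritten again; a single-pass scanner would be needed if that matters.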
