String concatenation with the + symbol

Submitted by 时光总嘲笑我的痴心妄想 on 2019-12-06 03:43:36

The rule

“do not concatenate Strings with + !!!”

is wrong, because it is incomplete and therefore misleading.

The rule is

do not concatenate Strings with + in a loop

and that rule still holds. The original rule was never meant to be applied outside of loops!

A simple loop

String s = "";
for (int i = 0; i < 10000; i++) { s += i; }
System.out.println(s);

is still much slower than

StringBuilder sb = new StringBuilder();
for (int i = 0; i < 10000; i++) { sb.append(i); }
System.out.println(sb.toString());

because the Java compiler has to translate the first loop into

String s = "";
for (int i = 0; i < 10000; i++) { s = new StringBuilder(s).append(i).toString(); }
System.out.println(s);
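
A rough way to see the difference is to run both loops side by side. The sketch below is only an illustration: the class and method names are made up for this example, and a single System.nanoTime measurement is not a rigorous benchmark (a harness such as JMH would be needed for reliable numbers).

```java
public class LoopConcatDemo {
    // Concatenates 0..n-1 with +=; every pass creates a new String
    // and copies everything accumulated so far.
    static String withPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    // Same result, but one StringBuilder is reused for the whole loop.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int n = 10000;
        long t0 = System.nanoTime();
        String a = withPlus(n);
        long t1 = System.nanoTime();
        String b = withBuilder(n);
        long t2 = System.nanoTime();
        System.out.println("same result: " + a.equals(b));
        System.out.println("+= loop:       " + (t1 - t0) / 1000 + " us");
        System.out.println("StringBuilder: " + (t2 - t1) / 1000 + " us");
    }
}
```

The timings vary by machine and JVM, so only the relative order of the two numbers is of interest.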

Also the claim

Today the JVM compiles the + symbol into a string builder (in most cases).

is misleading at best, because this translation was already done in Java 1.0 (ok, not with StringBuilder but with StringBuffer, because StringBuilder was only added in Java 5).


One could also argue that the claim

Today the JVM compiles the + symbol into a string builder (in most cases).

is simply wrong, because the compilation is not done by the JVM. It is done by the Java Compiler.


For the question: when does the Java compiler use StringBuilder.append() and when does it use some other mechanism?

The source code of the Java compiler (version 1.8) contains two places where String concatenation through the + operator is handled.

The conclusion is that for the Java compiler from the OpenJDK (which means the compiler distributed by Oracle) the phrase in most cases means always. (Though this could change with Java 9, or it could be that another Java compiler like the one that is included within Eclipse uses some other mechanism).

Holger is right in his comment that in java-9 + for String concatenation is going to change from a StringBuilder to a strategy chosen by the JRE via invokedynamic. There are 6 strategies that are possible for String concatenation in jdk-9:

  private enum Strategy {
    /**
     * Bytecode generator, calling into {@link java.lang.StringBuilder}.
     */
    BC_SB,

    /**
     * Bytecode generator, calling into {@link java.lang.StringBuilder};
     * but trying to estimate the required storage.
     */
    BC_SB_SIZED,

    /**
     * Bytecode generator, calling into {@link java.lang.StringBuilder};
     * but computing the required storage exactly.
     */
    BC_SB_SIZED_EXACT,

    /**
     * MethodHandle-based generator, that in the end calls into {@link java.lang.StringBuilder}.
     * This strategy also tries to estimate the required storage.
     */
    MH_SB_SIZED,

    /**
     * MethodHandle-based generator, that in the end calls into {@link java.lang.StringBuilder}.
     * This strategy also estimate the required storage exactly.
     */
    MH_SB_SIZED_EXACT,

    /**
     * MethodHandle-based generator, that constructs its own byte[] array from
     * the arguments. It computes the required storage exactly.
     */
    MH_INLINE_SIZED_EXACT
  }

And the default one is not using a StringBuilder, it is MH_INLINE_SIZED_EXACT. It is actually pretty crazy how the implementation works, and it is trying to be highly optimized.
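
To make the invokedynamic mechanism a little more concrete, the bootstrap method behind it, java.lang.invoke.StringConcatFactory.makeConcatWithConstants, can also be called by hand. This is a minimal sketch assuming Java 9 or later. The class name, the method name concat and the recipe text are made up for illustration, and real code would simply use +, since javac emits the equivalent call site itself.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

public class ConcatFactoryDemo {
    // Builds the same kind of call site that javac (9+) would link for
    // an expression like "name=" + name + ", n=" + n, then invokes it once.
    static String concat(String name, int n) {
        try {
            // Shape of the concatenation: (String, int) -> String
            MethodType type = MethodType.methodType(String.class, String.class, int.class);
            // In the recipe, the character U+0001 marks where each argument goes.
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(), "concat", type, "name=\u0001, n=\u0001");
            return (String) site.getTarget().invokeExact(name, n);
        } catch (Throwable t) {
            throw new IllegalStateException("linkage or invocation failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(concat("builder", 9));
    }
}
```

Which strategy the JRE picks behind that call site is an implementation detail and depends on the JDK version in use.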

So no, as far as I can tell the advice there is bad. This, by the way, was a major effort put into the JDK by Aleksey Shipilev. He also made a big change to String internals in jdk-9: Strings are now backed by a byte[] instead of a char[]. This was needed because ISO_LATIN_1 Strings can be encoded with a single byte per character, which takes a lot less space.

The statement, in this exact form, is just wrong, and it fits the picture that the linked blog keeps writing nonsense, such as the claim that you have to wrap references with Objects.toString(…) to handle null, e.g. "att1='" + Objects.toString(att1) + '\'' instead of just "att1='" + att1 + '\''. There is no need to do that, and apparently the author never re-checked these claims.
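
The null handling is easy to check. In this small sketch the class and the helper describe are hypothetical names, and only the concatenation expression comes from the text above:

```java
public class NullConcatDemo {
    // Plain concatenation: a null reference is rendered as the text "null".
    static String describe(Object att1) {
        return "att1='" + att1 + '\'';
    }

    public static void main(String[] args) {
        System.out.println(describe(null)); // prints att1='null'
        System.out.println(describe("x"));  // prints att1='x'
    }
}
```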

The JVM is not responsible for compiling the + operator, as this operator is merely a source code artifact. It’s the compiler, e.g. javac, which is responsible, and while there is no guarantee about the compiled form, compilers are encouraged to use a builder by the Java Language Specification:

An implementation may choose to perform conversion and concatenation in one step to avoid creating and then discarding an intermediate String object. To increase the performance of repeated string concatenation, a Java compiler may use the StringBuffer class or a similar technique to reduce the number of intermediate String objects that are created by evaluation of an expression.

Note that even if a compiler does not perform this optimization, there still is no such thing as a + operator on the byte code level, so the compiler has to pick an operation that a JVM understands, e.g. String.concat, which might be even faster than using a StringBuilder when you’re concatenating exactly two strings.
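
For the two-string case, String.concat can also be called directly. The following is only a sketch with made-up class and method names. One difference from + is worth knowing: concat throws a NullPointerException for a null argument instead of appending the text "null".

```java
public class TwoStringConcatDemo {
    // Joins exactly two strings without an intermediate builder object.
    static String join(String a, String b) {
        return a.concat(b);
    }

    public static void main(String[] args) {
        System.out.println(join("Java", "Tutorial")); // prints JavaTutorial
    }
}
```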

Even assuming the worst compilation strategy for string concatenation (still being within the specification), it would be wrong to say to never concatenate strings with +, as when you are defining compile time constants, using + is the only choice, and, of course, a compile-time constant is usually more efficient than using a StringBuilder at runtime.
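
The compile-time constant case can be observed through reference equality, because constant expressions are interned while strings concatenated at run time are newly created. This is a small sketch with hypothetical class, field and method names:

```java
public class ConstantConcatDemo {
    static final String PREFIX = "Java";
    // Both operands are constants, so the compiler folds this into "JavaTutorial".
    static final String FULL = PREFIX + "Tutorial";

    // True: FULL and the literal refer to the same interned String object.
    static boolean foldedAtCompileTime() {
        return FULL == "JavaTutorial";
    }

    // False: the parameter is not a constant, so the concatenation happens
    // at run time and yields a new String object.
    static boolean sameObjectAtRuntime(String prefix) {
        return (prefix + "Tutorial") == "JavaTutorial";
    }

    public static void main(String[] args) {
        System.out.println(foldedAtCompileTime());
        System.out.println(sameObjectAtRuntime("Java"));
    }
}
```

The == comparison is used here only to make the folding visible. Ordinary code should compare strings with equals.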

In practice, the + operator applied to non-constant strings was compiled to a StringBuffer usage before Java 5 and to a StringBuilder usage in Java 5 to Java 8. When the compiled code is identical to the manual usage of StringBuffer or StringBuilder, there can’t be a performance difference.

The transition to Java 5, more than a decade ago, was the first time that string concatenation via + had a clear win over manual StringBuffer use: simply recompiling the concatenation code made it use the potentially faster StringBuilder internally, while code dealing with StringBuffer manually had to be rewritten to use StringBuilder, which was introduced in that version.

Likewise, Java 9 is going to compile the string concatenation using an invokedynamic instruction allowing the JRE to bind it to actual code doing the operation, including optimizations not possible in ordinary Java code. So only recompiling the string concatenation code is needed to get this feature, while there is no equivalent manual usage for it.

That said, while the premise is wrong, i.e. string concatenation never was considered evil, the advice is correct: do not hesitate to use it.

There are only a few cases where you really might improve performance by dealing with a buffer manually, i.e. when you need a large initial capacity or concatenate a lot within loops, and that code has been identified as an actual performance bottleneck by a profiling tool.
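
As an illustration of the initial-capacity case, here is a minimal sketch. The class and the helper joinLines are hypothetical, and whether the pre-sizing pays off in a given program is something only a profiler can tell.

```java
import java.util.Arrays;
import java.util.List;

public class PresizedBuilderDemo {
    // Joins the lines, adding a newline after each one. The capacity is
    // computed up front so the builder never has to grow its internal array.
    static String joinLines(List<String> lines) {
        int capacity = 0;
        for (String line : lines) {
            capacity += line.length() + 1; // +1 for the newline
        }
        StringBuilder sb = new StringBuilder(capacity);
        for (String line : lines) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(joinLines(Arrays.asList("first", "second")));
    }
}
```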

Pehlaj

When you concatenate strings using the + operator, the compiler translates the concatenation code to use a StringBuffer (a StringBuilder since Java 5) for better performance. In order to improve performance, StringBuffer is the better choice.

The quickest way to concatenate two strings is the + operator:

String str = "Java";
str = str + "Tutorial";

The compiler translates this code as:

String str = "Java";
StringBuffer sb = new StringBuffer(str);
sb.append("Tutorial");
str = sb.toString();

So it is better to use StringBuffer or String.format for concatenation.

Using String.format

String s = String.format("%s %s", "Java", "Tutorial");