What is the difference between "text" and new String("text")?

悲&欢浪女 2020-11-21 04:42

What is the difference between these two following statements?

String s = "text";

String s = new String("text");
12 Answers
  •  情歌与酒
    2020-11-21 05:03

    JLS

    The concept is called "interning" by the JLS.

    Relevant passage from JLS 7 §3.10.5:

    Moreover, a string literal always refers to the same instance of class String. This is because string literals - or, more generally, strings that are the values of constant expressions (§15.28) - are "interned" so as to share unique instances, using the method String.intern.

    Example 3.10.5-1. String Literals

    The program consisting of the compilation unit (§7.3):

    package testPackage;
    class Test {
        public static void main(String[] args) {
            String hello = "Hello", lo = "lo";
            System.out.print((hello == "Hello") + " ");
            System.out.print((Other.hello == hello) + " ");
            System.out.print((other.Other.hello == hello) + " ");
            System.out.print((hello == ("Hel"+"lo")) + " ");
            System.out.print((hello == ("Hel"+lo)) + " ");
            System.out.println(hello == ("Hel"+lo).intern());
        }
    }
    class Other { static String hello = "Hello"; }
    

    and the compilation unit:

    package other;
    public class Other { public static String hello = "Hello"; }
    

    produces the output:

    true true true true false true
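
    Applied to the two statements in the question, the rule can be illustrated with a minimal sketch (the class LiteralVsNew and its helpers fromLiteral and fromConstructor are hypothetical names chosen for illustration). It relies only on what the JLS guarantees: equal literals refer to one interned instance, while a class instance creation expression such as new String("text") always produces a new object.

```java
public class LiteralVsNew {
    // Every evaluation of the literal yields the same interned instance.
    static String fromLiteral() {
        return "text";
    }

    // A class instance creation expression always allocates a new String object,
    // even though its contents are copied from the interned literal.
    static String fromConstructor() {
        return new String("text");
    }

    public static void main(String[] args) {
        String a = fromLiteral();
        String b = fromLiteral();
        String c = fromConstructor();

        System.out.println(a == b);          // true: both refer to the interned instance
        System.out.println(a == c);          // false: c is a distinct object
        System.out.println(a.equals(c));     // true: same character sequence
        System.out.println(a == c.intern()); // true: intern() returns the pooled instance
    }
}
```

    Because == compares references, content comparison should use equals; intern() is the explicit way to get the pooled instance back.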
    

    JVMS

    JVMS 7 §5.1 says:

    A string literal is a reference to an instance of class String, and is derived from a CONSTANT_String_info structure (§4.4.3) in the binary representation of a class or interface. The CONSTANT_String_info structure gives the sequence of Unicode code points constituting the string literal.

    The Java programming language requires that identical string literals (that is, literals that contain the same sequence of code points) must refer to the same instance of class String (JLS §3.10.5). In addition, if the method String.intern is called on any string, the result is a reference to the same class instance that would be returned if that string appeared as a literal. Thus, the following expression must have the value true:

    ("a" + "b" + "c").intern() == "abc"
    

    To derive a string literal, the Java Virtual Machine examines the sequence of code points given by the CONSTANT_String_info structure.

    • If the method String.intern has previously been called on an instance of class String containing a sequence of Unicode code points identical to that given by the CONSTANT_String_info structure, then the result of string literal derivation is a reference to that same instance of class String.

    • Otherwise, a new instance of class String is created containing the sequence of Unicode code points given by the CONSTANT_String_info structure; a reference to that class instance is the result of string literal derivation. Finally, the intern method of the new String instance is invoked.
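
    A small sketch of the JVMS statement above, assuming a hypothetical helper buildAtRuntime that forces a concatenation which is not a constant expression: the constant expression "a" + "b" + "c" is already the interned literal, the run-time result is a distinct object with the same contents, and calling intern() on it returns the literal's instance.

```java
public class InternDemo {
    // The argument is not a constant, so this concatenation happens at run time and,
    // per JLS §15.18.1, produces a newly created String.
    static String buildAtRuntime(String middle) {
        return "a" + middle + "c";
    }

    public static void main(String[] args) {
        String runtime = buildAtRuntime("b");

        // Constant expression: folded by the compiler and interned, so it is the literal's instance.
        System.out.println(("a" + "b" + "c") == "abc"); // true

        // Run-time concatenation: a distinct object with the same contents.
        System.out.println(runtime == "abc");           // false
        System.out.println(runtime.equals("abc"));      // true

        // intern() returns the instance that the literal "abc" refers to.
        System.out.println(runtime.intern() == "abc");  // true
    }
}
```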

    Bytecode

    It is also instructive to look at the bytecode implementation on OpenJDK 7.

    If we compile the following and then disassemble it (e.g. with javap -v):

    public class StringPool {
        public static void main(String[] args) {
            String a = "abc";
            String b = "abc";
            String c = new String("abc");
            System.out.println(a);
            System.out.println(b);
            System.out.println(a == c);
        }
    }
    

    we get this in the constant pool:

    #2 = String             #32   // abc
    [...]
    #32 = Utf8               abc
    

    and main:

     0: ldc           #2          // String abc
     2: astore_1
     3: ldc           #2          // String abc
     5: astore_2
     6: new           #3          // class java/lang/String
     9: dup
    10: ldc           #2          // String abc
    12: invokespecial #4          // Method java/lang/String."<init>":(Ljava/lang/String;)V
    15: astore_3
    16: getstatic     #5          // Field java/lang/System.out:Ljava/io/PrintStream;
    19: aload_1
    20: invokevirtual #6          // Method java/io/PrintStream.println:(Ljava/lang/String;)V
    23: getstatic     #5          // Field java/lang/System.out:Ljava/io/PrintStream;
    26: aload_2
    27: invokevirtual #6          // Method java/io/PrintStream.println:(Ljava/lang/String;)V
    30: getstatic     #5          // Field java/lang/System.out:Ljava/io/PrintStream;
    33: aload_1
    34: aload_3
    35: if_acmpne     42
    38: iconst_1
    39: goto          43
    42: iconst_0
    43: invokevirtual #7          // Method java/io/PrintStream.println:(Z)V
    

    Note how:

    • 0 and 3: the same ldc #2 constant is loaded (the literals)
    • 6 to 12: a new String instance is allocated with new, and its constructor is invoked at 12 with the same #2 literal (loaded at 10) as argument
    • 35: a and c are compared as regular objects with if_acmpne

    The representation of constant strings in the bytecode is quite special:

    • it has a dedicated CONSTANT_String_info structure, unlike regular objects (e.g. new String)
    • the structure points to a CONSTANT_Utf8_info structure that contains the data, which is all that is needed to represent the string.

    and the JVMS quote above implies that whenever the pointed-to Utf8 data is the same, ldc loads identical instances.

    I have done similar tests for fields, and:

    • static final String s = "abc" points to the constant pool through the ConstantValue attribute
    • non-final fields don't have that attribute, but can still be initialized with ldc
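
    The field observations above can be checked from the source side with a sketch like the following (the class ConstantFields and its members are hypothetical). Both fields compare == to the literal because either way the value is produced by resolving a string literal, which yields the interned instance. The final local lo goes further: it makes "Hel" + lo a constant expression, unlike the non-final lo in the JLS example, where the same comparison printed false.

```java
public class ConstantFields {
    // static final with a constant initializer: javac records the value in a
    // ConstantValue attribute, and uses of the field are inlined as the literal.
    static final String FINAL_FIELD = "abc";

    // Not final: no ConstantValue attribute; the static initializer assigns it via ldc.
    static String nonFinalField = "abc";

    // A final local initialized with a constant expression is a constant variable
    // (JLS §4.12.4), so "Hel" + lo is itself a constant expression and is interned.
    static String concatWithConstantVariable() {
        final String lo = "lo";
        return "Hel" + lo;
    }

    public static void main(String[] args) {
        System.out.println(FINAL_FIELD == "abc");                    // true
        System.out.println(nonFinalField == "abc");                  // true
        System.out.println(concatWithConstantVariable() == "Hello"); // true
    }
}
```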

    Conclusion: there is direct bytecode support for the string pool, and the memory representation is efficient.

    Bonus: compare that to the Integer pool (the cache of boxed values behind Integer.valueOf), which does not have direct bytecode support (i.e. no CONSTANT_String_info analogue for Integer objects).
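
    For contrast, a sketch of the Integer case (the class IntegerCacheDemo and its box helper are hypothetical). Autoboxing compiles to a call to Integer.valueOf, so any sharing of boxed values is done by library code at run time rather than by the constant pool; only the range -128 to 127 is guaranteed to be shared.

```java
public class IntegerCacheDemo {
    // Autoboxing: javac compiles "return value;" here to Integer.valueOf(value).
    static Integer box(int value) {
        return value;
    }

    public static void main(String[] args) {
        // Guaranteed by JLS §5.1.7: boxing the same int in -128..127 yields the same object.
        System.out.println(box(127) == box(127));        // true

        // Outside that range the result is implementation-dependent (HotSpot, for example,
        // lets the cache bound be raised with -XX:AutoBoxCacheMax), so do not rely on it.
        System.out.println(box(1000) == box(1000));

        // Value comparison works regardless of caching.
        System.out.println(box(1000).equals(box(1000))); // true
    }
}
```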
