Search cost of string interning and declaration of literal strings

烈酒焚心 提交于 2019-12-05 06:19:40

You're confusing compile time complexity with runtime complexity.

When the class is loaded, yes it does a search to see if each literal already exists (although I imagine it would use a hashmap for O(1) lookup instead of your proposal).

When the code runs, it has the reference to the string in memory so there is no additional cost than a non-literal.

So yes, literals are interned. According to the Javadoc for String,

A pool of strings, initially empty, is maintained privately by the class String.

You can invoke intern() on a String to add it to this pool. It logically follows that if a.equals(b) then a.intern() == b.intern(), since .intern() guarantees to return from the a unique pool.

Example:

class InternTest {
    // assuming InternTest is the only class, internPool.size = 0
    String x = "ABC"; // interned at class load, internPool.size = 1
    String y = "DEF"; // interned at class load, internPool.size = 2
    String z = "ABC"; // interned at class load, but match found - size = 2 still

    void foo() {
        // random int is just a mechanism to get something that I know won't
        // be interned at loadtime - could have loaded from file or database too
        int i = (new java.util.Random()).nextInt(1000) + 100;
        int j = i;
        String s = String.valueOf(i); // not yet interned, size = 2 still
        String t = String.valueOf(j); // not yet interned, size = 2 still

        String sIntern = s.intern(); // manually interned, size = 3 now
        String tIntern = t.intern(); // manually interned, match found, size = 3 still

        System.out.println("equals: " + (s.equals(t))); // should be true
        System.out.println("== raw: " + (s == t)); // should be false, different variables
        System.out.println("== int: " + (sIntern == tIntern)); // should be true, from unique pool

       System.out.println("x and z: " + (x == z)); // should be true, interned at class load
    }

    public static void main(String[] args) {
        (new InternTest()).foo();
    }

}

Results when run:

C:\Documents and Settings\glowcoder\My Documents>java InternTest
equals: true
== raw: false
== int: true
x and z: true

I should point out that the assumption will never be true. The Java language itself has many Strings that would be interned before our Strings ever got the chance to see the light of day. However, assuming everything is loaded sequentially, if you consider only the delta of Strings being interned, and assume no collisions with existing interns (we all know interns can be fussy and full of drama, right? snicker) then the numbers do indeed indicate the delta of the size of the string pool.

1 - When we declare literal strings, we search whether there is the same string in string pool of heap. Is this also an interning(method intern of class String)?

Yes. This process is called interning. However, it happens just once ... when the class containing the literal is loaded.

2 - In my thought, each literal string declaration needs a binary search or something so it costs at least log(n) when n is number of existing strings in the pool.

No it doesn't. The pool is a hash table.

... And if there are many strings in the pool, it may be high cost.

No it won't. The cost of a lookup in the string pool hash table is O(1).

... On this point of view, it might be dangerous to declare many literal strings.

The cost is not significant compared with the other costs of loading and then JIT compiling a class file. There is no performance related "danger" in declaring lots of literal strings.

Obviously, the String objects corresponding to string literals occupy memory "permanently", and you generally don't want to waste memory unnecessarily. But if you need to use those constant strings, they have to be represented somehow. And other ways of representing them either use the memory in other ways, or involve other runtime costs; e.g. the costs of reading them from a file or retrieving them from database.

The benefit of interning string literals is that the heap does not get cluttered up with multiple copies of the same literal string. This is probably not significant for typical SE / EE applications, but for the ME platforms heap memory is at a premium, and it would be a bad thing to waste it.


@RENO asks about the number of times Strings get interned. There are two cases:

  • Explicit calls to String.intern() happen as many (or as few) times as the application chooses to make.

  • For String literals, the javac compiler will ensure that a given .class file does not contain multiple copies of any String literal in its constant pool. This means that a class that has a given literal in many places will only result in the literal being interned once when the class is loaded. However, if you have two classes with the same literal string in their respective source code, they will both have the string value in their respective constant pools, and both intern the string when the respective classes are loaded.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!