String vs char[]

佛祖请我去吃肉 2021-01-30 03:11

I have some slides from IBM titled "From Java Code to Java Heap: Understanding the Memory Usage of Your Application", which say that when we use String instead of <

4 Answers
  • 2021-01-30 03:40

    In the JVM, a char variable is stored in a single 16-bit memory slot, and changes to that variable overwrite the same memory location. This makes creating or updating char variables very fast and memory-cheap, but it costs the JVM more overhead than the static allocation used for Strings.

    The JVM stores a Java String's characters in a variable-size memory space (essentially, an array) that is sized to the string's length when the String object is created or first assigned a value. Java strings carry an explicit length and have no termination character, so an object with the initial value "HELP!" needs 80 bits of character storage (5 characters, 16 bits each), in addition to the object and array headers. The value is immutable, which allows the JVM to share references to it, making static string assignments very fast, very compact, and very efficient from the JVM's point of view.
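A quick way to check the character count is to ask the String itself. The sketch below is only an illustration: the class name `NoTerminatorCheck` is made up for this example, and it shows the logical length only, not the object or array headers.

```java
final class NoTerminatorCheck {
    // Number of 16-bit char values the String reports for itself.
    static int charCount(String s) {
        return s.length();
    }

    // Length of the char[] copy returned by toCharArray().
    // This is a copy, not necessarily the String's internal storage.
    static int copiedArrayLength(String s) {
        return s.toCharArray().length;
    }

    public static void main(String[] args) {
        String s = "HELP!";
        System.out.println(charCount(s));                   // 5
        System.out.println(copiedArrayLength(s));           // 5
        System.out.println(charCount(s) * Character.SIZE);  // 80 (bits of character data)
    }
}
```

Neither count includes an extra slot for a terminator, because the length is tracked separately from the character data.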

    Reference

  • 2021-01-30 03:41

    This figure relates to 32-bit JDK 6.

    JDK 6

    Before Java 7, a String was implemented as a pointer to a region of a char[] array:

    // "8 (4)" reads "8 bytes for x64, 4 bytes for x32"
    
    class String{      //8 (4) house keeping + 8 (4) class pointer
        char[] buf;    //12 (8) bytes + 2 bytes per char -> 24 (16) aligned
        int offset;    //4 bytes                     -> three int
        int length;    //4 bytes                     -> fields align to
        int hash;      //4 bytes                     -> 16 (12) bytes
    }
    

    So I counted:

    36 bytes per new String("a") for JDK 6 x32  <-- the overhead from the article
    56 bytes per new String("a") for JDK 6 x64.
    


    JDK 7

    For comparison, in JDK 7+ String is a class that holds only a char[] buffer and a hash field.

    class String{      //8 (4) + 8 (4) bytes             -> 16 (8)  aligned
        char[] buf;    //12 (8) bytes + 2 bytes per char -> 24 (16) aligned
        int hash;      //4 bytes                         -> 8  (4)  aligned
    }
    

    So it's:

    28 bytes per String for JDK 7 x32 
    48 bytes per String for JDK 7 x64.
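The totals above are simply the sum of the three aligned components annotated in each listing. As a sketch of that arithmetic only (the class `StringTally` and its parameter names are made up here, and the component sizes are this answer's figures, not measurements):

```java
final class StringTally {
    // Adds up the components annotated in the listings:
    // object header, aligned char[] holding one char, and aligned int fields.
    static int total(int headerBytes, int charArrayBytes, int intFieldBytes) {
        return headerBytes + charArrayBytes + intFieldBytes;
    }

    public static void main(String[] args) {
        System.out.println("JDK 6 x32: " + total(8, 16, 12));   // 36
        System.out.println("JDK 6 x64: " + total(16, 24, 16));  // 56
        System.out.println("JDK 7 x32: " + total(8, 16, 4));    // 28
        System.out.println("JDK 7 x64: " + total(16, 24, 8));   // 48
    }
}
```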
    

    UPDATE

    For the 3.75:1 ratio, see @Andrey's explanation below. The ratio falls toward 1 as the length of the string grows.

    Useful links:

    • Memory usage of Java Strings and string-related objects.
    • Calculate memory of a Map Entry - a simple technique to get a size of an object.
  • 2021-01-30 03:45

    I'll try explaining the numbers referenced in the source article.

    The article describes object metadata typically consisting of: class, flags and lock.

    The class and lock are stored in the object header and take 8 bytes on a 32-bit VM. However, I haven't found any information about JVM implementations that keep flags in the object header. The flags may be stored somewhere externally (e.g. by the garbage collector, to count references to the object, etc.).

    So let's assume that the article talks about some x32 AbstractJVM that uses 12 bytes of memory to store meta information about the object.

    Then for char[] we have:

    • 12 bytes of meta information (8 bytes on x32 JDK 6, 16 bytes on x64 JDK)
    • 4 bytes for array size
    • 2 bytes for each character stored
    • 2 bytes of alignment padding if the number of characters is odd (on x64 JDK: 2 * (4 - (length + 2) % 4))

    For java.lang.String we have:

    • 12 bytes of meta information (8 bytes on x32 JDK 6, 16 bytes on x64 JDK 6)
    • 16 bytes for String fields (on JDK 6; 8 bytes on JDK 7)
    • memory needed to store char[] as described above

    So, let's count how much memory is needed to store "MyString" as a String object:

    12 + 16 + (12 + 4 + 2 * "MyString".length + 2 * ("MyString".length % 2)) = 60 bytes.
    

    On the other hand, we know that to store only the data (without information about the data type, length or anything else) we need:

    2 * "MyString".length = 16 bytes
    

    The overhead is 60 / 16 = 3.75.

    Similarly, for a single-character string we get the 'maximum overhead':

    12 + 16 + (12 + 4 + 2 * "a".length + 2 * ("a".length % 2)) = 48 bytes
    2 * "a".length = 2 bytes
    48 / 2 = 24
    

    Following the article authors' logic, the ultimate maximum overhead, infinity, is reached when we store an empty string :).
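The formula above can be written out as a small Java sketch. The class `OverheadRatio` and its constant names are made up for this illustration, and the 12-byte metadata figure is the assumption this answer makes about the article's JVM, not a measured value.

```java
final class OverheadRatio {
    static final int META_BYTES = 12;          // assumed per-object metadata (the "AbstractJVM")
    static final int STRING_FIELD_BYTES = 16;  // String fields on JDK 6
    static final int ARRAY_LENGTH_BYTES = 4;   // array size field

    // Footprint of a char[] with the given number of characters.
    static int charArrayBytes(int length) {
        int padding = 2 * (length % 2);        // 2 bytes of padding when length is odd
        return META_BYTES + ARRAY_LENGTH_BYTES + 2 * length + padding;
    }

    // Footprint of a String wrapping a char[] of the given length.
    static int stringBytes(int length) {
        return META_BYTES + STRING_FIELD_BYTES + charArrayBytes(length);
    }

    // Total bytes divided by the raw character data (2 bytes per char).
    static double overhead(int length) {
        return stringBytes(length) / (2.0 * length);
    }

    public static void main(String[] args) {
        int[] lengths = {0, 1, 8, 1000, 1_000_000};
        for (int n : lengths) {
            System.out.println(n + " chars: " + stringBytes(n) + " bytes, ratio " + overhead(n));
        }
    }
}
```

With these assumptions, `overhead(8)` gives 3.75 and `overhead(1)` gives 24.0, matching the figures above. For an empty string the floating-point division by zero yields `Infinity`, and for long strings the ratio approaches 1.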

  • 2021-01-30 03:47

    I read this in an old Stack Overflow answer that I can no longer find. In Oracle's JDK, a String has four instance-level fields:

    A character array
    An integral offset
    An integral character count
    An integral hash value
    

    That means that each String introduces an extra object reference (the String itself), and three integers in addition to the character array itself. (The offset and character count are there to allow sharing of the character array among String instances produced through the String#substring() methods, a design choice that some other Java library implementers have eschewed.) Beyond the extra storage cost, there's also one more level of access indirection, not to mention the bounds checking with which the String guards its character array.
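To illustrate how an offset and a character count allow substrings to share one character array, here is a minimal sketch. `SharedSlice` is a hypothetical class written for this example, with illustrative field names; it is not the JDK's String source.

```java
final class SharedSlice {
    private final char[] value;  // backing array, possibly shared with other slices
    private final int offset;    // index of the first character within value
    private final int count;     // number of characters in this slice

    SharedSlice(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedSlice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    // Returns a view on the same array: no characters are copied.
    SharedSlice substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedSlice(value, offset + begin, end - begin);
    }

    // Bounds-checked access, one level of indirection away from the array.
    char charAt(int index) {
        if (index < 0 || index >= count) {
            throw new StringIndexOutOfBoundsException(index);
        }
        return value[offset + index];
    }

    boolean sharesArrayWith(SharedSlice other) {
        return value == other.value;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }
}
```

For example, `new SharedSlice("MyString").substring(2, 8)` yields a slice that prints as "String" while pointing at the same array as the original, which is the space/indirection trade-off described above.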

    If you can get away with allocating and consuming just the basic character array, there's space to be saved there. It's certainly not idiomatic to do so in Java though; judicious comments would be warranted to justify the choice, preferably with mention of evidence from having profiled the difference.
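As a small illustration of consuming a bare character array, the sketch below modifies a char[] in place, so no extra String or array is allocated. The class `CharArrayInPlace` and its method are made up for this example, and it handles ASCII letters only.

```java
final class CharArrayInPlace {
    // Upper-cases ASCII letters directly in the given array.
    // Nothing is allocated; the caller's array is modified.
    static void upperCaseAscii(char[] chars) {
        for (int i = 0; i < chars.length; i++) {
            char c = chars[i];
            if (c >= 'a' && c <= 'z') {
                chars[i] = (char) (c - ('a' - 'A'));
            }
        }
    }

    public static void main(String[] args) {
        char[] chars = {'h', 'e', 'l', 'p', '!'};
        upperCaseAscii(chars);
        System.out.println(chars);  // println(char[]) prints the characters: HELP!
    }
}
```

The equivalent `String.toUpperCase()` call would return a new String, which is safer to share but costs an extra object. Whether the saving matters is something to confirm by profiling, as noted above.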
