How are Integer arrays stored internally, in the JVM?

星月不相逢 2020-12-20 23:22

An array of ints in Java is stored as a block of 32-bit values in memory. How is an array of Integer objects stored? I.e.,

int[] vs. Integer[]
5 answers
  • 2020-12-20 23:50

    John Rose is working on fixnums in the JVM to address this problem.

  • 2020-12-20 23:56

    It won't be much slower, but because an Integer[] must accept "null" as an entry and int[] doesn't have to, there will be some amount of bookkeeping involved, even if Integer[] is backed by an int[].

    So if every last ounce of performance matters, use int[].

  • 2020-12-20 23:58

    I think your hope is woefully naive. Specifically, the VM needs to deal with the fact that an Integer can be null, whereas an int cannot. That alone is reason enough to store an object pointer in each slot.

    That said, the object pointer will refer to an immutable Integer instance, and for a select subset of values (the small ones that Integer.valueOf caches, -128 to 127 by default) it will refer to a shared instance.
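
    To illustrate the point about shared instances, here is a minimal sketch. It relies on the guarantee in the Java Language Specification that boxing a value from -128 to 127 always yields the same object; outside that range the outcome depends on the VM and its settings, so the code makes no claim about it. The class name BoxedIdentityDemo and its method are made up for this example.

```java
class BoxedIdentityDemo {
    // Returns true when both array slots refer to the very same Integer object.
    static boolean sameInstance(int value) {
        Integer[] boxed = new Integer[2];
        boxed[0] = value; // autoboxing goes through Integer.valueOf(value)
        boxed[1] = value;
        return boxed[0] == boxed[1]; // reference comparison, not value comparison
    }

    public static void main(String[] args) {
        // Values in -128..127 are guaranteed to come from the shared cache.
        System.out.println(sameInstance(127));
        // Outside that range the result is VM-dependent; use equals() to compare values.
        System.out.println(sameInstance(100_000));
    }
}
```

    Either way, each slot of the Integer[] still holds a reference; the cache only reduces how many distinct objects those references point to.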

  • 2020-12-20 23:59

    The reason that Integer can be null, whereas int cannot, is because Integer is a full-fledged Java object, with all of the overhead that includes. There's value in this since you can write

    Integer foo = Integer.valueOf(0); // Integer has no no-argument constructor
    foo = null;                       // legal, because foo is an object reference
    

    which is good for saying that foo will have a value, but it doesn't yet.

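    Note that neither type performs any overflow check. For instance,

    int bar = Integer.MAX_VALUE;
    bar++;
    

    will merrily increment bar and you end up with a very negative number (Integer.MIN_VALUE), which is probably not what you intended in the first place.

    foo = Integer.MAX_VALUE;
    foo++;
    

    behaves the same way: foo is unboxed, incremented and boxed again, so it silently wraps around as well. If you want the overflow to be reported, Math.incrementExact (or Math.addExact) throws an ArithmeticException instead, which I think is the better behavior.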
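
    A small sketch for checking how both types behave at Integer.MAX_VALUE, and how an overflow can be detected explicitly with Math.incrementExact from the standard library (available since Java 8). The class and method names are invented for illustration.

```java
class OverflowDemo {
    // Increments a primitive int; there is no overflow check.
    static int incrementPrimitive(int value) {
        value++;
        return value;
    }

    // Increments a boxed Integer: unbox, add one, box again. Still no overflow check.
    static Integer incrementBoxed(Integer value) {
        value++;
        return value;
    }

    // Math.incrementExact throws ArithmeticException when the result overflows an int.
    static boolean exactIncrementOverflows(int value) {
        try {
            Math.incrementExact(value);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(incrementPrimitive(Integer.MAX_VALUE));      // -2147483648
        System.out.println(incrementBoxed(Integer.MAX_VALUE));          // -2147483648
        System.out.println(exactIncrementOverflows(Integer.MAX_VALUE)); // true
    }
}
```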

    One last point is that Integer, being a Java object, carries with it the space overhead of an object. I think that someone else may need to chime in here, but I believe that every object consumes 12 bytes for overhead, and then the space for the data storage itself. If you're after performance and space, I wonder whether Integer is the right solution.

  • 2020-12-21 00:02

    No VM I know of will store an Integer[] array like an int[] array for the following reasons:

    1. There can be null Integer objects in the array and you have no bits left for indicating this in an int array. The VM could store this 1-bit information per array slot in a hidden bit-array though.
    2. You can synchronize on the elements of an Integer array. This is much harder to overcome than the first point, since you would have to store a monitor object for each array slot.
    3. The elements of Integer[] can be compared for identity. You could for example create two Integer objects with the value 1 via new, store them in different array slots, and later retrieve them and compare them via ==. This must yield false, so you would have to store this information somewhere. Or you keep a reference to one of the Integer objects somewhere and use it for comparison, and then you have to make sure that one of the == comparisons is false and the other true. This means the whole concept of object identity is quite hard to handle for an optimized Integer array.
    4. You can cast an Integer[] to e.g. Object[] and pass it to methods expecting just an Object[]. This means all the code which handles Object[] must now be able to handle the special Integer[] object too, making it slower and larger.
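
    Points 1 and 4 are easy to observe from plain Java. The following is a minimal sketch with invented class and method names; it only shows the language-level behaviour that a flattened Integer[] would still have to preserve, and says nothing about how a particular VM implements it.

```java
class IntegerArraySemanticsDemo {
    // Point 1: a slot of an Integer[] may hold null; a fresh array is all nulls.
    static boolean freshSlotIsNull() {
        Integer[] boxed = new Integer[4];
        return boxed[0] == null;
    }

    // Point 4: an Integer[] can be viewed as an Object[]. The element type is
    // still checked at run time, so storing a String through that view fails.
    static boolean storeThroughObjectViewFails() {
        Object[] view = new Integer[] {1, 2, 3};
        try {
            view[0] = "not an Integer";
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(freshSlotIsNull());             // true
        System.out.println(storeThroughObjectViewFails()); // true
    }
}
```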

    Taking all this into account, it would probably be possible to make a special Integer[] which saves some space in comparison to a naive implementation, but the additional complexity will likely affect a lot of other code, making it slower in the end.

    The overhead of using Integer[] instead of int[] can be quite large in space and time. On a typical 32-bit VM an Integer object consumes 16 bytes (8 bytes for the object header, 4 for the payload and 4 additional bytes for alignment), while the Integer[] itself uses as much space as an int[]. On 64-bit VMs (using 64-bit pointers, which is not always the case) an Integer object consumes 24 bytes (16 for the header, 4 for the payload and 4 for alignment). In addition, a slot in the Integer[] uses 8 bytes instead of the 4 in an int[]. This means you can expect an extra 16 to 28 bytes per slot, i.e. 4 to 7 times the 4 bytes a plain int slot needs.
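
    The arithmetic behind those figures can be written down as a short sketch. The header and reference sizes are the ones quoted in this answer, not measurements, and the calculation assumes 8-byte object alignment and a distinct Integer object per slot. FootprintEstimate and its methods are hypothetical names.

```java
class FootprintEstimate {
    // Rounds size up to the next multiple of alignment.
    static int align(int size, int alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }

    // Bytes used per element of an Integer[]: one reference slot plus one Integer object.
    static int bytesPerBoxedElement(int headerBytes, int referenceBytes) {
        int payloadBytes = 4; // the int value stored inside the Integer
        return referenceBytes + align(headerBytes + payloadBytes, 8);
    }

    public static void main(String[] args) {
        int intSlot = 4; // bytes per element of an int[]
        int on32 = bytesPerBoxedElement(8, 4);  // 4 + 16 = 20
        int on64 = bytesPerBoxedElement(16, 8); // 8 + 24 = 32
        System.out.println("32-bit overhead per slot: " + (on32 - intSlot)); // 16
        System.out.println("64-bit overhead per slot: " + (on64 - intSlot)); // 28
    }
}
```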

    The performance overhead can be significant too, for two main reasons:

    1. Since you use more memory, you put much more pressure on the memory subsystem, making cache misses more likely in the case of Integer[]. For example, if you traverse the contents of an int[] in a linear manner, the cache will have most of the entries already fetched when you need them (since the layout is linear too). But in the case of the Integer array, the Integer objects themselves might be scattered randomly in the heap, making it hard for the cache to guess where the next memory reference will point to.
    2. The garbage collector has to do much more work because of the additional memory used and because it has to scan and move each Integer object separately, while in the case of int[] it is just one object and the contents of the object don't have to be scanned (they contain no references to other objects).
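
    To get a rough feel for the difference, one could time the same linear traversal over both kinds of array. This is only a sketch with made-up names, not a rigorous benchmark: JIT warm-up, heap layout and GC activity all affect the numbers, so a harness such as JMH is the better tool for real measurements.

```java
class TraversalSketch {
    // Sums a primitive array: one contiguous block of 32-bit values.
    static long sumPrimitive(int[] values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Sums a boxed array: every element is a reference that must be followed and unboxed.
    static long sumBoxed(Integer[] values) {
        long sum = 0;
        for (Integer v : values) {
            sum += v; // throws NullPointerException if a slot is null
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        int[] primitive = new int[n];
        Integer[] boxed = new Integer[n];
        for (int i = 0; i < n; i++) {
            primitive[i] = i;
            boxed[i] = i; // autoboxing allocates or reuses an Integer object
        }

        long t0 = System.nanoTime();
        long a = sumPrimitive(primitive);
        long t1 = System.nanoTime();
        long b = sumBoxed(boxed);
        long t2 = System.nanoTime();

        System.out.println("int[]     sum=" + a + " ns=" + (t1 - t0));
        System.out.println("Integer[] sum=" + b + " ns=" + (t2 - t1));
    }
}
```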

    To sum it up, using an int[] in performance-critical work will be both much faster and more memory efficient than using an Integer array in current VMs, and it is unlikely that this will change much in the near future.
