Question
I'm not 100% sure of the mechanism at work, so I decided to post here for clarification.
I am doing a project that has to handle large amounts of data in Java (it has to be Java). I would like it to be as efficient as possible. By efficient I mean that memory use and computation speed come first and readability comes second.
Now I have two ways to store my data. Either create one array of MyObject:
1) MyObject[][] V = new MyObject[m][n];
Or create two arrays of int:
2) int[][] V = new int[m][n];
3) int[][] P = new int[m][n];
Clearly, MyObject contains at least two fields and some methods. Now I notice that while looping over the MyObject array to assign values I have to call new, or else I get a NullPointerException. This means that the new in line 1 wasn't enough. Is this a more expensive operation than, for the sake of argument, P[i][j]=n, considering that arrays are also objects in Java?
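The behaviour described above can be reproduced with a minimal sketch. MyObject here is a hypothetical stand-in with two int fields (the real class is not shown), and GridInit is an illustrative name: new MyObject[m][n] allocates the arrays but leaves every cell null, whereas an int array is zero-filled and can be assigned to directly.

```java
public class GridInit {
    // Hypothetical stand-in for the question's MyObject: two int fields.
    static final class MyObject {
        int v;
        int p;

        MyObject(int v, int p) {
            this.v = v;
            this.p = p;
        }
    }

    // new MyObject[m][n] creates the outer array and m row arrays,
    // but every cell starts out as a null reference.
    static MyObject[][] allocateOnly(int m, int n) {
        return new MyObject[m][n];
    }

    // Each cell needs its own `new` before its fields can be written.
    static MyObject[][] allocateAndFill(int m, int n) {
        MyObject[][] grid = new MyObject[m][n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                grid[i][j] = new MyObject(i, j);
            }
        }
        return grid;
    }

    public static void main(String[] args) {
        MyObject[][] empty = allocateOnly(3, 4);
        // true: writing empty[0][0].v here would throw NullPointerException
        System.out.println(empty[0][0] == null);

        MyObject[][] grid = allocateAndFill(3, 4);
        System.out.println(grid[2][3].p); // 3

        // int arrays are zero-filled on creation, so plain assignment works immediately.
        int[][] P = new int[3][4];
        P[1][2] = 42;
        System.out.println(P[0][0] + " " + P[1][2]); // 0 42
    }
}
```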
Answer 1:
Is this a more expensive operation than, for sake of argument, P[i][j]=n, considering that arrays are also objects in Java?
In the first case you create an array of arrays whose cells are meant to hold references to MyObject instances. Both the arrays and the objects stored in them need to be instantiated: the outer array, the m row arrays and the m * n MyObject instances, i.e. m * n + m + 1 object instantiations, and roughly m * n * (referenceSize + objectSize) of memory plus the overhead of the m + 1 arrays.
In the second case you only have to instantiate the arrays themselves (for each int[m][n], the outer array and its m rows); int primitives are not objects, so this should be faster and also more memory-efficient, since an object's memory footprint is several times larger than that of an int. Here you have m + 1 object instantiations per array and roughly (m * n) * intSize + (m + 1) * arrayOverhead memory consumption.
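Java has no true two-dimensional arrays: new T[m][n] creates an outer array plus m separate row arrays. Counting those as well, the difference between the two layouts can be checked with a rough sketch like this one. MyObject is hypothetical, and its static constructor counter exists purely for illustration.

```java
public class AllocationCount {
    // Hypothetical MyObject that counts how many times its constructor runs.
    static final class MyObject {
        static long constructed = 0;
        int v;
        int p;

        MyObject() {
            constructed++;
        }
    }

    // Layout 1: outer array + m row arrays + m * n MyObject instances.
    static long objectsForObjectGrid(int m, int n) {
        return 1L + m + (long) m * n;
    }

    // Layout 2 (V and P): each int[m][n] is an outer array + m row arrays.
    static long objectsForTwoIntGrids(int m) {
        return 2L * (1L + m);
    }

    // Fills a MyObject grid and returns how many MyObject constructors ran.
    static long fillAndCount(int m, int n) {
        MyObject.constructed = 0;
        MyObject[][] grid = new MyObject[m][n];
        for (MyObject[] row : grid) {
            for (int j = 0; j < n; j++) {
                row[j] = new MyObject();
            }
        }
        return MyObject.constructed;
    }

    public static void main(String[] args) {
        int m = 1000, n = 1000;
        System.out.println(fillAndCount(m, n));         // 1000000 constructor calls
        System.out.println(objectsForObjectGrid(m, n)); // 1001001 objects in total
        System.out.println(objectsForTwoIntGrids(m));   // 2002 objects in total
    }
}
```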
Another reason for using primitives is that, when used as local variables, they are kept on the stack. You will probably use intermediate local variables inside a method before storing the computed value in the array, and allocating and freeing the memory for these variables is several times cheaper than for an object that lives on the heap.
Answer 2:
I've often found through profiling that replacing an array of objects with several arrays of scalars improves memory consumption and performance.
However, only profiling can tell whether or not it is a worthwhile optimization in your case.
A good profiler will let you measure both the performance and the memory footprint of your code.
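As a very crude first check before reaching for a profiler, a naive System.nanoTime() comparison might look like the sketch below. The class and method names are hypothetical, and single-run timings like these are easily distorted by JIT warm-up and garbage collection, so a proper harness such as JMH, or a real profiler, is far more trustworthy.

```java
public class NaiveTiming {
    // Hypothetical record-like class with two fields.
    static final class Cell {
        int v;
        int p;
    }

    // Layout 1: one object per cell.
    static long sumObjects(int m, int n) {
        Cell[][] grid = new Cell[m][n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                Cell c = new Cell();
                c.v = i;
                c.p = j;
                grid[i][j] = c;
            }
        }
        long sum = 0;
        for (Cell[] row : grid) {
            for (Cell c : row) {
                sum += c.v + c.p;
            }
        }
        return sum;
    }

    // Layout 2: two parallel int arrays.
    static long sumPrimitives(int m, int n) {
        int[][] V = new int[m][n];
        int[][] P = new int[m][n];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                V[i][j] = i;
                P[i][j] = j;
            }
        }
        long sum = 0;
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                sum += V[i][j] + P[i][j];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int m = 1000, n = 1000;
        // Several rounds give the JIT a chance to compile both methods;
        // the early rounds mostly measure interpretation and warm-up.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long a = sumObjects(m, n);
            long t1 = System.nanoTime();
            long b = sumPrimitives(m, n);
            long t2 = System.nanoTime();
            System.out.printf("objects: %d ms, primitives: %d ms, same result: %b%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a == b);
        }
    }
}
```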
Answer 3:
For fast processing of truly massive amounts of data it's better to lay the data out in a single contiguous block of memory, so that data you access together sits close together. This minimizes cache misses, which are among today's worst performance killers.
In Java you achieve this by using a single one-dimensional array of primitives. If you use two arrays, or even a two-dimensional array (which in Java is an array of separately allocated row arrays), the data is no longer guaranteed to be in one contiguous block.
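A minimal sketch of the single-array layout, assuming each cell's V and P values are usually read together (so they are interleaved as v, p, v, p, ...) and that m * n * 2 fits in an int index. FlatGrid and its accessor names are hypothetical.

```java
public class FlatGrid {
    // Two int fields per cell, interleaved: [v(0,0), p(0,0), v(0,1), p(0,1), ...].
    private static final int FIELDS = 2;

    private final int n;
    private final int[] data;

    FlatGrid(int m, int n) {
        this.n = n;
        // One contiguous int[]; multiplyExact throws if m * n * 2 overflows an int.
        this.data = new int[Math.multiplyExact(Math.multiplyExact(m, n), FIELDS)];
    }

    // Row-major position of the first field of cell (i, j); callers must keep 0 <= j < n.
    private int base(int i, int j) {
        return (i * n + j) * FIELDS;
    }

    int getV(int i, int j) { return data[base(i, j)]; }
    int getP(int i, int j) { return data[base(i, j) + 1]; }
    void setV(int i, int j, int value) { data[base(i, j)] = value; }
    void setP(int i, int j, int value) { data[base(i, j) + 1] = value; }

    public static void main(String[] args) {
        FlatGrid g = new FlatGrid(3, 4);
        g.setV(2, 3, 7);
        g.setP(2, 3, 9);
        System.out.println(g.getV(2, 3) + " " + g.getP(2, 3)); // 7 9
    }
}
```

If code usually scans one field across many cells instead, storing all V values first and all P values after them in the same array keeps each scan sequential.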
Another, slightly more involved solution is to use an off-heap data structure, as described here: http://mechanical-sympathy.blogspot.com/2012/10/compact-off-heap-structurestuples-in.html
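For comparison, here is a small off-heap sketch. It uses the standard java.nio direct ByteBuffer rather than the technique in the linked post, and the class name and layout (two ints per cell, 8 bytes) are assumptions for illustration. A single ByteBuffer is indexed by int, so it is limited to about 2 GB; larger data sets would need several buffers.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class OffHeapGrid {
    // Each cell holds two ints (v and p): 8 bytes per cell.
    private static final int CELL_BYTES = 2 * Integer.BYTES;

    private final int n;
    private final ByteBuffer buffer;

    OffHeapGrid(int m, int n) {
        this.n = n;
        int bytes = Math.multiplyExact(Math.multiplyExact(m, n), CELL_BYTES);
        // Direct buffers live outside the Java heap and start zero-filled;
        // native byte order avoids byte swapping on every access.
        this.buffer = ByteBuffer.allocateDirect(bytes).order(ByteOrder.nativeOrder());
    }

    // Byte offset of cell (i, j) in row-major order.
    private int offset(int i, int j) {
        return (i * n + j) * CELL_BYTES;
    }

    int getV(int i, int j) { return buffer.getInt(offset(i, j)); }
    int getP(int i, int j) { return buffer.getInt(offset(i, j) + Integer.BYTES); }
    void setV(int i, int j, int value) { buffer.putInt(offset(i, j), value); }
    void setP(int i, int j, int value) { buffer.putInt(offset(i, j) + Integer.BYTES, value); }

    public static void main(String[] args) {
        OffHeapGrid g = new OffHeapGrid(1000, 1000);
        g.setV(999, 999, 5);
        g.setP(999, 999, -1);
        System.out.println(g.getV(999, 999) + " " + g.getP(999, 999)); // 5 -1
    }
}
```

Reads and writes go through absolute getInt/putInt calls, so nothing is allocated per cell and the cell data itself is never traced by the garbage collector.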
Answer 4:
First of all, you must use a List or Set, i.e. Java collections, instead of an array, because you may not know the size of the data you need to handle. Moreover, collections have API methods that make operations such as inserting or deleting elements easy. Working with arrays is more complex and error-prone, because you may need to iterate over them again and again, and an array's size is fixed when it is created, which is a problem if you have variable-size data.
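A minimal sketch of the collection-based approach, with a hypothetical MyObject record and an illustrative class name: an ArrayList grows as elements are added and supports removal without manual copying.

```java
import java.util.ArrayList;
import java.util.List;

public class GrowableStore {
    // Hypothetical record type; the question's MyObject would play this role.
    static final class MyObject {
        final int v;
        final int p;

        MyObject(int v, int p) {
            this.v = v;
            this.p = p;
        }
    }

    // The list grows as records arrive, so no size is needed up front.
    static List<MyObject> collect(int count) {
        List<MyObject> items = new ArrayList<>();
        for (int k = 0; k < count; k++) {
            items.add(new MyObject(k, k * k));
        }
        return items;
    }

    public static void main(String[] args) {
        List<MyObject> items = collect(5);
        items.remove(0);                // delete the first record by index; later ones shift left
        items.add(new MyObject(99, 0)); // append another record
        System.out.println(items.size() + " " + items.get(0).v + " " + items.get(4).v); // 5 1 99
    }
}
```

Bear in mind that every element of a List is still a separate heap object, and a List<Integer> would box each int into an Integer, so this trades memory efficiency, the question's first priority, for convenience.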
Also, allocating memory at runtime (i.e. using the new keyword) is more expensive than just assigning a value to an already existing object, i.e. p[i][j]=v;
Source: https://stackoverflow.com/questions/15585757/java-array-efficiency