I have to keep thousands of strings in memory to be accessed serially in Java. Should I store them in an array or should I use some kind of List?
Since arrays keep
If you know in advance how large the data is then an array will be faster.
A List is more flexible. You can use an ArrayList which is backed by an array.
List is the preferred way in Java 1.5 and beyond, as it can use generics. Arrays do not mix well with generics (you cannot create an array of a generic type). Arrays also have a predefined length, which cannot grow dynamically, and initializing an array with a large size up front is not a good idea. ArrayList is the way to get an array-backed structure that supports generics and can grow dynamically. But if deletes and inserts are used more frequently, a linked list may be the faster data structure to use.
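To make the fixed-length versus growable point concrete, here is a minimal sketch. The class and method names (GrowthDemo, collect) are made up for illustration; only the standard java.util classes are used.

```java
import java.util.ArrayList;
import java.util.List;

class GrowthDemo {
    // Collects the given names into a list that grows on demand;
    // no size has to be chosen up front.
    static List<String> collect(String... names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            result.add(name); // the backing array is enlarged automatically when full
        }
        return result;
    }

    public static void main(String[] args) {
        String[] fixed = new String[2]; // length is fixed at creation
        fixed[0] = "a";
        fixed[1] = "b";
        // fixed[2] = "c"; // would throw ArrayIndexOutOfBoundsException

        List<String> growable = collect("a", "b", "c");
        System.out.println(fixed.length + " vs " + growable.size());
    }
}
```

The diamond operator (new ArrayList<>()) needs Java 7 or later; on Java 5/6 you would write new ArrayList<String>() instead.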
UPDATE:
As Mark noted, there is no significant difference after JVM warm-up (several test passes). I checked this with a re-created array, and even with each new pass starting on a new row of a matrix. Most likely this means a simple array with index access should not be preferred over collections.
Still, for the first 1-2 passes the simple array is 2-3 times faster.
ORIGINAL POST:
Too many words for a subject that is so simple to check. Without any question, an array is several times faster than any container class. I ran into this question while looking for alternatives for my performance-critical section. Here is the prototype code I built to check the real situation:
import java.util.List;
import java.util.Arrays;
public class IterationTest {
    private static final long MAX_ITERATIONS = 1000000000;
    public static void main(String[] args) {
        Integer[] array = {1, 5, 3, 5};
        List<Integer> list = Arrays.asList(array);
        long start = System.currentTimeMillis();
        int test_sum = 0;
        for (int i = 0; i < MAX_ITERATIONS; ++i) {
            // for (int e : array) {
            for (int e : list) {
                test_sum += e;
            }
        }
        long stop = System.currentTimeMillis();
        long ms = (stop - start);
        System.out.println("Time: " + ms);
    }
}
And here is the answer:
Based on array (the for (int e : array) loop is active):
Time: 7064
Based on list (the for (int e : list) loop is active):
Time: 20950
Any more comments on 'faster'? This is clear enough. The question is when being about 3 times faster matters more to you than the flexibility of a List. But that is another question.
By the way, I checked this too with a manually constructed ArrayList. Almost the same result.
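The warm-up effect mentioned in the update is easy to see by repeating the same measurement over several passes in one JVM run. This is only a rough sketch, not a proper benchmark (a harness such as jmh is the better tool). The class name, method names, pass count and iteration count are arbitrary choices for illustration.

```java
import java.util.Arrays;
import java.util.List;

class WarmupDemo {
    // Sums all elements of the array, repeated `iterations` times.
    static long sumArray(Integer[] array, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            for (int e : array) {
                sum += e;
            }
        }
        return sum;
    }

    // Same work, but iterating through the List interface.
    static long sumList(List<Integer> list, int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            for (int e : list) {
                sum += e;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Integer[] array = {1, 5, 3, 5};
        List<Integer> list = Arrays.asList(array);
        int iterations = 10_000_000;

        // Several passes: early passes include interpretation and JIT compilation time.
        for (int pass = 1; pass <= 5; pass++) {
            long t0 = System.nanoTime();
            long a = sumArray(array, iterations);
            long t1 = System.nanoTime();
            long b = sumList(list, iterations);
            long t2 = System.nanoTime();
            // Printing the sums keeps the results in use.
            System.out.println("pass " + pass
                    + ": array " + (t1 - t0) / 1_000_000 + " ms"
                    + ", list " + (t2 - t1) / 1_000_000 + " ms"
                    + " (sums " + a + ", " + b + ")");
        }
    }
}
```

The numbers will vary by machine and JVM, so compare passes against each other rather than against the figures quoted above.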
Don't fall into the trap of optimizing without proper benchmarking. As others have suggested, use a profiler before making any assumptions.
The different data structures that you have enumerated have different purposes. A linked list is very efficient at inserting elements at the beginning and at the end, but suffers a lot when accessing random elements. An array has fixed storage but provides fast random access. Finally, an ArrayList improves on the interface of an array by allowing it to grow. Normally the data structure to use should be dictated by how the stored data will be accessed or added.
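As a small illustration of matching the structure to the access pattern, the sketch below uses a LinkedList for inserts at the head and an ArrayList for lookups by index. The class and method names are hypothetical; the java.util calls are standard.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

class StructureChoiceDemo {
    // Inserting at the head: constant time per call for a LinkedList,
    // whereas an ArrayList would have to shift every existing element.
    static LinkedList<String> prependAll(String... items) {
        LinkedList<String> list = new LinkedList<>();
        for (String item : items) {
            list.addFirst(item);
        }
        return list;
    }

    // Lookup by index: constant time for an ArrayList,
    // whereas a LinkedList has to walk its nodes to reach the position.
    static String middle(List<String> list) {
        return list.get(list.size() / 2);
    }

    public static void main(String[] args) {
        System.out.println(prependAll("a", "b", "c"));
        System.out.println(middle(new ArrayList<>(Arrays.asList("a", "b", "c"))));
    }
}
```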
About memory consumption: you seem to be mixing some things up. An array only gives you a contiguous chunk of memory for the type of data that you have. Don't forget that Java has a fixed set of data types: the primitives (boolean, byte, char, short, int, long, float, double) and object references (this includes all objects; even an array is an Object). It means that if you declare an array such as String[] strings = new String[1000] or MyObject[] myObjects = new MyObject[1000], you only get 1000 memory slots big enough to store the location (reference or pointer) of each object. You don't get 1000 memory slots big enough to hold the objects themselves. Don't forget that your objects are first created with "new". That is when the memory allocation is done, and later a reference (their memory address) is stored in the array. The object doesn't get copied into the array, only its reference.
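A short sketch of the reference behaviour described above. ReferenceDemo and its method are illustrative names; StringBuilder is used only because it is a mutable standard class.

```java
class ReferenceDemo {
    // Returns true if the array slot holds a reference to the very same object
    // (not a copy) and untouched slots are still null.
    static boolean storesReferenceNotCopy() {
        StringBuilder sb = new StringBuilder("hello"); // the object is allocated here, by "new"
        StringBuilder[] slots = new StringBuilder[1000]; // 1000 null references, no objects yet
        slots[0] = sb;          // stores the reference, not a copy of the object
        sb.append(" world");    // the change is visible through the array slot
        return slots[0] == sb
                && slots[0].toString().equals("hello world")
                && slots[1] == null;
    }

    public static void main(String[] args) {
        System.out.println(storesReferenceNotCopy());
    }
}
```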
Although the answers proposing to use ArrayList do make sense in most scenarios, the actual question of relative performance has not really been answered.
There are a few things you can do with an array: create it, set an item, get an item, clone/copy it.
Although get and set operations are somewhat slower on an ArrayList (by about 1 and 3 nanoseconds per call respectively on my machine), there is very little overhead in using an ArrayList vs. an array for any non-intensive use. There are however a few things to keep in mind, notably that resize operations (triggered by list.add(...)) are costly, and one should try to set the initial capacity at an adequate level when possible (note that the same issue arises when using an array).
Here are the results I measured for those three operations using the jmh benchmarking library (times in nanoseconds) with JDK 7 on a standard x86 desktop machine. Note that the ArrayLists are never resized in the tests, to make sure the results are comparable. Benchmark code available here.
I ran 4 tests, executing the following statements:
Integer[] array = new Integer[1];
List<Integer> list = new ArrayList<> (1);
Integer[] array = new Integer[10000];
List<Integer> list = new ArrayList<> (10000);
Results (in nanoseconds per call, 95% confidence):
a.p.g.a.ArrayVsList.CreateArray1 [10.933, 11.097]
a.p.g.a.ArrayVsList.CreateList1 [10.799, 11.046]
a.p.g.a.ArrayVsList.CreateArray10000 [394.899, 404.034]
a.p.g.a.ArrayVsList.CreateList10000 [396.706, 401.266]
Conclusion: no noticeable difference.
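The creation statements above pass an initial capacity to the ArrayList constructor. A minimal sketch of how presizing is typically used when the element count is known in advance (CapacityDemo and fill are illustrative names, not part of the benchmark code):

```java
import java.util.ArrayList;
import java.util.List;

class CapacityDemo {
    // Builds a list of n integers with the backing array sized once, up front.
    static List<Integer> fill(int n) {
        List<Integer> list = new ArrayList<>(n); // capacity is n, but size() is still 0
        for (int i = 0; i < n; i++) {
            list.add(i); // stays within the initial capacity, so no resize is triggered
        }
        return list;
    }

    public static void main(String[] args) {
        List<Integer> empty = new ArrayList<>(10000);
        System.out.println(empty.size());       // capacity is not the same thing as size
        System.out.println(fill(10000).size());
    }
}
```

Unlike new Integer[10000], which gives 10000 addressable (null) slots right away, the presized list still has to be populated with add before get or set can be used on an index.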
I ran 2 tests, executing the following statements:
return list.get(0);
return array[0];
Results (in nanoseconds per call, 95% confidence):
a.p.g.a.ArrayVsList.getArray [2.958, 2.984]
a.p.g.a.ArrayVsList.getList [3.841, 3.874]
Conclusion: getting from an array is about 25% faster than getting from an ArrayList, although the difference is only on the order of one nanosecond.
I ran 2 tests, executing the following statements:
list.set(0, value);
array[0] = value;
Results (in nanoseconds per call):
a.p.g.a.ArrayVsList.setArray [4.201, 4.236]
a.p.g.a.ArrayVsList.setList [6.783, 6.877]
Conclusion: set operations on arrays are about 40% faster than on lists, but, as for get, each set operation takes a few nanoseconds - so for the difference to reach 1 second, one would need to set items in the list/array hundreds of millions of times!
ArrayList's copy constructor delegates to Arrays.copyOf, so performance is identical to array copy (copying an array via clone, Arrays.copyOf or System.arraycopy makes no material difference performance-wise).
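For reference, the copy variants named above look like this side by side. The class and method names are made up for illustration; all copies are shallow, i.e. the element references are copied, not the elements themselves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CopyDemo {
    static Integer[] viaClone(Integer[] src) {
        return src.clone();
    }

    static Integer[] viaCopyOf(Integer[] src) {
        return Arrays.copyOf(src, src.length);
    }

    static Integer[] viaArraycopy(Integer[] src) {
        Integer[] dst = new Integer[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    static List<Integer> viaCopyConstructor(List<Integer> src) {
        return new ArrayList<>(src);
    }

    public static void main(String[] args) {
        Integer[] array = {1, 5, 3, 5};
        System.out.println(Arrays.toString(viaClone(array)));
        System.out.println(Arrays.toString(viaCopyOf(array)));
        System.out.println(Arrays.toString(viaArraycopy(array)));
        System.out.println(viaCopyConstructor(Arrays.asList(array)));
    }
}
```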
Array is faster - all memory is pre-allocated in advance.