I have to keep thousands of strings in memory to be accessed serially in Java. Should I store them in an array or should I use some kind of List?
Since arrays keep
Arrays are recommended wherever you can use them instead of a list, especially if you know the item count and the size will not change.
See Oracle Java best practice: http://docs.oracle.com/cd/A97688_16/generic.903/bp/java.htm#1007056
Of course, if you need to add and remove objects from the collection many times, lists are easier to use.
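As a rough illustration of that trade-off, here is a minimal sketch. The class and method names (StringStoreSketch, fixedStore, growingStore) and the "item-N" values are made up for this example: a String[] is used when the item count is known up front, and an ArrayList when items are added and removed afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical names, for illustration only.
class StringStoreSketch {

    // Known, fixed item count: a plain array is enough.
    static String[] fixedStore(int count) {
        String[] items = new String[count];
        for (int i = 0; i < count; i++) {
            items[i] = "item-" + i;
        }
        return items;
    }

    // Item count changes over time: copy into a List and let it grow or shrink.
    static List<String> growingStore(String[] initial) {
        List<String> items = new ArrayList<>(Arrays.asList(initial));
        items.add("extra");      // append at the end
        items.remove("item-0");  // remove the first matching element
        return items;
    }

    public static void main(String[] args) {
        String[] fixed = fixedStore(3);
        for (String s : fixed) {  // serial access over the array
            System.out.println(s);
        }
        System.out.println(growingStore(fixed));
    }
}
```

Either way the strings themselves live elsewhere on the heap; the array or list only holds references to them.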
"Thousands" is not a large number. A few thousand paragraph-length strings are on the order of a couple of megabytes in size. If all you want to do is access these serially, use an immutable singly-linked List.
A List is slower than an array. If you need efficiency, use an array. If you need flexibility, use a List.
If you can live with a fixed size, arrays will be faster and need less memory.
If you need the flexibility of the List interface for adding and removing elements, the question remains which implementation to choose. ArrayList is often recommended and used in every case, but it too has performance problems when elements must be inserted or removed at the beginning or in the middle of the list.
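To make the ArrayList point concrete, here is a small sketch with hypothetical names (FrontInsertSketch, prependEach, appendThenReverse). Inserting at index 0 forces ArrayList to shift every existing element one slot to the right, so building a list that way costs O(n) per insert. When only the final order matters, appending and reversing once gives the same result with far less copying.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical names, for illustration only.
class FrontInsertSketch {

    // Each add(0, ...) shifts all elements already in the list: O(n) per call.
    static List<String> prependEach(String[] items) {
        List<String> list = new ArrayList<>();
        for (String s : items) {
            list.add(0, s);
        }
        return list;
    }

    // Same final order: append (amortized O(1) per call), then reverse once.
    static List<String> appendThenReverse(String[] items) {
        List<String> list = new ArrayList<>(items.length);
        for (String s : items) {
            list.add(s);
        }
        Collections.reverse(list);
        return list;
    }

    public static void main(String[] args) {
        String[] items = {"a", "b", "c"};
        System.out.println(prependEach(items));
        System.out.println(appendThenReverse(items));
    }
}
```

This does not help when inserts at the front are interleaved with reads; that is the situation where a different List implementation is worth considering.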
You therefore may want to have a look at http://java.dzone.com/articles/gaplist-%E2%80%93-lightning-fast-list which introduces GapList. This new list implementation combines the strengths of both ArrayList and LinkedList resulting in very good performance for nearly all operations.
Depending on the implementation, it's possible that an array of primitive types will be smaller and more efficient than an ArrayList. This is because the array will store the values directly in a contiguous block of memory, while the simplest ArrayList implementation will store pointers to each value. On a 64-bit platform especially, this can make a huge difference.
Of course, it's possible for the JVM implementation to have a special case for this situation, in which case the performance will be the same.
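The difference described above can be sketched as follows; the class and method names (PrimitiveStorageSketch, sumArray, sumList) are invented for the example. An int[] holds the values themselves, while a List<Integer> holds references to boxed Integer objects that are unboxed on every read. Note that this only concerns primitives: for String elements, as in the question, an array and an ArrayList both hold references.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names, for illustration only.
class PrimitiveStorageSketch {

    // Values are stored inline in the array; no object per element.
    static long sumArray(int[] values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Each element is a reference to an Integer object; reading it unboxes.
    static long sumList(List<Integer> values) {
        long sum = 0;
        for (Integer v : values) {
            sum += v; // auto-unboxing via Integer.intValue()
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 100;
        int[] array = new int[n];
        List<Integer> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            array[i] = i;
            list.add(i); // auto-boxing via Integer.valueOf(i)
        }
        System.out.println(sumArray(array));
        System.out.println(sumList(list));
    }
}
```

How much the boxing costs in practice depends on the JVM and the workload, so it is worth measuring before relying on it.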
None of the answers covered the case I was interested in: scanning the same array repeatedly, many times over. I had to create a JMH test for this.
Results (Java 1.8.0_66 x32; iterating a plain array is roughly 4.6 times faster than an indexed get() loop over an ArrayList, and about 9 times faster than a for-each loop over it):
Benchmark                    Mode  Cnt   Score   Error  Units
MyBenchmark.testArrayForGet  avgt   10   8.121 ± 0.233  ms/op
MyBenchmark.testListForGet   avgt   10  37.416 ± 0.094  ms/op
MyBenchmark.testListForEach  avgt   10  75.674 ± 1.897  ms/op
Test
package my.jmh.test;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@State(Scope.Benchmark)
@Fork(1)
@Warmup(iterations = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class MyBenchmark {

    public final static int ARR_SIZE = 100;
    public final static int ITER_COUNT = 100000;

    // Both containers hold ARR_SIZE null references, so every test scans the same data.
    String[] arr = new String[ARR_SIZE];
    List<String> list = new ArrayList<>(ARR_SIZE);

    public MyBenchmark() {
        for( int i = 0; i < ARR_SIZE; i++ ) {
            list.add(null);
        }
    }

    // Scans the ArrayList with a for-each loop (uses an Iterator).
    @Benchmark
    public void testListForEach() {
        int count = 0;
        for( int i = 0; i < ITER_COUNT; i++ ) {
            for( String str : list ) {
                if( str != null )
                    count++;
            }
        }
        // Never true here (all elements are null); it only keeps count in use.
        if( count > 0 )
            System.out.print(count);
    }

    // Scans the ArrayList with an indexed get() loop.
    @Benchmark
    public void testListForGet() {
        int count = 0;
        for( int i = 0; i < ITER_COUNT; i++ ) {
            for( int j = 0; j < ARR_SIZE; j++ ) {
                if( list.get(j) != null )
                    count++;
            }
        }
        if( count > 0 )
            System.out.print(count);
    }

    // Scans the plain array with an indexed loop.
    @Benchmark
    public void testArrayForGet() {
        int count = 0;
        for( int i = 0; i < ITER_COUNT; i++ ) {
            for( int j = 0; j < ARR_SIZE; j++ ) {
                if( arr[j] != null )
                    count++;
            }
        }
        if( count > 0 )
            System.out.print(count);
    }
}