I just came across this seemingly innocuous comment, benchmarking ArrayList vs a raw String array. It's from a couple of years ago, but the OP writes
Every claim that X is slower than Y on a JVM which does not address all the issues presented in this article and its second part spreads fear and lies about the performance of a typical JVM. This applies to the comment referred to by the original question as well as to GravityBringer's answer. I am sorry to be so rude, but unless you use appropriate micro-benchmarking technology, your benchmarks produce badly skewed random numbers.
Tell me if you're interested in more explanations. Although it is all in the articles I referred to.
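For what it's worth, the methodology those articles describe largely boils down to warming up the JIT before measuring and keeping results observable. A minimal sketch of that pattern (the harness and all names here are mine, not taken from the articles):

```java
// Sketch of a minimal micro-benchmark harness with JIT warm-up.
// Illustrative only; a real tool like JMH handles far more pitfalls.
public class WarmupBench {
    // The workload under test: sum an array. The result is returned
    // and used, so the JIT cannot eliminate the loop as dead code.
    static long work(int[] data) {
        long sum = 0;
        for (int v : data) sum += v;
        return sum;
    }

    // Time one pass over the data, in nanoseconds.
    static long timeOnce(int[] data) {
        long start = System.nanoTime();
        long sum = work(data);
        long elapsed = System.nanoTime() - start;
        if (sum == 42) System.out.println("unlikely"); // keep the result observable
        return elapsed;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        // Warm up: let the JIT compile the hot path before measuring.
        for (int i = 0; i < 20; i++) timeOnce(data);
        // Measure several times and report the best run, not the first.
        long best = Long.MAX_VALUE;
        for (int i = 0; i < 10; i++) best = Math.min(best, timeOnce(data));
        System.out.println("best ns/pass: " + best);
    }
}
```

Measuring only the first, interpreted pass is precisely how benchmarks end up producing "skewed random numbers".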
The situation has gotten worse for ArrayLists. On my computer running Java 6 update 26, there is a fourfold difference between foreach iteration and indexed get() on an ArrayList. Interestingly (and perhaps quite logically), there is no such difference for raw arrays. I ran the following test:
import java.util.ArrayList;

public class ListVsArray {
    public static void main(String[] args) {
        int testSize = 5000000;
        ArrayList<Double> list = new ArrayList<Double>();
        Double[] arr = new Double[testSize];

        // set up the data - make sure the data doesn't have patterns
        // or anything the compiler could somehow optimize
        for (int i = 0; i < testSize; i++) {
            double someNumber = Math.random();
            list.add(someNumber);
            arr[i] = someNumber;
        }

        // ArrayList foreach
        long time = System.nanoTime();
        double total1 = 0;
        for (Double k : list) {
            total1 += k;
        }
        System.out.println(System.nanoTime() - time);

        // ArrayList get() method
        time = System.nanoTime();
        double total2 = 0;
        for (int i = 0; i < testSize; i++) {
            total2 += list.get(i);
        }
        System.out.println(System.nanoTime() - time);

        // array foreach
        time = System.nanoTime();
        double total3 = 0;
        for (Double k : arr) {
            total3 += k;
        }
        System.out.println(System.nanoTime() - time);

        // array direct indexing
        time = System.nanoTime();
        double total4 = 0;
        for (int i = 0; i < testSize; i++) {
            total4 += arr[i];
        }
        System.out.println(System.nanoTime() - time);

        // would be strange if different values were produced,
        // but no, all of these are the same, of course
        System.out.println(total1);
        System.out.println(total2);
        System.out.println(total3);
        System.out.println(total4);
    }
}
The arithmetic in the loops is to prevent the JIT compiler from possibly optimizing away some of the code. The effect of the arithmetic on performance is small, as the runtime is dominated by the ArrayList accesses.
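The pattern at work is simply that every loaded value feeds an accumulator whose value escapes the method (returned or printed), so the JIT cannot prove the loop body is dead. A minimal sketch of this "sink" idea (class and method names are mine):

```java
// Demonstrates the "sink" pattern used in the benchmark above: every
// element feeds an accumulator that is ultimately printed, so the JIT
// cannot eliminate the loop body as dead code.
import java.util.ArrayList;
import java.util.List;

public class SinkDemo {
    public static double sum(List<Double> list) {
        double total = 0;
        for (double d : list) total += d; // the result is returned, so the loop survives
        return total;
    }

    public static void main(String[] args) {
        List<Double> list = new ArrayList<Double>();
        for (int i = 1; i <= 4; i++) list.add((double) i);
        System.out.println(sum(list)); // printing makes the value observable
    }
}
```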
The runtimes are (in nanoseconds):
ArrayList foreach: 248,351,782
ArrayList get(): 60,657,907
array foreach: 27,381,576
array direct indexing: 27,468,091
The problem you have is that using an Iterator is slower than using a direct lookup. On my machine the difference is about 0.13 ns per iteration. Using an array instead saves a further 0.15 ns or so per iteration. This should be trivial in 99% of situations.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StringLoopBench {
    public static void main(String... args) {
        int testLength = 100 * 1000 * 1000;
        String[] stringArray = new String[testLength];
        Arrays.fill(stringArray, "a");
        List<String> stringList = new ArrayList<String>(Arrays.asList(stringArray));
        {
            long start = System.nanoTime();
            long total = 0;
            for (String str : stringArray) {
                total += str.length();
            }
            System.out.printf("The for each Array loop time was %.2f ns total=%d%n",
                    (double) (System.nanoTime() - start) / testLength, total);
        }
        {
            long start = System.nanoTime();
            long total = 0;
            for (int i = 0, stringListSize = stringList.size(); i < stringListSize; i++) {
                String str = stringList.get(i);
                total += str.length();
            }
            System.out.printf("The for/get List loop time was %.2f ns total=%d%n",
                    (double) (System.nanoTime() - start) / testLength, total);
        }
        {
            long start = System.nanoTime();
            long total = 0;
            for (String str : stringList) {
                total += str.length();
            }
            System.out.printf("The for each List loop time was %.2f ns total=%d%n",
                    (double) (System.nanoTime() - start) / testLength, total);
        }
    }
}
When run with one billion entries, this prints (using Java 6 update 26):
The for each Array loop time was 0.76 ns total=1000000000
The for/get List loop time was 0.91 ns total=1000000000
The for each List loop time was 1.04 ns total=1000000000
When run with one billion entries, this prints (using OpenJDK 7):
The for each Array loop time was 0.76 ns total=1000000000
The for/get List loop time was 0.91 ns total=1000000000
The for each List loop time was 1.04 ns total=1000000000
i.e. exactly the same. ;)
GravityBringer's number doesn't seem right, because I know ArrayList.get() is as fast as raw array access after VM optimization.
I ran GravityBringer's test twice on my machine, in -server mode:
(1st round)
50574847
43872295
30494292
30787885
(2nd round)
33865894
32939945
33362063
33165376
The bottleneck in such tests is actually memory read/write. Judging from the numbers, both arrays fit entirely in my L2 cache. If we decrease the size to fit in the L1 cache, or increase it beyond the L2 cache, we'll see a 10x difference in throughput.
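To make that cache effect visible, one could sweep the working-set size and watch the per-element cost jump as the data outgrows each cache level. A rough sketch (the sizes and pass counts are my own guesses for a typical CPU, not from the answer):

```java
// Rough sketch: sum int arrays of increasing size and report ns per element.
// As the working set grows past L1 and then L2, ns/element should rise.
public class CacheSweep {
    static long sum(int[] a) {
        long s = 0;
        for (int v : a) s += v;
        return s;
    }

    public static void main(String[] args) {
        // Sizes chosen to land under L1, around L2, and well past L2
        // on typical hardware (assumption, not measured cache sizes).
        int[] sizes = {4 * 1024, 256 * 1024, 16 * 1024 * 1024};
        for (int n : sizes) {
            int[] a = new int[n];
            for (int i = 0; i < n; i++) a[i] = i;
            long check = 0;
            // Warm up so the JIT compiles sum() before the timed runs.
            for (int i = 0; i < 5; i++) check += sum(a);
            // Keep the total work roughly constant across sizes.
            int passes = 64 * 1024 * 1024 / n;
            long start = System.nanoTime();
            for (int p = 0; p < passes; p++) check += sum(a);
            double nsPerElem = (double) (System.nanoTime() - start) / ((long) passes * n);
            System.out.printf("%10d ints: %.3f ns/element (check=%d)%n", n, nsPerElem, check);
        }
    }
}
```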
The iterator of ArrayList uses a single int counter. Even if the VM doesn't keep it in a register (the loop body is too complex), it will at least be in the L1 cache, so reads and writes of it are basically free.
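A stripped-down analogue of such an iterator (illustrative only, not the actual java.util.ArrayList.Itr source) might look like:

```java
// Simplified analogue of ArrayList's iterator: the only per-iteration
// mutable state is a single int cursor, which stays in L1 cache (or a
// register). Not the real java.util.ArrayList.Itr implementation.
import java.util.Iterator;
import java.util.NoSuchElementException;

public class IntCursorIterator<T> implements Iterator<T> {
    private final T[] elements;
    private int cursor; // the single int counter the answer refers to

    public IntCursorIterator(T[] elements) {
        this.elements = elements;
    }

    @Override
    public boolean hasNext() {
        return cursor < elements.length;
    }

    @Override
    public T next() {
        if (cursor >= elements.length) throw new NoSuchElementException();
        return elements[cursor++]; // one bounds check, one increment
    }

    public static void main(String[] args) {
        Iterator<String> it = new IntCursorIterator<String>(new String[]{"a", "b"});
        while (it.hasNext()) System.out.println(it.next());
    }
}
```

Per iteration this does little more than a compare, a load, and an increment, which is why the gap versus indexed access is fractions of a nanosecond.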
The ultimate answer, of course, is to test your particular program in your particular environment. Still, it's not helpful to play agnostic whenever a benchmark question is raised.