Enhanced for loop performance worse than traditional indexed lookup?

Backend · Open · 4 answers · 1436 views

小蘑菇 2020-12-16 01:38

I just came across this seemingly innocuous comment, benchmarking ArrayList vs a raw String array. It's from a couple of years ago, and the OP writes

I

4 Answers
  • 2020-12-16 02:09

    Every claim that X is slower than Y on a JVM, made without addressing all the issues presented in this article and its second part, spreads fear and misinformation about the performance of a typical JVM. This applies to the comment referred to by the original question as well as to GravityBringer's answer. I am sorry to be so blunt, but unless you use appropriate micro-benchmarking technology, your benchmarks produce badly skewed random numbers.

    Tell me if you're interested in more explanation, although it is all in the articles I referred to.
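    A minimal illustration (my own sketch, not from the articles) of the warm-up discipline they describe: run the measured code enough times for the JIT to compile it before timing, repeat the measurement several times, and consume the result so the work cannot be eliminated. All class names and iteration counts here are arbitrary assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class WarmupBench {
    // Workload under test: sum the list with the enhanced for loop.
    static double sum(List<Double> list) {
        double total = 0;
        for (double d : list) {
            total += d;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Double> list = new ArrayList<Double>();
        for (int i = 0; i < 1000000; i++) {
            list.add((double) i);
        }

        // Warm-up: give the JIT a chance to compile sum() before timing it.
        for (int i = 0; i < 20; i++) {
            sum(list);
        }

        // Measure several runs and report each one, not just the first.
        for (int run = 0; run < 5; run++) {
            long start = System.nanoTime();
            double total = sum(list);
            long elapsed = System.nanoTime() - start;
            // Printing total keeps the JIT from discarding the loop as dead code.
            System.out.printf("run %d: %,d ns (total=%.1f)%n", run, elapsed, total);
        }
    }
}
```

    Typically the first run is much slower than the later ones; quoting only a single cold measurement is exactly the mistake the articles warn about.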

  • 2020-12-16 02:09

    The situation has gotten worse for ArrayLists. On my computer, running Java 6 update 26, there is a fourfold difference. Interestingly (and perhaps quite logically), there is no difference for raw arrays. I ran the following test:

        import java.util.ArrayList;

        public class ForLoopBench {
            public static void main(String[] args) {
                int testSize = 5000000;

                ArrayList<Double> list = new ArrayList<Double>();
                Double[] arr = new Double[testSize];

                // set up the data - make sure the data doesn't have patterns
                // or anything the compiler could somehow optimize away
                for (int i = 0; i < testSize; i++) {
                    double someNumber = Math.random();
                    list.add(someNumber);
                    arr[i] = someNumber;
                }

                // ArrayList foreach
                long time = System.nanoTime();
                double total1 = 0;
                for (Double k : list) {
                    total1 += k;
                }
                System.out.println(System.nanoTime() - time);

                // ArrayList get() method
                time = System.nanoTime();
                double total2 = 0;
                for (int i = 0; i < testSize; i++) {
                    total2 += list.get(i);
                }
                System.out.println(System.nanoTime() - time);

                // array foreach
                time = System.nanoTime();
                double total3 = 0;
                for (Double k : arr) {
                    total3 += k;
                }
                System.out.println(System.nanoTime() - time);

                // array indexing
                time = System.nanoTime();
                double total4 = 0;
                for (int i = 0; i < testSize; i++) {
                    total4 += arr[i];
                }
                System.out.println(System.nanoTime() - time);

                // all four totals should (and do) come out the same
                System.out.println(total1);
                System.out.println(total2);
                System.out.println(total3);
                System.out.println(total4);
            }
        }
    

    The arithmetic in the loops is to prevent the JIT compiler from possibly optimizing away some of the code. The effect of the arithmetic on performance is small, as the runtime is dominated by the ArrayList accesses.

    The runtimes are (in nanoseconds):

        ArrayList foreach:       248,351,782
        ArrayList get():          60,657,907
        array foreach:            27,381,576
        array direct indexing:    27,468,091

  • 2020-12-16 02:14

    The problem is that using an Iterator is slower than using a direct lookup. On my machine the difference is about 0.13 ns per iteration; using an array instead saves a further 0.15 ns per iteration. This should be trivial in 99% of situations.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class StringLoopBench {
        public static void main(String... args) {
            int testLength = 100 * 1000 * 1000;
            String[] stringArray = new String[testLength];
            Arrays.fill(stringArray, "a");
            List<String> stringList = new ArrayList<String>(Arrays.asList(stringArray));
            {
                long start = System.nanoTime();
                long total = 0;
                for (String str : stringArray) {
                    total += str.length();
                }
                System.out.printf("The for each Array loop time was %.2f ns total=%d%n", (double) (System.nanoTime() - start) / testLength, total);
            }
            {
                long start = System.nanoTime();
                long total = 0;
                for (int i = 0, stringListSize = stringList.size(); i < stringListSize; i++) {
                    String str = stringList.get(i);
                    total += str.length();
                }
                System.out.printf("The for/get List loop time was %.2f ns total=%d%n", (double) (System.nanoTime() - start) / testLength, total);
            }
            {
                long start = System.nanoTime();
                long total = 0;
                for (String str : stringList) {
                    total += str.length();
                }
                System.out.printf("The for each List loop time was %.2f ns total=%d%n", (double) (System.nanoTime() - start) / testLength, total);
            }
        }
    }
    

    When run with one billion entries, this prints (using Java 6 update 26):

    The for each Array loop time was 0.76 ns total=1000000000
    The for/get List loop time was 0.91 ns total=1000000000
    The for each List loop time was 1.04 ns total=1000000000
    

    When run with one billion entries, this prints (using OpenJDK 7):

    The for each Array loop time was 0.76 ns total=1000000000
    The for/get List loop time was 0.91 ns total=1000000000
    The for each List loop time was 1.04 ns total=1000000000
    

    i.e. exactly the same. ;)

  • 2020-12-16 02:20

    GravityBringer's numbers don't seem right, because I know ArrayList.get() is as fast as raw array access after VM optimization.

    I ran GravityBringer's test twice on my machine, in -server mode:

    50574847
    43872295
    30494292
    30787885
    (2nd round)
    33865894
    32939945
    33362063
    33165376
    

    The bottleneck in such tests is actually memory read/write. Judging from the numbers, both arrays fit entirely in my L2 cache. If we decrease the size to fit in the L1 cache, or increase it beyond the L2 cache, we'll see a 10x difference in throughput.

    The iterator of ArrayList uses a single int counter. Even if the VM doesn't keep it in a register (the loop body is too complex), it will at least be in the L1 cache, so reads and writes of it are essentially free.
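    To make the "single int counter" concrete, here is an illustrative iterator over an array whose only mutable state is one int cursor. This is a sketch in the spirit of ArrayList's internal Itr class, not the actual JDK source.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Illustrative sketch: the iterator's entire mutable state is one int.
class ArrayIterator<E> implements Iterator<E> {
    private final E[] elements;
    private int cursor; // the single counter; cheap to keep in L1 or a register

    ArrayIterator(E[] elements) {
        this.elements = elements;
    }

    public boolean hasNext() {
        return cursor < elements.length;
    }

    public E next() {
        if (cursor >= elements.length) {
            throw new NoSuchElementException();
        }
        return elements[cursor++];
    }
}
```

    With so little state, the per-iteration overhead of an iterator is just a bounds check and an increment, which is why it can be nearly as cheap as direct indexing.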

    The ultimate answer of course is to test your particular program in your particular environment.

    That said, it's not helpful to play agnostic whenever a benchmark question is raised.
