Fastest way to recreate the ArrayList in a for loop

前端 未结 4 699
野性不改
野性不改 2020-11-29 13:24

In Java, using the following function for a huge matrix X to print its column-distinct elements:

// create the list of distinct values
List va         


        
相关标签:
4条回答
  • 2020-11-29 13:44

    (corrected as of 2015-09-04 to include reproducible benchmark and conclusions)

    • Of course values.clear() is faster than creating a new object (just sets the last item index at zero). Almost certainly a values.clear() would be faster than creating a new object. In the case of the ArrayList you used initially, it would just set set the insertion index to zero.

    • As I commented at P.D.#1 BitSet might be a fastest approach for this case where the elements are integers (assuming the range of values is not too broad. However that might not be useful for any other type of elements.

    • Also as said by as i coincided with Assylias answer, HashSet is a better choice than ArrayList (assuming hashCode() gives a decent distribution which doesn't lead us to a O(N) performance).

      In this HashSet case, intuition would also suggest that clear() (which basically sets the HashSet#table array of "pigeon holes" to null) would be faster than building a brand new collection (which in any case requires that same table to be initialized/resseted to zeros). But in this particular case things happen the other way round. Assylias published his results. Unfortunately I have had to code my bechmark myself in order to find out how could this happen. I go over this issue in P.D.#3

      Anyway the main thing about this is that since creating a brand new HashSet for every iteration has not a substantial penalty, it makes sense to do so (since it is simpler), unless we must take bigger care about performance and resources.

    • Another issue about performance will be I/O. That System.out.println()in the sample code probably does a flush() for every line, which automatically will shift the bottleneck into the console/stdout. A workaround might be adding to a StringBuffer. Unless there is a reader process waiting avidly for that output, it might make sense to delay the write until the end of the loop.

    This would be my try:

    Set<Integer> values = new HashSet<Integer>();
    // (PD 1) Or new BitSet(max_x - min_x + 1);
    // (PD 2) Or new HashSet((int)Math.ceil(n/0.75));
    StringBuffer sb = new StringBuffer(); // appends row values for printing together.
    
    for (int j = 0, x; j < m; j++) {
        values.clear();
        sb.setLength(0);
        for (int i = 0; i < n; i++) {
             x = X[i][j];
             if (! values.contains(x)){
                 sb.append(x+"\n");
                 values.add(x);
             }
        }
        System.out.print(sb);
    }
    

    P.D.1. Also if you might consider using BitSet. It has a O(1) access performance(even in worst case, as there are no collisions). It will be best fit for integers with a range starting with 0 (otherwise it might require translation) and a population of actual values dense enough within the possible distribution.

    • For example if you checking the occurrence of Unicode codepoints you will need a 139,264 bytes long array (17 (planes) * 216 (codepoints/plane) / 8), where maybe you are using just 40 different characters in a 100 character-long text message, that might be an overkill. But if you were restricted to the 256 possible values within ISO-Latin-1. (8 bytes bitset), that would be actually be a perfect fit.

    P.D.2. Also, as Assylias says, setting an initial size for HashSet may help. As threshold = (int)(capacity * loadFactor) , you might want an initialCapacity=(int)Math.ceil(n/0.75) to be sure there is no resizing. That concern belongs to Assylias post (I didn't use it for myself) and is unproper to discuss in this manner


    P.D.3 (September 2015:3 years after) I happened to revisit this question and I was so intrigued about Assylas results I coded my own micro-benchmark (which I include, so anyone can replicate). These are my conclusions:

    • The BitSet I proposed (note: Won't be fit for non-integers and very sparsely-packed distributions) clearly outperforms all the flavors of HashSet (around 4 times faster in densely packed distributions)
    • Tests for a highly filled set with a size of 1000 show slight advantage in favor of creating new collection (7.7" vs 9.8"). However, the "dry run" of HashSet#clear() vs new HashSet() will throw opposite results (9.5" vs 7.5"). My guess is that is because a penalty of invalidating cache when resetting HashSet.table (setting null where it was not null).
    • Also it is a big advantage knowing the optimal size beforehand (which might not always be feasible). the HashSet.clear() approach is more adaptive and will tolerate much better underestimating the size. Overestimating wont mean much so much difference, but might not be a good strategy if memory is an issue.
    • The results clearly show that nowadays creating an object and allocating memory is not a big issue (See Programmers.SE). However, reusing objects should still be an option to be considered. See for example in drmirror how even after JDK 1.6 evolution, reusing instances (CharBuffer) doubles performance.
    • Also I wondered what was the impact of Assylias using a loadFactor==1.0f (HashSet wont resize until size > table.length*loadFactor, which is different from what i suggested him, but is perfect if there are no collisions). Roughly loadFactor==0.75f requires 1.33 times the table space in exchange of avoiding a 25% of collisions. My tests didn't show any advantage of of the default setting for this scenario.

    Here is the class I used for my tests. Sorry if it might be overshooting in some aspects and lacking in others (no warm up, just execute long enough so an implementation has the chance to choke on his own garbage).

    /**
     * Messing around this StackOverflow question:   https://stackoverflow.com/questions/11740013/fastest-way-to-recreate-the-arraylist-in-a-for-loop/ .
     * Quite surprisingly new HashSet() (which should imply actual memory initialization) is faster than HashSet.clear() in the given scenario.
     * Primary goal is to test this phenomenon (new vs clear) under different scenarios.
     * Secondarily a bit about the BitSet and the HashSet loadFactor is tested.
     * @author Javier
     */
    public class TestSetClear2 {
    
    public static interface MicroBenchmark {
        public String getName();
        /**
         * 
         * @param dataSet Data set to insert in the collection
         * @param initialSize Initial size for the collection. Can try to be optimal or try to fool.
         * @param iterations Number of times to go through the dataSet over and over
         */
        public void run(int[] dataSet, int initialSize, int iterations);
    }
    
    /** Bad initial case. Based in question code */
    public static class MBList implements MicroBenchmark {
        @Override public String getName() { return "ArrayList.clear()"; }
        @Override public void run(int[] data, int initialSize, int n) {
            // Not taking initial size into account may result in a resizing penalty in the first iteration
            // But will have an adequate size in following iterations, and wont be fooled by bad estimations. 
            List<Integer> values = new ArrayList<Integer>();
            for (int iter = 0; iter < n; iter++) {
                values.clear();
                for (int i = 0; i < data.length; i++) {
                    int x = data[i];
                    if (values.contains(x)) continue;
                    values.add(x);
                }
            }
        }
    }
    
    /** new HashSet(N,1) for every iteration. Reported as best by assylias. */
    public static class MBNewHashSetN1 implements MicroBenchmark {
        @Override public String getName() { return "new HashSet(N,1)"; }
        @Override public void run(int[] data, int initialSize,  int n) {
            for (int iter = 0; iter < n; iter++) {
                Set<Integer> values = new HashSet<>(initialSize, 1.0f); // 1.0 loadfactor optimal if no collisions.
                for (int i = 0; i < data.length; i++) {
                    int x = data[i];
                    if (values.contains(x)) continue;
                    values.add(x);
                }
            }
        }
    }
    
    // No need to implement raw new HashSet() (reported as worse). Will be enough fooling to initialize to 16 so it succumbs to resizing.
    
    /** HashsetClear for every iteration. Attempted by Assylias and Javier. Clear() does not perform as well as expected under basic tests. */
    public static class MBHashSetClear implements MicroBenchmark {
        private float loadFactor; // Allow loadFactor to check how much 1.0 factor affects if there are collisions.
        private String name;
        public MBHashSetClear(float loadFactor) {
            this.loadFactor = loadFactor;
            name = String.format(Locale.ENGLISH, "HashSet(N,%f).clear()", loadFactor);
        }
        @Override public String getName() { return name; }
        @Override public void run(int[] data, int initialSize, int n) {
            HashSet<Integer> values = new HashSet<>((int)Math.ceil(initialSize/loadFactor), loadFactor);// Just the size for loadfactor so it wont resize.
            for (int iter = 0; iter < n; iter++) {
                values.clear();
                for (int i = 0; i < data.length; i++) {
                    int x = data[i];
                    if (values.contains(x)) continue;
                    values.add(x);
                }
            }
        }
    }
    
    /** Javier BitSet. Might clearly outperform HashSet, but only on the very specific constraints of the test (non negative integers, not hugely big). */
    public static class MBBitSet implements MicroBenchmark {
        @Override public String getName() { return "BitSet.clear()"; }
        @Override public void run(int[] data, int distributionSize, int n) {
            BitSet values = new BitSet(distributionSize);
            for (int iter = 0; iter < n; iter++) {
                values.clear();
                for (int i = 0; i < data.length; i++) {
                    int x = data[i];
                    if (values.get(x)) continue;
                    values.set(x);
                }
            }
        }
    }
    
    public static void main(String[] args) {
        final MicroBenchmark mbNew = new MBNewHashSetN1();
        // Create with same loadFactor as MBNewHashSetN1. So we compare apples with apples (same size of handled table, same collisions).
        final MicroBenchmark mbClear = new MBHashSetClear(1.0f);
        final MicroBenchmark mbClear075 = new MBHashSetClear(0.75f);
        final MicroBenchmark mbBitset = new MBBitSet();
        final MicroBenchmark mbList = new MBList(); // Will have a taste of O(N) with a not too bit dataset.
    
        // warmup. trigger the cpu high performance mode? Fill the heap with garbage?
        //mbNew.run(dataSetE3xE3, 1000, (int)1e5); // Using new HS might give a bit advantage?
    
        int timePerTest = 10000;
        int distributionSize, initialCapacity, datasetLength;
    
        // 1000 long and values 0..999 (1e3 x 1e3). Optimal initial capacity
        distributionSize = 1000; datasetLength = 1000; initialCapacity = 1000;
        final int[] dataSetE3xE3 = generateRandomSet(1000,1000);
        runBenchmark("E3xE3", dataSetE3xE3, distributionSize, timePerTest, initialCapacity,
                mbNew, mbClear, mbClear075, mbBitset);
        // repeat with underestimated initial size. Will incur in resizing penalty
        initialCapacity = 16; // Default initial
        runBenchmark("E3xE3+underSize", dataSetE3xE3, distributionSize, timePerTest, initialCapacity,
                mbNew, mbClear, mbBitset);
        // repeat with overestimated initial size. larger garbage and clearing.
        initialCapacity = 100000; // oversized will force to handle large tables filled with 0 / null.
        runBenchmark("E3xE3+overSize", dataSetE3xE3, distributionSize, timePerTest, initialCapacity,
                mbNew, mbClear, mbBitset);
        // Dry run (not rum). what if we focus on the new and clear operations. Just 1 item so clear() is forced to traverse the table.
        datasetLength = 1; distributionSize = 1000; initialCapacity = 1000;
        runBenchmark("E3xE3-DryRun", generateRandomSet(datasetLength, distributionSize),
                distributionSize, timePerTest, initialCapacity,
                mbNew, mbClear);
    
        // check for * 100 and / 100 sizes.
        distributionSize = datasetLength = initialCapacity = 10;
        runBenchmark("E1xE1", 
                generateRandomSet(datasetLength, distributionSize),
                distributionSize, timePerTest, initialCapacity,
                mbNew, mbClear, mbList);
        distributionSize = datasetLength = initialCapacity = 100000;
        runBenchmark("E5xE5", generateRandomSet(datasetLength, distributionSize),
                distributionSize, timePerTest, initialCapacity,
                mbNew, mbClear);
    
        // Concentrated distributions might behave as with oversized?
        datasetLength=10000; distributionSize=10; initialCapacity=Math.min(datasetLength, distributionSize);
        runBenchmark("E4xE1", 
                generateRandomSet(datasetLength, distributionSize),
                distributionSize, timePerTest, initialCapacity,
                mbNew, mbClear);
    
        // Sparse distributions might allow mild collision. Also adverse for BitSet.
        // TODO Generate a greater/known amount of collisions
        datasetLength=10000; distributionSize=(int)1e6; initialCapacity=Math.min(datasetLength, distributionSize);
        runBenchmark("E4xE6", 
                generateRandomSet(datasetLength, distributionSize),
                distributionSize, timePerTest, initialCapacity,
                mbNew, mbClear, mbClear075);
    
    }
    
    private static void runBenchmark(String testName, int[] dataSet, int distributionSize, int timePerTest
            , int initialCapacity, MicroBenchmark ... testees /* not testes */) {
        // How many iterations? Will use first testee to callibrate.
        MicroBenchmark curTest = testees[0];
        long t0 = System.nanoTime();
        long ellapsed = 0L;
        final long minToCallibrate = (long)0.5e9; // half second
        int iterations = 1;
        while (ellapsed < minToCallibrate) {
            curTest.run(dataSet, initialCapacity, iterations);
    
            iterations *= 2; // same as <<= 1
            ellapsed = System.nanoTime() - t0;
        }
        // calculation is not laser-sharp precise (actually executed iterations -1, and some extra initializations).
        final int nIterations = (int) ((double)iterations * timePerTest  * 1e6 /* nanos/millis */ / ellapsed);
    
        // Do actual benchmark
        System.out.printf(Locale.ENGLISH, "dataset:{name=%s,length:%d,distrib:%d,capacity0:%d,iterations:%d}\n",
                testName, dataSet.length, distributionSize, initialCapacity, nIterations);
    
        for (MicroBenchmark testee : testees) {
            t0 = System.nanoTime();
            testee.run(dataSet, initialCapacity, nIterations);
            ellapsed = System.nanoTime() - t0;
            System.out.printf(Locale.ENGLISH, "%s : %5.3f\n", testee.getName(), ellapsed/1e9 );
    
        }
    
    }
    
    private static int[] generateRandomSet(int lengthOfSet, int distributionSize) {
        Random r = new Random();
        int[] result = new int[lengthOfSet];
        for (int i = 0; i < lengthOfSet; i++) {
            result[i] = r.nextInt(distributionSize);
        }
        return result;
    }
    }
    

    Here are my results (using JDK 1.8.0_31 - 64 bits - Windows 7 )

    dataset:{name=E3xE3,length:1000,distrib:1000,capacity0:1000,iterations:514241}
    new HashSet(N,1) : 7.688
    HashSet(N,1.000000).clear() : 9.796
    HashSet(N,0.750000).clear() : 9.923
    BitSet.clear() : 1.990
    dataset:{name=E3xE3+underSize,length:1000,distrib:1000,capacity0:16,iterations:420572}
    new HashSet(N,1) : 9.735
    HashSet(N,1.000000).clear() : 6.637
    BitSet.clear() : 1.611
    dataset:{name=E3xE3+overSize,length:1000,distrib:1000,capacity0:100000,iterations:143032}
    new HashSet(N,1) : 9.948
    HashSet(N,1.000000).clear() : 10.377
    BitSet.clear() : 0.447
    dataset:{name=E3xE3-DryRun,length:1,distrib:1000,capacity0:1000,iterations:18511486}
    new HashSet(N,1) : 9.583
    HashSet(N,1.000000).clear() : 7.523
    dataset:{name=E1xE1,length:10,distrib:10,capacity0:10,iterations:76177852}
    new HashSet(N,1) : 9.988
    HashSet(N,1.000000).clear() : 10.521
    ArrayList.clear() : 7.915
    dataset:{name=E5xE5,length:100000,distrib:100000,capacity0:100000,iterations:2892}
    new HashSet(N,1) : 9.764
    HashSet(N,1.000000).clear() : 9.615
    dataset:{name=E4xE1,length:10000,distrib:10,capacity0:10,iterations:170386}
    new HashSet(N,1) : 9.843
    HashSet(N,1.000000).clear() : 9.708
    dataset:{name=E4xE6,length:10000,distrib:1000000,capacity0:10000,iterations:36497}
    new HashSet(N,1) : 9.686
    HashSet(N,1.000000).clear() : 10.079
    HashSet(N,0.750000).clear() : 10.008
    
    0 讨论(0)
  • 2020-11-29 13:47

    What would be much more efficient would be to use a Set instead of a list, for example the HashSet implementation. The contains method will run in O(1) instead of O(n) with a list. And you could save one call by only calling the add method.

    As for your specific question, I would just create a new Set at each loop - object creation is not that expensive, probably less than clearing the set (as confirmed by the benchmark at the bottom - see the most efficient version in EDIT 2):

    for (int j = 0, x; j < m; j++) {
        Set<Integer> values = new HashSet<Integer>();
        for (int i = 0; i < n; i++) {
            x = X[i][j];
            if (!values.add(x)) continue; //value.add returns true if the element was NOT in the set before
            System.out.println(x);
        }
    }
    

    However, the only way to know which is quicker (new object vs. clear) is to profile that portion of your code and check the performance of both versions.

    EDIT

    I ran a quick benchmark and the clear version seems a little faster than creating a set at each loop (by about 20%). You should still check on your dataset / use case which one is better. Faster code with my dataset:

    Set<Integer> values = new HashSet<Integer>();
    for (int j = 0, x; j < m; j++) {
        for (int i = 0; i < n; i++) {
            x = X[i][j];
            if (!values.add(x)) continue; //value.add returns true if the element was NOT in the set before
            System.out.println(x);
        }
        values.clear();
    }
    

    EDIT 2

    An actually even faster version of the code is obtained by creating a new set of the right size at each loop:

    for (int j = 0, x; j < m; j++) {
        Set<Integer> values = new HashSet<Integer>(n, 1); //right size from the beginning
        for (int i = 0; i < n; i++) {
            x = X[i][j];
            if (!values.add(x)) continue; //value.add returns true if the element was NOT in the set before
            System.out.println(x);
        }
    }
    

    Summary of result

    After JVM warm up + JIT:

    Set<Integer> values = new HashSet<Integer>(n, 1); =====> 280 ms
    values.clear();                                   =====> 380 ms
    Set<Integer> values = new HashSet<Integer>();     =====> 450 ms 
    
    0 讨论(0)
  • 2020-11-29 14:02

    You should use .clear() methods , using this you haven't need to assign memory your variable again and again.

    0 讨论(0)
  • 2020-11-29 14:08

    You can use ArrayList.clear(); this keep address ArrayList in memory, without garbage collector effect on this address.

    0 讨论(0)
提交回复
热议问题