In Java, using the following function for a huge matrix X to print its column-distinct elements:
// create the list of distinct values
List va
(corrected as of 2015-09-04 to include reproducible benchmark and conclusions)
Of course
Almost certainly a values.clear()
is faster than creating a new object (just sets the last item index at zero).values.clear()
would be faster than creating a new object. In the case of the ArrayList
you used initially, it would just set set the insertion index to zero.
As I commented at P.D.#1 BitSet might be a fastest approach for this case where the elements are integers (assuming the range of values is not too broad. However that might not be useful for any other type of elements.
Also as said by as i coincided with Assylias answer, HashSet is a better choice than ArrayList (assuming hashCode()
gives a decent distribution which doesn't lead us to a O(N) performance).
In this HashSet
case, intuition would also suggest that clear()
(which basically sets the HashSet#table
array of "pigeon holes" to null) would be faster than building a brand new collection (which in any case requires that same table to be initialized/resseted to zeros). But in this particular case things happen the other way round. Assylias published his results. Unfortunately I have had to code my bechmark myself in order to find out how could this happen. I go over this issue in P.D.#3
Anyway the main thing about this is that since creating a brand new HashSet for every iteration has not a substantial penalty, it makes sense to do so (since it is simpler), unless we must take bigger care about performance and resources.
Another issue about performance will be I/O. That System.out.println()
in the sample code probably does a flush()
for every line, which automatically will shift the bottleneck into the console/stdout. A workaround might be adding to a StringBuffer
. Unless there is a reader process waiting avidly for that output, it might make sense to delay the write until the end of the loop.
This would be my try:
Set<Integer> values = new HashSet<Integer>();
// (PD 1) Or new BitSet(max_x - min_x + 1);
// (PD 2) Or new HashSet((int)Math.ceil(n/0.75));
StringBuffer sb = new StringBuffer(); // appends row values for printing together.
for (int j = 0, x; j < m; j++) {
values.clear();
sb.setLength(0);
for (int i = 0; i < n; i++) {
x = X[i][j];
if (! values.contains(x)){
sb.append(x+"\n");
values.add(x);
}
}
System.out.print(sb);
}
P.D.1. Also if you might consider using BitSet. It has a O(1) access performance(even in worst case, as there are no collisions). It will be best fit for integers with a range starting with 0 (otherwise it might require translation) and a population of actual values dense enough within the possible distribution.
P.D.2. Also, as Assylias says, setting an initial size for HashSet may help. As That concern belongs to Assylias post (I didn't use it for myself) and is unproper to discuss in this mannerthreshold = (int)(capacity * loadFactor)
, you might want an initialCapacity=(int)Math.ceil(n/0.75)
to be sure there is no resizing.
P.D.3 (September 2015:3 years after) I happened to revisit this question and I was so intrigued about Assylas results I coded my own micro-benchmark (which I include, so anyone can replicate). These are my conclusions:
BitSet
I proposed (note: Won't be fit for non-integers and very sparsely-packed distributions) clearly outperforms all the flavors of HashSet (around 4 times faster in densely packed distributions)HashSet#clear()
vs new HashSet()
will throw opposite results (9.5" vs 7.5"). My guess is that is because a penalty of invalidating cache when resetting HashSet.table
(setting null
where it was not null
).HashSet.clear()
approach is more adaptive and will tolerate much better underestimating the size. Overestimating wont mean much so much difference, but might not be a good strategy if memory is an issue.loadFactor==1.0f
(HashSet wont resize until size > table.length*loadFactor
, which is different from what i suggested him, but is perfect if there are no collisions). Roughly loadFactor==0.75f
requires 1.33 times the table space in exchange of avoiding a 25% of collisions. My tests didn't show any advantage of of the default setting for this scenario. Here is the class I used for my tests. Sorry if it might be overshooting in some aspects and lacking in others (no warm up, just execute long enough so an implementation has the chance to choke on his own garbage).
/**
* Messing around this StackOverflow question: https://stackoverflow.com/questions/11740013/fastest-way-to-recreate-the-arraylist-in-a-for-loop/ .
* Quite surprisingly new HashSet() (which should imply actual memory initialization) is faster than HashSet.clear() in the given scenario.
* Primary goal is to test this phenomenon (new vs clear) under different scenarios.
* Secondarily a bit about the BitSet and the HashSet loadFactor is tested.
* @author Javier
*/
public class TestSetClear2 {
public static interface MicroBenchmark {
public String getName();
/**
*
* @param dataSet Data set to insert in the collection
* @param initialSize Initial size for the collection. Can try to be optimal or try to fool.
* @param iterations Number of times to go through the dataSet over and over
*/
public void run(int[] dataSet, int initialSize, int iterations);
}
/** Bad initial case. Based in question code */
public static class MBList implements MicroBenchmark {
@Override public String getName() { return "ArrayList.clear()"; }
@Override public void run(int[] data, int initialSize, int n) {
// Not taking initial size into account may result in a resizing penalty in the first iteration
// But will have an adequate size in following iterations, and wont be fooled by bad estimations.
List<Integer> values = new ArrayList<Integer>();
for (int iter = 0; iter < n; iter++) {
values.clear();
for (int i = 0; i < data.length; i++) {
int x = data[i];
if (values.contains(x)) continue;
values.add(x);
}
}
}
}
/** new HashSet(N,1) for every iteration. Reported as best by assylias. */
public static class MBNewHashSetN1 implements MicroBenchmark {
@Override public String getName() { return "new HashSet(N,1)"; }
@Override public void run(int[] data, int initialSize, int n) {
for (int iter = 0; iter < n; iter++) {
Set<Integer> values = new HashSet<>(initialSize, 1.0f); // 1.0 loadfactor optimal if no collisions.
for (int i = 0; i < data.length; i++) {
int x = data[i];
if (values.contains(x)) continue;
values.add(x);
}
}
}
}
// No need to implement raw new HashSet() (reported as worse). Will be enough fooling to initialize to 16 so it succumbs to resizing.
/** HashsetClear for every iteration. Attempted by Assylias and Javier. Clear() does not perform as well as expected under basic tests. */
public static class MBHashSetClear implements MicroBenchmark {
private float loadFactor; // Allow loadFactor to check how much 1.0 factor affects if there are collisions.
private String name;
public MBHashSetClear(float loadFactor) {
this.loadFactor = loadFactor;
name = String.format(Locale.ENGLISH, "HashSet(N,%f).clear()", loadFactor);
}
@Override public String getName() { return name; }
@Override public void run(int[] data, int initialSize, int n) {
HashSet<Integer> values = new HashSet<>((int)Math.ceil(initialSize/loadFactor), loadFactor);// Just the size for loadfactor so it wont resize.
for (int iter = 0; iter < n; iter++) {
values.clear();
for (int i = 0; i < data.length; i++) {
int x = data[i];
if (values.contains(x)) continue;
values.add(x);
}
}
}
}
/** Javier BitSet. Might clearly outperform HashSet, but only on the very specific constraints of the test (non negative integers, not hugely big). */
public static class MBBitSet implements MicroBenchmark {
@Override public String getName() { return "BitSet.clear()"; }
@Override public void run(int[] data, int distributionSize, int n) {
BitSet values = new BitSet(distributionSize);
for (int iter = 0; iter < n; iter++) {
values.clear();
for (int i = 0; i < data.length; i++) {
int x = data[i];
if (values.get(x)) continue;
values.set(x);
}
}
}
}
public static void main(String[] args) {
final MicroBenchmark mbNew = new MBNewHashSetN1();
// Create with same loadFactor as MBNewHashSetN1. So we compare apples with apples (same size of handled table, same collisions).
final MicroBenchmark mbClear = new MBHashSetClear(1.0f);
final MicroBenchmark mbClear075 = new MBHashSetClear(0.75f);
final MicroBenchmark mbBitset = new MBBitSet();
final MicroBenchmark mbList = new MBList(); // Will have a taste of O(N) with a not too bit dataset.
// warmup. trigger the cpu high performance mode? Fill the heap with garbage?
//mbNew.run(dataSetE3xE3, 1000, (int)1e5); // Using new HS might give a bit advantage?
int timePerTest = 10000;
int distributionSize, initialCapacity, datasetLength;
// 1000 long and values 0..999 (1e3 x 1e3). Optimal initial capacity
distributionSize = 1000; datasetLength = 1000; initialCapacity = 1000;
final int[] dataSetE3xE3 = generateRandomSet(1000,1000);
runBenchmark("E3xE3", dataSetE3xE3, distributionSize, timePerTest, initialCapacity,
mbNew, mbClear, mbClear075, mbBitset);
// repeat with underestimated initial size. Will incur in resizing penalty
initialCapacity = 16; // Default initial
runBenchmark("E3xE3+underSize", dataSetE3xE3, distributionSize, timePerTest, initialCapacity,
mbNew, mbClear, mbBitset);
// repeat with overestimated initial size. larger garbage and clearing.
initialCapacity = 100000; // oversized will force to handle large tables filled with 0 / null.
runBenchmark("E3xE3+overSize", dataSetE3xE3, distributionSize, timePerTest, initialCapacity,
mbNew, mbClear, mbBitset);
// Dry run (not rum). what if we focus on the new and clear operations. Just 1 item so clear() is forced to traverse the table.
datasetLength = 1; distributionSize = 1000; initialCapacity = 1000;
runBenchmark("E3xE3-DryRun", generateRandomSet(datasetLength, distributionSize),
distributionSize, timePerTest, initialCapacity,
mbNew, mbClear);
// check for * 100 and / 100 sizes.
distributionSize = datasetLength = initialCapacity = 10;
runBenchmark("E1xE1",
generateRandomSet(datasetLength, distributionSize),
distributionSize, timePerTest, initialCapacity,
mbNew, mbClear, mbList);
distributionSize = datasetLength = initialCapacity = 100000;
runBenchmark("E5xE5", generateRandomSet(datasetLength, distributionSize),
distributionSize, timePerTest, initialCapacity,
mbNew, mbClear);
// Concentrated distributions might behave as with oversized?
datasetLength=10000; distributionSize=10; initialCapacity=Math.min(datasetLength, distributionSize);
runBenchmark("E4xE1",
generateRandomSet(datasetLength, distributionSize),
distributionSize, timePerTest, initialCapacity,
mbNew, mbClear);
// Sparse distributions might allow mild collision. Also adverse for BitSet.
// TODO Generate a greater/known amount of collisions
datasetLength=10000; distributionSize=(int)1e6; initialCapacity=Math.min(datasetLength, distributionSize);
runBenchmark("E4xE6",
generateRandomSet(datasetLength, distributionSize),
distributionSize, timePerTest, initialCapacity,
mbNew, mbClear, mbClear075);
}
private static void runBenchmark(String testName, int[] dataSet, int distributionSize, int timePerTest
, int initialCapacity, MicroBenchmark ... testees /* not testes */) {
// How many iterations? Will use first testee to callibrate.
MicroBenchmark curTest = testees[0];
long t0 = System.nanoTime();
long ellapsed = 0L;
final long minToCallibrate = (long)0.5e9; // half second
int iterations = 1;
while (ellapsed < minToCallibrate) {
curTest.run(dataSet, initialCapacity, iterations);
iterations *= 2; // same as <<= 1
ellapsed = System.nanoTime() - t0;
}
// calculation is not laser-sharp precise (actually executed iterations -1, and some extra initializations).
final int nIterations = (int) ((double)iterations * timePerTest * 1e6 /* nanos/millis */ / ellapsed);
// Do actual benchmark
System.out.printf(Locale.ENGLISH, "dataset:{name=%s,length:%d,distrib:%d,capacity0:%d,iterations:%d}\n",
testName, dataSet.length, distributionSize, initialCapacity, nIterations);
for (MicroBenchmark testee : testees) {
t0 = System.nanoTime();
testee.run(dataSet, initialCapacity, nIterations);
ellapsed = System.nanoTime() - t0;
System.out.printf(Locale.ENGLISH, "%s : %5.3f\n", testee.getName(), ellapsed/1e9 );
}
}
private static int[] generateRandomSet(int lengthOfSet, int distributionSize) {
Random r = new Random();
int[] result = new int[lengthOfSet];
for (int i = 0; i < lengthOfSet; i++) {
result[i] = r.nextInt(distributionSize);
}
return result;
}
}
Here are my results (using JDK 1.8.0_31 - 64 bits - Windows 7 )
dataset:{name=E3xE3,length:1000,distrib:1000,capacity0:1000,iterations:514241}
new HashSet(N,1) : 7.688
HashSet(N,1.000000).clear() : 9.796
HashSet(N,0.750000).clear() : 9.923
BitSet.clear() : 1.990
dataset:{name=E3xE3+underSize,length:1000,distrib:1000,capacity0:16,iterations:420572}
new HashSet(N,1) : 9.735
HashSet(N,1.000000).clear() : 6.637
BitSet.clear() : 1.611
dataset:{name=E3xE3+overSize,length:1000,distrib:1000,capacity0:100000,iterations:143032}
new HashSet(N,1) : 9.948
HashSet(N,1.000000).clear() : 10.377
BitSet.clear() : 0.447
dataset:{name=E3xE3-DryRun,length:1,distrib:1000,capacity0:1000,iterations:18511486}
new HashSet(N,1) : 9.583
HashSet(N,1.000000).clear() : 7.523
dataset:{name=E1xE1,length:10,distrib:10,capacity0:10,iterations:76177852}
new HashSet(N,1) : 9.988
HashSet(N,1.000000).clear() : 10.521
ArrayList.clear() : 7.915
dataset:{name=E5xE5,length:100000,distrib:100000,capacity0:100000,iterations:2892}
new HashSet(N,1) : 9.764
HashSet(N,1.000000).clear() : 9.615
dataset:{name=E4xE1,length:10000,distrib:10,capacity0:10,iterations:170386}
new HashSet(N,1) : 9.843
HashSet(N,1.000000).clear() : 9.708
dataset:{name=E4xE6,length:10000,distrib:1000000,capacity0:10000,iterations:36497}
new HashSet(N,1) : 9.686
HashSet(N,1.000000).clear() : 10.079
HashSet(N,0.750000).clear() : 10.008
What would be much more efficient would be to use a Set instead of a list, for example the HashSet implementation. The contains method will run in O(1) instead of O(n) with a list. And you could save one call by only calling the add method.
As for your specific question, I would just create a new Set at each loop - object creation is not that expensive, probably less than clearing the set (as confirmed by the benchmark at the bottom - see the most efficient version in EDIT 2):
for (int j = 0, x; j < m; j++) {
Set<Integer> values = new HashSet<Integer>();
for (int i = 0; i < n; i++) {
x = X[i][j];
if (!values.add(x)) continue; //value.add returns true if the element was NOT in the set before
System.out.println(x);
}
}
However, the only way to know which is quicker (new object vs. clear) is to profile that portion of your code and check the performance of both versions.
EDIT
I ran a quick benchmark and the clear version seems a little faster than creating a set at each loop (by about 20%). You should still check on your dataset / use case which one is better. Faster code with my dataset:
Set<Integer> values = new HashSet<Integer>();
for (int j = 0, x; j < m; j++) {
for (int i = 0; i < n; i++) {
x = X[i][j];
if (!values.add(x)) continue; //value.add returns true if the element was NOT in the set before
System.out.println(x);
}
values.clear();
}
EDIT 2
An actually even faster version of the code is obtained by creating a new set of the right size at each loop:
for (int j = 0, x; j < m; j++) {
Set<Integer> values = new HashSet<Integer>(n, 1); //right size from the beginning
for (int i = 0; i < n; i++) {
x = X[i][j];
if (!values.add(x)) continue; //value.add returns true if the element was NOT in the set before
System.out.println(x);
}
}
Summary of result
After JVM warm up + JIT:
Set<Integer> values = new HashSet<Integer>(n, 1); =====> 280 ms
values.clear(); =====> 380 ms
Set<Integer> values = new HashSet<Integer>(); =====> 450 ms
You should use .clear() methods , using this you haven't need to assign memory your variable again and again.
You can use ArrayList.clear();
this keep address ArrayList in memory, without garbage collector effect on this address.