When processing large amounts of data, I often find myself doing the following:
```java
HashSet set = new HashSet();
//Adding elements to
```
I ran a test, so please check the result below.
For the same String items stored in a HashSet, TreeSet, ArrayList and LinkedList, here are the results for
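Based on the above results, there is not a big difference between using an ArrayList and a Set here. Keep in mind that the searched item is taken from index 25000, near the front of the 5,000,000-element list, so the linear scan in ArrayList.contains and LinkedList.contains stops early; an item near the end, or one that is not in the list at all, forces a scan of the whole list. Perhaps you can modify this code, replace the String with your own object, and see the differences then.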
```java
import java.util.*;

public class ContainsTest { // class name chosen for this example
    public static void main(String[] args) {
        Set<String> hashSet = new HashSet<>();
        Set<String> treeSet = new TreeSet<>();
        List<String> arrayList = new ArrayList<>();
        List<String> linkedList = new LinkedList<>();
        List<String> base = new ArrayList<>();

        // Generate 5,000,000 random UUID strings, printing a dot every 100,000
        for (int i = 0; i < 5_000_000; i++) {
            if (i % 100_000 == 0) System.out.print(".");
            base.add(UUID.randomUUID().toString());
        }
        System.out.println("\nBase size : " + base.size());

        // The searched item sits at index 25000, near the front of the lists
        String item = base.get(25000);
        System.out.println("SEARCHED ITEM : " + item);

        hashSet.addAll(base);
        treeSet.addAll(base);
        arrayList.addAll(base);
        linkedList.addAll(base);

        // Time each lookup separately, restarting the clock before every call
        time("hashSet", hashSet, item);
        time("treeSet", treeSet, item);
        time("arrayList", arrayList, item);
        time("linkedList", linkedList, item);
    }

    private static void time(String name, Collection<String> c, String item) {
        long start = System.nanoTime();
        boolean found = c.contains(item);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println(name + ".contains(item) ? " + (found ? "TRUE " : "FALSE ") + micros + " µs");
    }
}
```
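Following the suggestion above to modify the test, here is a minimal sketch that looks up a string that is not in the collections at all, which is the worst case for a list because contains has to compare against every element. The class name WorstCaseLookup, the collection size of 1,000,000 and the 20 repetitions are arbitrary assumptions, and a simple loop like this only gives a rough indication; a dedicated benchmark harness would be more reliable.

```java
import java.util.*;

// Hypothetical class name; a rough timing sketch, not a rigorous benchmark
class WorstCaseLookup {
    // Runs contains() `repeats` times and prints the average time per call
    static boolean timedContains(String name, Collection<String> c, String item, int repeats) {
        boolean found = false;
        long start = System.nanoTime();
        for (int r = 0; r < repeats; r++) {
            found = c.contains(item);
        }
        long avgNanos = (System.nanoTime() - start) / repeats;
        System.out.println(name + ".contains(missing) = " + found + ", ~" + avgNanos + " ns per call");
        return found;
    }

    public static void main(String[] args) {
        List<String> base = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            base.add(UUID.randomUUID().toString());
        }
        Set<String> hashSet = new HashSet<>(base);
        List<String> arrayList = new ArrayList<>(base);

        // A fresh random UUID is practically never already present,
        // so the list has to compare it against every element
        String missing = UUID.randomUUID().toString();

        timedContains("hashSet", hashSet, missing, 20);
        timedContains("arrayList", arrayList, missing, 20);
    }
}
```

Printing the returned flag inside the loop's caller also keeps the JIT from treating the lookups as dead code.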
The set will give much better performance (O(n) overall versus O(n^2) for the list when you collect n elements and check each one for duplicates), and that is expected, because membership testing (the contains operation) is the very purpose of a set.
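To make the O(n) versus O(n^2) comparison concrete, here is a minimal sketch of the two ways of collecting unique values, assuming the input is a List<String>; the class and method names are hypothetical. The list version calls contains, a linear scan, once per element, while the set version relies on Set.add, which is O(1) on average.

```java
import java.util.*;

// Hypothetical helper class contrasting the two deduplication strategies
class DedupComparison {
    // O(n^2) overall: result.contains scans the result list for every input element
    static List<String> dedupWithList(List<String> input) {
        List<String> result = new ArrayList<>();
        for (String s : input) {
            if (!result.contains(s)) {
                result.add(s);
            }
        }
        return result;
    }

    // O(n) overall on average: each HashSet.add is an average O(1) hash lookup
    static Set<String> dedupWithSet(List<String> input) {
        Set<String> result = new HashSet<>();
        for (String s : input) {
            result.add(s); // duplicates are silently ignored
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("b", "a", "b", "c", "a");
        System.out.println(dedupWithList(input));       // [b, a, c]
        System.out.println(dedupWithSet(input).size()); // 3
    }
}
```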
Contains for a HashSet is O(1) on average, compared to O(n) for a list, so you should never use a list if you often need to call contains.
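One way to see the O(1) versus O(n) difference without timing anything is to count how many times equals() is called during a single lookup. The sketch below uses a hypothetical Key class whose equals() increments a counter; the exact counts depend on the JDK's internal implementation, so treat them as illustrative. For a list, the count grows with the position of the searched element; for a HashSet, only the entries in the matching bucket are compared.

```java
import java.util.*;

// Hypothetical class and key type, used only to count comparisons
class ContainsCost {
    static int equalsCalls = 0; // incremented on every Key.equals call

    static final class Key {
        final int id;
        Key(int id) { this.id = id; }

        @Override public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Key && ((Key) o).id == id;
        }

        @Override public int hashCode() { return Integer.hashCode(id); }
    }

    // Returns how many equals() calls one contains() lookup needed
    static int countEquals(Collection<Key> c, int id) {
        equalsCalls = 0;
        c.contains(new Key(id)); // an equal, but not identical, object
        return equalsCalls;
    }

    public static void main(String[] args) {
        List<Key> list = new ArrayList<>();
        Set<Key> set = new HashSet<>();
        for (int i = 0; i < 100_000; i++) {
            list.add(new Key(i));
            set.add(new Key(i));
        }
        System.out.println("list, element 10:     " + countEquals(list, 10));
        System.out.println("list, element 99_999: " + countEquals(list, 99_999));
        System.out.println("set,  element 99_999: " + countEquals(set, 99_999));
    }
}
```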
If you don't need a list, I would just use a Set; it is the natural collection when order doesn't matter and you want to ignore duplicates.
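As a small illustration of a Set ignoring duplicates: Set.add returns false when the element is already present, and the set is left unchanged. The class and method names here are hypothetical placeholders.

```java
import java.util.*;

// Hypothetical class name, for illustration only
class SetIgnoresDuplicates {
    // Adds every value to a HashSet; duplicates are ignored
    static Set<String> collect(String... values) {
        Set<String> set = new HashSet<>();
        for (String v : values) {
            set.add(v); // returns false and changes nothing if v is already present
        }
        return set;
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>();
        System.out.println(set.add("apple")); // true: newly added
        System.out.println(set.add("apple")); // false: already present
        System.out.println(collect("apple", "banana", "apple").size()); // 2
    }
}
```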
You can do both if you need a List without duplicates:
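```java
private final Set<String> set = new HashSet<>();     // O(1) average duplicate checks
private final List<String> list = new ArrayList<>(); // keeps insertion order

public void add(String str) {
    // Set.add returns false if str is already present, so duplicates never reach the list
    if (set.add(str))
        list.add(str);
}
```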
This way the list contains only unique values, the original insertion order is preserved, and each add is O(1) on average.
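A usage sketch of that idea, written as a small self-contained generic class; the class name UniqueList and the asList() accessor are hypothetical. Returning an unmodifiable view keeps callers from adding to the list behind the set's back, which would let duplicates slip in.

```java
import java.util.*;

// Hypothetical wrapper: insertion order from the list, fast duplicate checks from the set
class UniqueList<T> {
    private final Set<T> seen = new HashSet<>();
    private final List<T> items = new ArrayList<>();

    // Adds value only if it has not been seen before; O(1) on average
    boolean add(T value) {
        if (seen.add(value)) {
            items.add(value);
            return true;
        }
        return false;
    }

    // Read-only view so callers cannot bypass the duplicate check
    List<T> asList() {
        return Collections.unmodifiableList(items);
    }

    public static void main(String[] args) {
        UniqueList<String> names = new UniqueList<>();
        for (String s : new String[] {"bob", "alice", "bob", "carol", "alice"}) {
            names.add(s);
        }
        System.out.println(names.asList()); // [bob, alice, carol]
    }
}
```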
ArrayList uses an array to store its data, so ArrayList.contains is O(n). Searching the array again and again, once for each of n elements, is therefore O(n^2) overall.
HashSet, by contrast, uses hashing to place each element in its bucket, so it reaches the element in O(1) on average. For long lists of values, HashSet will be much faster.
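To illustrate the bucket idea, here is a deliberately simplified toy hash set using separate chaining: hashCode() picks a bucket, and contains compares only against the elements in that bucket instead of scanning everything, which stays cheap as long as the buckets are short. This is a teaching sketch with a fixed number of buckets and no resizing; it is not how java.util.HashSet is actually implemented (it is backed by a HashMap that grows its table), and the class name ToyHashSet is hypothetical.

```java
import java.util.*;

// Hypothetical toy hash set: fixed bucket count, no resizing, for illustration only
class ToyHashSet<T> {
    private final List<List<T>> buckets = new ArrayList<>();

    ToyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // hashCode() decides which bucket an element lives in (null maps to bucket 0)
    private List<T> bucketFor(Object o) {
        int index = Math.floorMod(Objects.hashCode(o), buckets.size());
        return buckets.get(index);
    }

    // Only the matching bucket is scanned, not the whole collection
    boolean contains(Object o) {
        return bucketFor(o).contains(o);
    }

    boolean add(T value) {
        List<T> bucket = bucketFor(value);
        if (bucket.contains(value)) {
            return false; // already present
        }
        bucket.add(value);
        return true;
    }

    public static void main(String[] args) {
        ToyHashSet<String> set = new ToyHashSet<>(16);
        set.add("apple");
        set.add("banana");
        System.out.println(set.add("apple"));       // false
        System.out.println(set.contains("banana")); // true
        System.out.println(set.contains("cherry")); // false
    }
}
```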
Alternatively, you could add the elements to the list itself and then deduplicate it:
```java
HashSet<String> hs = new HashSet<>(); // new HashSet
hs.addAll(list); // add all list elements; the set keeps only one copy of each value
list.clear();    // clear the list
list.addAll(hs); // copy the unique values back (in HashSet iteration order, not the original order)
```
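The same four steps wrapped in a runnable method, assuming the list holds Strings and is modifiable; the class and method names are hypothetical. Because the values come back out of a HashSet, the list's original order is not guaranteed to survive.

```java
import java.util.*;

// Hypothetical class name, for illustration only
class DedupInPlace {
    // Removes duplicates from the given (modifiable) list by round-tripping it through a HashSet
    static void dedupInPlace(List<String> list) {
        HashSet<String> hs = new HashSet<>();
        hs.addAll(list);  // duplicates collapse into one entry each
        list.clear();     // empty the original list
        list.addAll(hs);  // refill it with the unique values
    }

    public static void main(String[] args) {
        // Wrapped in ArrayList because Arrays.asList is fixed-size and clear() would throw
        List<String> list = new ArrayList<>(Arrays.asList("x", "y", "x", "z", "y"));
        dedupInPlace(list);
        System.out.println(list.size()); // 3 (order may differ from the input)
    }
}
```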
If you only need a deduplicated set, you can simply call addAll() on a separate Set; it will contain only unique values.
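For instance, calling addAll() on a fresh set with several source lists yields their union with duplicates removed; the sample data and the class name UnionIntoSet are made up for illustration.

```java
import java.util.*;

// Hypothetical class name, for illustration only
class UnionIntoSet {
    // Collects every element from all given lists into one set, dropping duplicates
    @SafeVarargs
    static Set<String> union(List<String>... sources) {
        Set<String> result = new HashSet<>();
        for (List<String> source : sources) {
            result.addAll(source);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("red", "green", "red");
        List<String> b = Arrays.asList("green", "blue");
        System.out.println(union(a, b).size()); // 3
    }
}
```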