I've got a List of Persons and I want to find duplicate entries, considering all fields except id. So using the equals()-method (and in
As @LuiggiMendoza suggested in the comments:
You could create a custom Comparator class that compares two Person objects for equality, ignoring their ids.
import java.util.Comparator;

class PersonComparator implements Comparator<Person> {

    // null-safe wrapper around String.compareTo: null sorts before any non-null string
    int compareStrings(String a, String b) {
        if (a == b) { // both strings are the same String object, or both are null
            return 0;
        } else if (a == null) { // only the first string is null, result is negative
            return -1;
        } else if (b == null) { // only the second string is null, result is positive
            return 1;
        } else { // neither string is null, return the result of compareTo
            return a.compareTo(b);
        }
    }

    @Override
    public int compare(Person p1, Person p2) {
        // comparisons on the Person references themselves
        if (p1 == p2) { // same Person object, or both are null
            return 0;
        }
        if (p1 == null) { // only Person 1 is null, result is negative
            return -1;
        }
        if (p2 == null) { // only Person 2 is null, result is positive
            return 1;
        }
        // comparisons on the attributes of the Person objects; id is deliberately ignored
        int result = compareStrings(p1.firstname, p2.firstname);
        if (result != 0) { // Persons differ in first names, we can return the result
            return result;
        }
        result = compareStrings(p1.lastname, p2.lastname);
        if (result != 0) { // Persons differ in last names, we can return the result
            return result;
        }
        // first and last names are equal, so the age decides
        return Integer.compare(p1.age, p2.age);
    }
}
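If you are on Java 8 or later, the same ordering can probably be written more compactly with the Comparator factory methods. This is only a sketch: the Person class below is my assumption about what yours looks like (fields firstname, lastname, age and a long id), and the names PersonComparators, NULL_SAFE_STRING and IGNORING_ID are made up for the example.

```java
import java.util.Comparator;

// Assumed shape of Person, inferred from the fields compared above; adjust to your class.
class Person {
    final long id;
    final String firstname;
    final String lastname;
    final int age;

    Person(long id, String firstname, String lastname, int age) {
        this.id = id;
        this.firstname = firstname;
        this.lastname = lastname;
        this.age = age;
    }
}

// Hypothetical holder class for the comparators.
class PersonComparators {
    // Null-safe String ordering: null sorts before any non-null string.
    static final Comparator<String> NULL_SAFE_STRING =
            Comparator.nullsFirst(Comparator.<String>naturalOrder());

    // First name, then last name, then age; id is never looked at.
    // The outer nullsFirst makes a null Person sort before any non-null Person.
    static final Comparator<Person> IGNORING_ID = Comparator.nullsFirst(
            Comparator.comparing((Person p) -> p.firstname, NULL_SAFE_STRING)
                    .thenComparing((Person p) -> p.lastname, NULL_SAFE_STRING)
                    .thenComparingInt((Person p) -> p.age));
}
```

PersonComparators.IGNORING_ID should be usable anywhere a PersonComparator instance is expected, for example as the constructor argument of a TreeSet.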
Now you can use the TreeSet structure with this custom Comparator and, for example, make a simple method that eliminates the duplicate values.
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

List<Person> getListWithoutDups(List<Person> list) {
    List<Person> newList = new ArrayList<>();
    TreeSet<Person> set = new TreeSet<>(new PersonComparator()); // use custom Comparator here
    // for each Person in the list
    for (Person person : list) {
        // if the person isn't already in the set (meaning it's not a duplicate)
        // add it to the set and the new list
        if (!set.contains(person)) {
            set.add(person);
            newList.add(person);
        }
        // otherwise it's a duplicate so we don't do anything
    }
    return newList;
}
The contains operation in the TreeSet, as the documentation says, "provides guaranteed log(n) time cost". The method I suggested above takes O(n*log(n)) time since we are performing the contains operation on each list element, but it also uses O(n) space for creating a new list and the TreeSet.
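Since you asked about finding the duplicates rather than dropping them, here is a small variation as a sketch. It relies on TreeSet.add returning false when an equal element (according to the comparator) is already in the set, so one call replaces the contains/add pair. The class name DuplicateFinder and the method findDuplicates are made up for the example, and I wrote it generically so it compiles without the Person class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of any library.
class DuplicateFinder {

    // Returns every element that repeats an earlier element of the list,
    // where "repeats" means the comparator returns 0. Runs in O(n*log(n)).
    static <T> List<T> findDuplicates(List<T> list, Comparator<? super T> comparator) {
        List<T> duplicates = new ArrayList<>();
        TreeSet<T> seen = new TreeSet<>(comparator);
        for (T item : list) {
            // add returns false if the set already holds an equal element
            if (!seen.add(item)) {
                duplicates.add(item);
            }
        }
        return duplicates;
    }
}
```

With the comparator from this answer, the call would look like findDuplicates(persons, new PersonComparator()). The first occurrence of each Person stays in the set, and only the later copies end up in the returned list.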
If your list is quite large (so space is quite important) but processing speed isn't an issue, then instead of adding each non-duplicate to a new list, you could remove each duplicate that is found:
import java.util.List;
import java.util.TreeSet;

List<Person> getListWithoutDups(List<Person> list) {
    TreeSet<Person> set = new TreeSet<>(new PersonComparator()); // use custom Comparator here
    Person person;
    // for every Person in the list
    for (int i = 0; i < list.size(); i++) {
        person = list.get(i);
        // if the person is already in the set (meaning it is a duplicate)
        // remove it from the list
        if (set.contains(person)) {
            list.remove(i);
            i--; // make sure to accommodate for the list shifting after removal
        }
        // otherwise add it to the set of non-duplicates
        else {
            set.add(person);
        }
    }
    return list;
}
Since each remove operation on an array-backed list takes O(n) time (because the list gets shifted each time an element is deleted), and each contains operation takes log(n) time, a single iteration costs at most O(n + log(n)) = O(n), so this approach would be O(n^2) in time in the worst case. However, the extra space would be roughly halved (it is still O(n)) since we only create the TreeSet and not the second list.
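On Java 8 or later you could also try removeIf for the in-place variant. This is a sketch under two assumptions: that the list is mutable, and that removeIf tests the elements in list order exactly once, which is how ArrayList behaves in the OpenJDK versions I know of but which the stateful predicate below depends on. There, ArrayList compacts the list in one pass instead of shifting on every removal, so the whole thing should stay around O(n*log(n)). The names InPlaceDedup and removeDuplicates are made up for the example.

```java
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of any library.
class InPlaceDedup {

    // Removes from the list every element that repeats an earlier one
    // (comparator returns 0), keeping the first occurrence of each.
    static <T> void removeDuplicates(List<T> list, Comparator<? super T> comparator) {
        TreeSet<T> seen = new TreeSet<>(comparator);
        // seen.add returns false for an element already in the set,
        // so the predicate is true exactly for the duplicates
        list.removeIf(item -> !seen.add(item));
    }
}
```

With the comparator from this answer the call would be removeDuplicates(persons, new PersonComparator()). It still needs the TreeSet, so the extra space is the same as in the loop above.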