Finding duplicates in a List ignoring a field

前端未结

关注

 4  683

我寻月下人不归 2021-01-22 12:38

I\'ve got a List of Persons and I want to find duplicate entries, consindering all fields except id. So using the equals()-method (and in

4条回答

有刺的猬 (楼主)

2021-01-22 13:20

As @LuiggiMendoza suggested in the comments:

You could create a custom Comparator class that compares two Person objects for equality, ignoring their ids.

class PersonComparator implements Comparator {

    // wraps the compareTo method to compare two Strings but also accounts for NPE
    int compareStrings(String a, String b) {
        if(a == b) {           // both strings are the same string or are null
          return 0;
        } else if(a == null) { // first string is null, result is negative
            return -1;
        } else if(b == null){  // second string is null, result is positive
            return 1;
        } else {               // no strings are null, return the result of compareTo
            return a.compareTo(b);
        }
    }

    @Override
    public int compare(Person p1, Person p2) {

        // comparisons on Person objects themselves
        if(p1 == p2) {                 // Person 1 and Person 2 are the same Person object
            return 0;
        }
        if(p1 == null && p2 != null) { // Person 1 is null and Person 2 is not, result is negative
            return -1;
        }
        if(p1 != null && p2 == null) { // Person 1 is not null and Person 2 is, result is positive
            return 1;
        }

        int result = 0;

        // comparisons on the attributes of the Persons objects
        result = compareStrings(p1.firstname, p2.firstname);
        if(result != 0) {   // Persons differ in first names, we can return the result
            return result;
        }
        result = compareStrings(p1.lastname, p2.lastname);
        if(result != 0) {  // Persons differ in last names, we can return the result
            return result;
        }

        return Integer.compare(p1.age, p2.age); // if both first name and last names are equal, the comparison difference is in their age
    }
}

Now you can use the TreeSet structure with this custom Comparator and, for example, make a simple method that eliminates the duplicate values.

List getListWithoutDups(List list) {
    List newList = new ArrayList();
    TreeSet set = new TreeSet(new PersonComparator()); // use custom Comparator here

    // foreach Person in the list
    for(Person person : list) {
        // if the person isn't already in the set (meaning it's not a duplicate)
        // add it to the set and the new list
        if(!set.contains(person)) {
            set.add(person);
            newList.add(person);
        }
        // otherwise it's a duplicate so we don't do anything
    }

    return newList;
}

The contains operation in the TreeSet, as the documentation says, "provides guaranteed log(n) time cost".

The method I suggested above take O(n*log(n)) time since we are performing the contains operation on each list element but it also uses O(n) space for creating a new list and the TreeSet.

If your list is quite large (space is quite important) but you processing speed isn't an issue, then instead of adding each non-duplicate to the list, you could remove each duplicate that is found:

 List getListWithoutDups(List list) {
    TreeSet set = new TreeSet(new PersonComparator()); // use custom Comparator here
    Person person;
    // for every Person in the list
    for(int i = 0; i < list.size(); i++) {
        person = list.get(i);
        // if the person is already in the set (meaning it is a duplicate)
        // remove it from the list
        if(set.contains(person) { 
            list.remove(i);
            i--; // make sure to accommodate for the list shifting after removal
        } 
        // otherwise add it to the set of non-duplicates
        else {
            set.add(person);
        }
    }
    return list;
}

Since each remove operation on a list takes O(n) time (because the list gets shifted each time an element is deleted), and each contains operation takes log(n) time, this approach would be O(n^2 log(n)) in time.

However, the space complexity would be halved since we only create the TreeSet and not the second list.

0 讨论(0)

查看其它4个回答