Finding duplicates in a List ignoring a field

我寻月下人不归 2021-01-22 12:38

I've got a List of Persons and I want to find duplicate entries, considering all fields except id. So using the equals()-method (and in

4 Answers
  •  有刺的猬
    2021-01-22 13:20

    As @LuiggiMendoza suggested in the comments:

    You could create a custom Comparator class that compares two Person objects for equality, ignoring their ids.

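    import java.util.Comparator;

    // typed as Comparator<Person> so that compare(Person, Person) really overrides the interface method
    class PersonComparator implements Comparator<Person> {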
    
        // wraps the compareTo method to compare two Strings but also accounts for NPE
        int compareStrings(String a, String b) {
            if(a == b) {           // both strings are the same string or are null
                return 0;
            } else if(a == null) { // first string is null, result is negative
                return -1;
            } else if(b == null){  // second string is null, result is positive
                return 1;
            } else {               // no strings are null, return the result of compareTo
                return a.compareTo(b);
            }
        }
    
        @Override
        public int compare(Person p1, Person p2) {
    
            // comparisons on Person objects themselves
            if(p1 == p2) {                 // Person 1 and Person 2 are the same Person object
                return 0;
            }
            if(p1 == null && p2 != null) { // Person 1 is null and Person 2 is not, result is negative
                return -1;
            }
            if(p1 != null && p2 == null) { // Person 1 is not null and Person 2 is, result is positive
                return 1;
            }
    
            int result = 0;
    
            // comparisons on the attributes of the Persons objects
            result = compareStrings(p1.firstname, p2.firstname);
            if(result != 0) {   // Persons differ in first names, we can return the result
                return result;
            }
            result = compareStrings(p1.lastname, p2.lastname);
            if(result != 0) {  // Persons differ in last names, we can return the result
                return result;
            }
    
            // first and last names are equal, so the result is decided by age
            return Integer.compare(p1.age, p2.age);
        }
    }
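
On Java 8 or later, the same ordering can probably be written more compactly with the `Comparator` factory methods. The following is only a sketch: the nested `Person` class and the names `PersonOrdering`, `NULL_SAFE` and `IGNORING_ID` are assumptions made up for illustration, with fields chosen to match what the comparator above reads.

```java
import java.util.Comparator;

class PersonOrdering {
    // Hypothetical stand-in for the question's Person class.
    static class Person {
        final long id;
        final String firstname;
        final String lastname;
        final int age;

        Person(long id, String firstname, String lastname, int age) {
            this.id = id;
            this.firstname = firstname;
            this.lastname = lastname;
            this.age = age;
        }
    }

    // Null-safe String ordering: null sorts before any non-null value.
    static final Comparator<String> NULL_SAFE =
            Comparator.nullsFirst(Comparator.<String>naturalOrder());

    // Compares firstname, then lastname, then age; id is never consulted.
    // The outer nullsFirst also places a null Person before a non-null one.
    static final Comparator<Person> IGNORING_ID = Comparator.nullsFirst(
            Comparator.comparing((Person p) -> p.firstname, NULL_SAFE)
                    .thenComparing(p -> p.lastname, NULL_SAFE)
                    .thenComparingInt(p -> p.age));
}
```

Two persons that differ only in id compare as 0 under `IGNORING_ID`, which is the property a `TreeSet` relies on to treat them as the same element.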
    

    Now you can use the TreeSet structure with this custom Comparator and, for example, make a simple method that eliminates the duplicate values.

    List<Person> getListWithoutDups(List<Person> list) {
        List<Person> newList = new ArrayList<>();
        TreeSet<Person> set = new TreeSet<>(new PersonComparator()); // use custom Comparator here
    
        // foreach Person in the list
        for(Person person : list) {
            // if the person isn't already in the set (meaning it's not a duplicate)
            // add it to the set and the new list
            if(!set.contains(person)) {
                set.add(person);
                newList.add(person);
            }
            // otherwise it's a duplicate so we don't do anything
        }
    
        return newList;
    }
    

    The contains operation in the TreeSet, as the documentation says, "provides guaranteed log(n) time cost".

    The method I suggested above takes O(n*log(n)) time, since we perform the contains operation on each list element, but it also uses O(n) extra space for the new list and the TreeSet.
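
Since the question asks to find the duplicates rather than drop them, the same idea can be turned around: `TreeSet.add` returns `false` when an equal element (according to the comparator) is already present, so those elements can be collected instead of skipped. This is a minimal sketch; `DuplicateFinder`, `findDuplicates`, and the nested `Person` class are hypothetical names, and a comparator equivalent to the one above is rebuilt inline so the snippet compiles alone.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class DuplicateFinder {
    // Hypothetical stand-in for the question's Person class.
    static class Person {
        final long id;
        final String firstname;
        final String lastname;
        final int age;

        Person(long id, String firstname, String lastname, int age) {
            this.id = id;
            this.firstname = firstname;
            this.lastname = lastname;
            this.age = age;
        }
    }

    static final Comparator<String> NULL_SAFE =
            Comparator.nullsFirst(Comparator.<String>naturalOrder());

    // Same ordering as the answer's comparator: every field except id.
    static final Comparator<Person> IGNORING_ID =
            Comparator.comparing((Person p) -> p.firstname, NULL_SAFE)
                    .thenComparing(p -> p.lastname, NULL_SAFE)
                    .thenComparingInt(p -> p.age);

    // Returns every element that repeats an earlier one, in list order.
    // Assumes the list itself contains no null elements.
    static List<Person> findDuplicates(List<Person> people) {
        Set<Person> seen = new TreeSet<>(IGNORING_ID);
        List<Person> duplicates = new ArrayList<>();
        for (Person p : people) {
            if (!seen.add(p)) {   // add() is false when an equal Person was seen before
                duplicates.add(p);
            }
        }
        return duplicates;
    }
}
```

Using the return value of `add` also avoids the separate `contains` call, so each element costs one tree lookup instead of two.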

    If your list is quite large (so space is important) but processing speed isn't an issue, then instead of adding each non-duplicate to a new list, you could remove each duplicate from the original list as it is found:

    List<Person> getListWithoutDups(List<Person> list) {
        TreeSet<Person> set = new TreeSet<>(new PersonComparator()); // use custom Comparator here
        Person person;
        // for every Person in the list
        for(int i = 0; i < list.size(); i++) {
            person = list.get(i);
            // if the person is already in the set (meaning it is a duplicate)
            // remove it from the list
            if(set.contains(person)) {
                list.remove(i);
                i--; // make sure to accommodate for the list shifting after removal
            } 
            // otherwise add it to the set of non-duplicates
            else {
                set.add(person);
            }
        }
        return list;
    }
    

    Each remove operation on a list takes O(n) time (because the list gets shifted each time an element is deleted), and each contains operation takes log(n) time. The two costs add up per element rather than multiply, so this approach is O(n^2) in time in the worst case.

    However, the extra memory would be roughly halved (it is still O(n)), since we only create the TreeSet and not the second list.
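
The repeated shifting done by `list.remove(i)` can likely be avoided while still not allocating a second list. The sketch below uses `List.removeIf` with a predicate that records each element in the `TreeSet`. It assumes the list is an `ArrayList`, which overrides `removeIf` to compact the array in a single pass, and that the predicate is evaluated in list order, which is how `ArrayList` behaves in practice but is not spelled out in the `Collection.removeIf` contract. `InPlaceDedup`, `removeDuplicates`, and the nested `Person` class are hypothetical names.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class InPlaceDedup {
    // Hypothetical stand-in for the question's Person class.
    static class Person {
        final long id;
        final String firstname;
        final String lastname;
        final int age;

        Person(long id, String firstname, String lastname, int age) {
            this.id = id;
            this.firstname = firstname;
            this.lastname = lastname;
            this.age = age;
        }
    }

    static final Comparator<String> NULL_SAFE =
            Comparator.nullsFirst(Comparator.<String>naturalOrder());

    // Same ordering as the answer's comparator: every field except id.
    static final Comparator<Person> IGNORING_ID =
            Comparator.comparing((Person p) -> p.firstname, NULL_SAFE)
                    .thenComparing(p -> p.lastname, NULL_SAFE)
                    .thenComparingInt(p -> p.age);

    // Removes later copies from the given (mutable) list, keeping first occurrences.
    // Assumes the list contains no null elements.
    static void removeDuplicates(List<Person> list) {
        Set<Person> seen = new TreeSet<>(IGNORING_ID);
        // seen.add(p) is false for a Person already recorded, so that element is removed
        list.removeIf(p -> !seen.add(p));
    }
}
```

Under those assumptions the work is one tree insertion per element plus a single compaction pass, so the time stays at O(n*log(n)) while the only extra structure is the TreeSet.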
