TreeSet Comparator failed to remove duplicates in some cases?

余生长醉 submitted on 2020-04-11 04:44:47

Question


I have the following comparator for my TreeSet:

public class Obj {
    public int id;
    public String value;
    public Obj(int id, String value) {
        this.id = id;
        this.value = value;
    }
    public String toString() {
        return "(" + id + value + ")";
    }
}

Obj obja = new Obj(1, "a");
Obj objb = new Obj(1, "b");
Obj objc = new Obj(2, "c");
Obj objd = new Obj(2, "a");
Set<Obj> set = new TreeSet<>((a, b) -> {
    System.out.println("Comparing " + a + " and " + b);
    int result = a.value.compareTo(b.value);
    if (a.id == b.id) {
        return 0;
    }
    return result == 0 ? Integer.compare(a.id, b.id) : result;
});
set.addAll(Arrays.asList(obja, objb, objc, objd));
System.out.println(set);

It prints [(1a), (2c)], so the duplicates were removed.

But when I changed the last Integer.compare to Integer.compare(b.id, a.id) (i.e. switched the positions of a and b), it prints out [(2a), (1a), (2c)]. Clearly the same id 2 appeared twice.

How do you fix the comparator to always remove the duplicates based on ids and sort the ordered set based on value (ascending) then id (descending)?


Answer 1:


You're asking:
How do you fix the comparator to always remove the duplicates based on ids and sort the ordered set based on value (ascending) then id (descending)?

You want the comparator to

  1. remove duplicates based on Obj.id
  2. sort the set by Obj.value and Obj.id

Requirement 1) results in

Function<Obj, Integer> byId = o -> o.id;
Set<Obj> setById = new TreeSet<>(Comparator.comparing(byId));

Requirement 2) results in

Function<Obj, String> byValue = o -> o.value;
Comparator<Obj> sortingComparator =  Comparator.comparing(byValue).thenComparing(Comparator.comparing(byId).reversed());
Set<Obj> setByValueAndId = new TreeSet<>(sortingComparator);

Let's have a look at the Javadoc of TreeSet. It says:

Note that the ordering maintained by a set [...] must be consistent with equals if it is to correctly implement the Set interface. This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal.

The set will be ordered according to the comparator but its elements are also compared for equality using the comparator.
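
A minimal sketch of that behaviour, using the JDK's String.CASE_INSENSITIVE_ORDER rather than the comparator from the question (the class and method names are made up for illustration):

```java
import java.util.Set;
import java.util.TreeSet;

public class ComparatorEqualityDemo {

    // Builds a TreeSet whose comparator ignores case.
    static Set<String> caseInsensitiveSet() {
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.add("a");
        set.add("A"); // compare("A", "a") == 0, so the set treats it as a duplicate
        set.add("b");
        return set;
    }

    public static void main(String[] args) {
        Set<String> set = caseInsensitiveSet();
        System.out.println(set);               // [a, b]
        System.out.println(set.contains("B")); // true, although "B".equals("b") is false
    }
}
```

The set never calls equals: whatever the comparator reports as 0 counts as "already present".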

As far as I can see, there is no way to define a single Comparator that satisfies both requirements. Since a TreeSet is first and foremost a Set, requirement 1) has to be met first. To achieve requirement 2), you can create a second TreeSet:

Set<Obj> setByValueAndId = new TreeSet<>(sortingComparator);
setByValueAndId.addAll(setById);
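
Put together, the two-set approach could look like the following sketch. The wrapper class TwoTreeSetsDemo and the helper dedupeThenSort are hypothetical names; Obj is re-declared as a nested class only to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

public class TwoTreeSetsDemo {

    // Same shape as the Obj class from the question.
    static class Obj {
        final int id;
        final String value;
        Obj(int id, String value) { this.id = id; this.value = value; }
        @Override public String toString() { return "(" + id + value + ")"; }
    }

    static Set<Obj> dedupeThenSort(List<Obj> input) {
        Function<Obj, Integer> byId = o -> o.id;
        Function<Obj, String> byValue = o -> o.value;

        // Step 1: the comparator looks at id only, so objects with equal ids collapse.
        Set<Obj> setById = new TreeSet<>(Comparator.comparing(byId));
        setById.addAll(input);

        // Step 2: value ascending, then id descending.
        Comparator<Obj> sortingComparator =
                Comparator.comparing(byValue).thenComparing(Comparator.comparing(byId).reversed());
        Set<Obj> setByValueAndId = new TreeSet<>(sortingComparator);
        setByValueAndId.addAll(setById);
        return setByValueAndId;
    }

    public static void main(String[] args) {
        List<Obj> input = Arrays.asList(
                new Obj(1, "a"), new Obj(1, "b"), new Obj(2, "c"), new Obj(2, "a"));
        System.out.println(dedupeThenSort(input)); // [(1a), (2c)]
    }
}
```

Note that for each id the first object added wins, because adding an element the set already considers present leaves the set unchanged.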

Or, if you don't need the set itself but only want to process the elements in the desired order, you can use a Stream:

Consumer<Obj> consumer = System.out::println; // placeholder, substitute your own consumer
setById.stream().sorted(sortingComparator).forEach(consumer);

BTW:
While it's possible to sort the elements of a Stream according to a given Comparator there is no distinct method taking a Comparator to remove duplicates according to it.
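
One possible workaround that stays within the Stream API, sketched here on the assumption that the first object seen for each id should be kept, is to collect into a map keyed by id and then sort the map's values. The names DistinctByKeyDemo and distinctByIdSorted are hypothetical, and Obj is re-declared to keep the example self-contained.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DistinctByKeyDemo {

    // Same shape as the Obj class from the question.
    static class Obj {
        final int id;
        final String value;
        Obj(int id, String value) { this.id = id; this.value = value; }
        @Override public String toString() { return "(" + id + value + ")"; }
    }

    static List<Obj> distinctByIdSorted(Stream<Obj> objs) {
        Function<Obj, Integer> byId = o -> o.id;
        Function<Obj, String> byValue = o -> o.value;
        Comparator<Obj> sortingComparator =
                Comparator.comparing(byValue).thenComparing(Comparator.comparing(byId).reversed());

        // Duplicate removal: one map entry per id, keeping the first object seen.
        Map<Integer, Obj> firstPerId = objs.collect(Collectors.toMap(
                byId,
                Function.identity(),
                (first, second) -> first,
                LinkedHashMap::new));

        // Sorting: value ascending, then id descending.
        return firstPerId.values().stream()
                .sorted(sortingComparator)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(distinctByIdSorted(Stream.of(
                new Obj(1, "a"), new Obj(1, "b"), new Obj(2, "c"),
                new Obj(2, "a"), new Obj(3, "a")))); // [(3a), (1a), (2c)]
    }
}
```

This keeps duplicate detection (the map key) and ordering (the comparator) separate, without touching equals or hashCode on Obj.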


EDIT:
You have two different tasks: 1. duplicate removal, 2. sorting. One Comparator cannot solve both tasks. So what alternatives are there?

You can override equals and hashCode on Obj. Then a HashSet or a Stream can be used to remove duplicates.
For the sorting you still need a Comparator (as shown above). Implementing Comparable just for sorting would result in an ordering which is not "consistent with equals" according to Comparable JavaDoc.

Since a Stream can solve both tasks, it would be my choice. First we override hashCode and equals to identify duplicates by id:

@Override
public int hashCode() {
    return Integer.hashCode(id);
}

@Override
public boolean equals(Object obj) {
    if (this == obj)
        return true;
    if (obj == null)
        return false;
    if (getClass() != obj.getClass())
        return false;
    Obj other = (Obj) obj;
    return id == other.id;
}

Now we can use a Stream:

// instantiating one additional Obj and reusing those from the question
Obj obj3a = new Obj(3, "a");

// reusing sortingComparator from the code above
Set<Obj> set = Stream.of(obja, objb, objc, objd, obj3a)
        .distinct()
        .sorted(sortingComparator)
        .collect(Collectors.toCollection(LinkedHashSet::new));

System.out.println(set); // [(3a), (1a), (2c)]

The returned LinkedHashSet has the semantics of a Set, and it also preserves the order produced by sortingComparator.


EDIT (answering the questions from comments)

Q: Why didn't it finish the job correctly?
See for yourself. Change the last line of your Comparator as follows:

int r = result == 0 ? Integer.compare(a.id, b.id) : result;
System.out.println(String.format("a: %s / b: %s / result: %s -> %s", a.id, b.id, result, r));
return r;

Run the code once and then switch the operands of Integer.compare. The switch results in a different comparing path. The difference is when (2a) and (1a) are compared.

In the first run (2a) is greater than (1a) so it's compared with the next entry (2c). This results in equality - a duplicate is found.

In the second run (2a) is smaller than (1a), so the next comparison would be with an entry that sorts before (1a). But (1a) is already the smallest entry, so there is no such entry. Hence (2a) is never compared with (2c), no duplicate is found, and (2a) is added to the set.
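
The underlying problem is that the comparator does not define a consistent ordering. A small sketch that calls the switched comparator directly, with no TreeSet involved, makes this visible (the class name and constant name are made up for illustration):

```java
import java.util.Comparator;

public class BrokenComparatorDemo {

    // Same shape as the Obj class from the question.
    static class Obj {
        final int id;
        final String value;
        Obj(int id, String value) { this.id = id; this.value = value; }
    }

    // The comparator from the question, with the operands of Integer.compare switched.
    static final Comparator<Obj> SWITCHED = (a, b) -> {
        int result = a.value.compareTo(b.value);
        if (a.id == b.id) {
            return 0;
        }
        return result == 0 ? Integer.compare(b.id, a.id) : result;
    };

    public static void main(String[] args) {
        Obj a1 = new Obj(1, "a");
        Obj c2 = new Obj(2, "c");
        Obj a2 = new Obj(2, "a");

        System.out.println(Integer.signum(SWITCHED.compare(a2, a1))); // -1: (2a) < (1a)
        System.out.println(Integer.signum(SWITCHED.compare(a1, c2))); // -1: (1a) < (2c)
        System.out.println(Integer.signum(SWITCHED.compare(a2, c2))); //  0: yet (2a) "equals" (2c)
    }
}
```

A transitive ordering would require (2a) < (2c) here. Because the comparator reports 0 instead, whether the tree notices the duplicate depends on which entries happen to lie on the search path.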

Q: You said one comparator can't finish two tasks, my 1st comparators in fact did both tasks correctly.
Yes - but only for the given example. Add Obj obj3a to the set as I did and run your code. The returned sorted set is:

[(1a), (3a), (2c)]

This violates your requirement that elements with equal values be sorted by id in descending order; here they are in ascending order. Run my code and it returns the right order, as shown above.

When I was struggling with a Comparator some time ago, I got the following comment: "... it’s a great exercise, demonstrating how tricky manual comparator implementations can be ..." (source)



Source: https://stackoverflow.com/questions/53371148/treeset-comparator-failed-to-remove-duplicates-in-some-cases
