Question
I have the following comparator for my TreeSet:
public class Obj {
    public int id;
    public String value;

    public Obj(int id, String value) {
        this.id = id;
        this.value = value;
    }

    public String toString() {
        return "(" + id + value + ")";
    }
}

Obj obja = new Obj(1, "a");
Obj objb = new Obj(1, "b");
Obj objc = new Obj(2, "c");
Obj objd = new Obj(2, "a");
Set<Obj> set = new TreeSet<>((a, b) -> {
    System.out.println("Comparing " + a + " and " + b);
    int result = a.value.compareTo(b.value);
    if (a.id == b.id) {
        return 0;
    }
    return result == 0 ? Integer.compare(a.id, b.id) : result;
});
set.addAll(Arrays.asList(obja, objb, objc, objd));
System.out.println(set);
It prints out [(1a), (2c)], which removed the duplicates.
But when I changed the last Integer.compare to Integer.compare(b.id, a.id) (i.e. switched the positions of a and b), it prints out [(2a), (1a), (2c)]. Clearly the same id 2 appeared twice.
How do you fix the comparator to always remove the duplicates based on ids and sort the ordered set based on value (ascending) then id (descending)?
Answer 1:
You're asking:
How do you fix the comparator to always remove the duplicates based on ids and sort the ordered set based on value (ascending) then id (descending)?
You want the comparator to
- remove duplicates based on Obj.id
- sort the set by Obj.value and Obj.id
Requirement 1) results in
Function<Obj, Integer> byId = o -> o.id;
Set<Obj> setById = new TreeSet<>(Comparator.comparing(byId));
Requirement 2) results in
Function<Obj, String> byValue = o -> o.value;
Comparator<Obj> sortingComparator = Comparator.comparing(byValue).thenComparing(Comparator.comparing(byId).reversed());
Set<Obj> setByValueAndId = new TreeSet<>(sortingComparator);
Let's have a look at the JavaDoc of TreeSet. It says:
Note that the ordering maintained by a set [...] must be consistent with equals if it is to correctly implement the Set interface. This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal.
The set will be ordered according to the comparator but its elements are also compared for equality using the comparator.
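To make the quoted JavaDoc concrete, here is a minimal sketch that is not part of the original answer. It assumes nothing beyond the JDK; the class name ComparatorEqualityDemo is made up for illustration. Two strings that differ under equals are still treated as one element, because the TreeSet only consults its comparator.

```java
import java.util.Set;
import java.util.TreeSet;

class ComparatorEqualityDemo {
    // Builds a TreeSet ordered case-insensitively and adds two strings that
    // differ according to equals() but tie under the comparator.
    static Set<String> build() {
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.add("a");
        set.add("A"); // compare("A", "a") == 0, so the set treats it as a duplicate
        return set;
    }

    public static void main(String[] args) {
        System.out.println("a".equals("A")); // false
        System.out.println(build());         // [a]
    }
}
```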
As far as I can see there is no way to define a Comparator which satisfies both requirements. Since a TreeSet is in the first place a Set, requirement 1) has to match. To achieve requirement 2) you can create a second TreeSet:
Set<Obj> setByValueAndId = new TreeSet<>(sortingComparator);
setByValueAndId.addAll(setById);
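Put together, the two-set approach might look like the sketch below. It is only an illustration: the wrapper class TwoTreeSetsDemo and the method dedupeThenSort are hypothetical names, and Obj is re-declared so the example compiles on its own.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

class TwoTreeSetsDemo {
    // Same shape as the Obj class from the question.
    static class Obj {
        final int id;
        final String value;
        Obj(int id, String value) { this.id = id; this.value = value; }
        @Override public String toString() { return "(" + id + value + ")"; }
    }

    static Set<Obj> dedupeThenSort(Obj... objs) {
        Function<Obj, Integer> byId = o -> o.id;
        Function<Obj, String> byValue = o -> o.value;

        // Step 1: a TreeSet keyed on id alone keeps the first Obj seen per id.
        Set<Obj> setById = new TreeSet<>(Comparator.comparing(byId));
        setById.addAll(Arrays.asList(objs));

        // Step 2: re-sort the survivors by value ascending, then id descending.
        Comparator<Obj> sortingComparator =
                Comparator.comparing(byValue).thenComparing(Comparator.comparing(byId).reversed());
        Set<Obj> sorted = new TreeSet<>(sortingComparator);
        sorted.addAll(setById);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(dedupeThenSort(
                new Obj(1, "a"), new Obj(1, "b"), new Obj(2, "c"),
                new Obj(2, "a"), new Obj(3, "a")));
        // prints [(3a), (1a), (2c)]
    }
}
```

Because the ids are already unique after step 1, the second comparator never reports a tie, so the second TreeSet drops nothing.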
Or if you don't need the set itself but to process the elements in the desired order you can use a Stream:
Consumer<Obj> consumer = <your consumer>;
setById.stream().sorted(sortingComparator).forEach(consumer);
BTW:
While it's possible to sort the elements of a Stream according to a given Comparator, there is no distinct method taking a Comparator to remove duplicates according to it.
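One common workaround for the missing method, sketched here under the assumption that a key-based filter is acceptable, is a stateful predicate. distinctByKey is a hypothetical helper, not a JDK method, and DistinctByKeyDemo is an illustrative class name. On a sequential ordered stream it keeps the first element per key; on a parallel stream, which element survives is not guaranteed.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DistinctByKeyDemo {
    // Same shape as the Obj class from the question.
    static class Obj {
        final int id;
        final String value;
        Obj(int id, String value) { this.id = id; this.value = value; }
        @Override public String toString() { return "(" + id + value + ")"; }
    }

    // Hypothetical helper (not part of the JDK): lets an element through
    // only the first time its extracted key is seen.
    static <T, K> Predicate<T> distinctByKey(Function<? super T, K> keyExtractor) {
        Set<K> seen = ConcurrentHashMap.newKeySet();
        return t -> seen.add(keyExtractor.apply(t));
    }

    static List<Obj> run() {
        return Stream.of(new Obj(1, "a"), new Obj(1, "b"), new Obj(2, "c"),
                         new Obj(2, "a"), new Obj(3, "a"))
                .filter(distinctByKey((Obj o) -> o.id))      // duplicate removal by id
                .sorted(Comparator.comparing((Obj o) -> o.value)
                        .thenComparing(Comparator.comparing((Obj o) -> o.id).reversed()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run()); // [(3a), (1a), (2c)]
    }
}
```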
EDIT:
You have two different tasks: 1. duplicate removal, 2. sorting. One Comparator cannot solve both tasks. So what alternatives are there?
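One way to see why the comparator from the question cannot do both jobs: the Comparator JavaDoc requires that compare(x, y) == 0 implies compare(x, z) and compare(y, z) have the same sign for every z. The sketch below (illustrative class name ContractViolationDemo, comparator copied from the question without the println) picks three objects for which that rule fails.

```java
import java.util.Comparator;

class ContractViolationDemo {
    // Same shape as the Obj class from the question.
    static class Obj {
        final int id;
        final String value;
        Obj(int id, String value) { this.id = id; this.value = value; }
        @Override public String toString() { return "(" + id + value + ")"; }
    }

    // The comparator from the question, minus the logging.
    static final Comparator<Obj> QUESTION_COMPARATOR = (a, b) -> {
        int result = a.value.compareTo(b.value);
        if (a.id == b.id) {
            return 0;
        }
        return result == 0 ? Integer.compare(a.id, b.id) : result;
    };

    public static void main(String[] args) {
        Obj x = new Obj(2, "c");
        Obj y = new Obj(2, "a");
        Obj z = new Obj(1, "b");
        // x and y tie because they share an id ...
        System.out.println(QUESTION_COMPARATOR.compare(x, y));                 // 0
        // ... yet they land on opposite sides of z.
        System.out.println(Integer.signum(QUESTION_COMPARATOR.compare(x, z))); // 1
        System.out.println(Integer.signum(QUESTION_COMPARATOR.compare(y, z))); // -1
    }
}
```

With an ordering like this, whether a duplicate is detected depends on which elements the tree happens to compare along the insertion path.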
You can override equals and hashCode on Obj. Then a HashSet or a Stream can be used to remove duplicates.
For the sorting you still need a Comparator (as shown above). Implementing Comparable just for sorting would result in an ordering which is not "consistent with equals" according to the Comparable JavaDoc.
Since a Stream can solve both tasks, it would be my choice. First we override hashCode and equals to identify duplicates by id:
@Override
public int hashCode() {
    return Integer.hashCode(id);
}

@Override
public boolean equals(Object obj) {
    if (this == obj)
        return true;
    if (obj == null || getClass() != obj.getClass())
        return false;
    // Two Obj instances count as duplicates when their ids match.
    return id == ((Obj) obj).id;
}
Now we can use a Stream:
// instantiating one additional Obj and reusing those from the question
Obj obj3a = new Obj(3, "a");
// reusing sortingComparator from the code above
Set<Obj> set = Stream.of(obja, objb, objc, objd, obj3a)
        .distinct()
        .sorted(sortingComparator)
        .collect(Collectors.toCollection(LinkedHashSet::new));
System.out.println(set); // [(3a), (1a), (2c)]
The returned LinkedHashSet has the semantics of a Set but it also preserves the ordering of sortingComparator.
EDIT (answering the questions from comments)
Q: Why didn't it finish the job correctly?
See for yourself. Change the last line of your Comparator as follows:
int r = result == 0 ? Integer.compare(a.id, b.id) : result;
System.out.println(String.format("a: %s / b: %s / result: %s -> %s", a.id, b.id, result, r));
return r;
Run the code once and then switch the operands of Integer.compare. The switch results in a different comparing path. The difference is when (2a) and (1a) are compared.
In the first run (2a) is greater than (1a) so it's compared with the next entry (2c). This results in equality - a duplicate is found.
In the second run (2a) is smaller than (1a). Thus (2a) would next be compared with a preceding entry. But (1a) is already the smallest entry and there is no preceding one. Hence no duplicate is found for (2a) and it's added to the set.
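The two comparison paths can also be captured programmatically. The following is a sketch only: ComparisonPathDemo, build and the swapOperands flag are illustrative names, and the comparator is the one from the question with the println replaced by a trace list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ComparisonPathDemo {
    // Same shape as the Obj class from the question.
    static class Obj {
        final int id;
        final String value;
        Obj(int id, String value) { this.id = id; this.value = value; }
        @Override public String toString() { return "(" + id + value + ")"; }
    }

    // Builds the question's TreeSet. swapOperands switches the arguments of
    // Integer.compare; every comparison performed is appended to trace.
    static Set<Obj> build(boolean swapOperands, List<String> trace) {
        Set<Obj> set = new TreeSet<>((a, b) -> {
            trace.add(a + " vs " + b);
            int result = a.value.compareTo(b.value);
            if (a.id == b.id) {
                return 0;
            }
            int byId = swapOperands ? Integer.compare(b.id, a.id) : Integer.compare(a.id, b.id);
            return result == 0 ? byId : result;
        });
        set.addAll(Arrays.asList(
                new Obj(1, "a"), new Obj(1, "b"), new Obj(2, "c"), new Obj(2, "a")));
        return set;
    }

    public static void main(String[] args) {
        List<String> first = new ArrayList<>();
        System.out.println(build(false, first)); // [(1a), (2c)]
        System.out.println(first);               // path ends with (2a) vs (2c): tie found

        List<String> second = new ArrayList<>();
        System.out.println(build(true, second)); // [(2a), (1a), (2c)]
        System.out.println(second);              // path ends with (2a) vs (1a): (2c) never visited
    }
}
```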
Q: You said one comparator can't finish two tasks, my 1st comparator in fact did both tasks correctly.
Yes - but only for the given example. Add Obj obj3a to the set as I did and run your code. The returned sorted set is:
[(1a), (3a), (2c)]
This violates your requirement to sort equal values descending by id. Now it's ascending by id. Run my code and it returns the right order, as shown above.
Struggling with a Comparator some time ago, I got the following comment: "... it’s a great exercise, demonstrating how tricky manual comparator implementations can be ..." (source)
Source: https://stackoverflow.com/questions/53371148/treeset-comparator-failed-to-remove-duplicates-in-some-cases