While working with a tree set, I found very peculiar behavior.
As per my understanding, the following program should print two identical lines:
public class T
Well, this surprised me. I don't know if I'm correct, but look at this implementation in AbstractSet:
public boolean removeAll(Collection<?> c) {
    Objects.requireNonNull(c);
    boolean modified = false;
    if (size() > c.size()) {
        for (Iterator<?> i = c.iterator(); i.hasNext(); )
            modified |= remove(i.next());
    } else {
        for (Iterator<?> i = iterator(); i.hasNext(); ) {
            if (c.contains(i.next())) {
                i.remove();
                modified = true;
            }
        }
    }
    return modified;
}
Basically, in your example the size of the set is equal to the size of the collection of arguments you want to remove, so the else branch is taken. In that branch there is a check whether the collection of arguments to remove contains the current element of the iterator. That check is case sensitive: it evaluates c.contains("a"), which returns false because c contains "A", not "a", so the element is not removed. Notice that when you add an element to your set with s.addAll(Arrays.asList("a", "b", "d")); it works correctly, because size() > c.size() is now true and the contains check is no longer used.
This is interesting, so here are some tests with output:
static void test(String... args) {
    Set<String> s = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
    s.addAll(Arrays.asList("a", "b", "c"));
    s.removeAll(Arrays.asList(args));
    System.out.println(s);
}
public static void main(String[] args) {
    test("C");             // output: [a, b]
    test("C", "A");        // output: [b]
    test("C", "A", "B");   // output: [a, b, c]
    test("B", "C", "A");   // output: [a, b, c]
    test("K", "C");        // output: [a, b]
    test("C", "K", "M");   // output: [a, b, c] !!
    test("C", "K", "A");   // output: [a, b, c] !!
}
Now without the comparator it works just like a sorted HashSet<String>():
static void test(String... args) {
    Set<String> s = new TreeSet<String>(); // natural ordering, no comparator
    s.addAll(Arrays.asList("a", "b", "c"));
    s.removeAll(Arrays.asList(args));
    System.out.println(s);
}
public static void main(String[] args) {
    test("c");             // output: [a, b]
    test("c", "a");        // output: [b]
    test("c", "a", "b");   // output: []
    test("b", "c", "a");   // output: []
    test("k", "c");        // output: [a, b]
    test("c", "k", "m");   // output: [a, b]
    test("c", "k", "m");   // output: [a, b]
}
Now from the documentation:
public boolean removeAll(Collection<?> c)
Removes from this set all of its elements that are contained in the specified collection (optional operation). If the specified collection is also a set, this operation effectively modifies this set so that its value is the asymmetric set difference of the two sets.
This implementation determines which is the smaller of this set and the specified collection, by invoking the size method on each. If this set has fewer elements, then the implementation iterates over this set, checking each element returned by the iterator in turn to see if it is contained in the specified collection. If it is so contained, it is removed from this set with the iterator's remove method. If the specified collection has fewer elements, then the implementation iterates over the specified collection, removing from this set each element returned by the iterator, using this set's remove method.
Source
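To make the two strategies from the documentation easier to compare, here is a minimal sketch that pulls each one out into its own method so it can be run regardless of the relative sizes. The class name RemoveAllStrategies and the helpers removeEachArgument, removeIfArgumentContains and sample are made up for illustration; they are not part of the JDK.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class RemoveAllStrategies {

    // Strategy for a smaller argument: call set.remove(...) for each argument
    // element. For a TreeSet this lookup goes through the comparator.
    static <E> boolean removeEachArgument(Set<E> set, Collection<?> c) {
        boolean modified = false;
        for (Object o : c) {
            modified |= set.remove(o);
        }
        return modified;
    }

    // Strategy for the other case: walk the set and ask the argument whether
    // it contains each element. For a List this check uses equals().
    static <E> boolean removeIfArgumentContains(Set<E> set, Collection<?> c) {
        boolean modified = false;
        for (Iterator<E> it = set.iterator(); it.hasNext(); ) {
            if (c.contains(it.next())) {
                it.remove();
                modified = true;
            }
        }
        return modified;
    }

    // Hypothetical sample data: a case-insensitive TreeSet holding a, b, c.
    static Set<String> sample() {
        Set<String> s = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        s.addAll(Arrays.asList("a", "b", "c"));
        return s;
    }

    public static void main(String[] args) {
        List<String> upper = Arrays.asList("A", "B", "C");

        Set<String> first = sample();
        removeEachArgument(first, upper);
        System.out.println(first);   // prints []

        Set<String> second = sample();
        removeIfArgumentContains(second, upper);
        System.out.println(second);  // prints [a, b, c]
    }
}
```

Same set, same argument, two different results: that is the asymmetry the size check switches between.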
To add some information about why the remove of TreeSet actually removes case-insensitively in your example (provided that you follow the if (size() > c.size()) path, as explained in the answer by @Shadov):
This is the remove method in TreeSet:
public boolean remove(Object o) {
    return m.remove(o)==PRESENT;
}
It calls remove from its internal TreeMap:
public V remove(Object key) {
    Entry<K,V> p = getEntry(key);
    if (p == null)
        return null;
    V oldValue = p.value;
    deleteEntry(p);
    return oldValue;
}
which calls getEntry:
final Entry<K,V> getEntry(Object key) {
    // Offload comparator-based version for sake of performance
    if (comparator != null)
        return getEntryUsingComparator(key);
    if (key == null)
        throw new NullPointerException();
    @SuppressWarnings("unchecked")
    Comparable<? super K> k = (Comparable<? super K>) key;
    Entry<K,V> p = root;
    while (p != null) {
        int cmp = k.compareTo(p.key);
        if (cmp < 0)
            p = p.left;
        else if (cmp > 0)
            p = p.right;
        else
            return p;
    }
    return null;
}
If there is a Comparator (as in your example), the entry is searched for using this Comparator (this is done by getEntryUsingComparator). That's why it is actually found (and then removed), despite the case difference.
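A small sketch of that difference, assuming the same kind of set as in the tests above (the class name ComparatorLookupDemo and the sample helper are only for illustration): lookups on the TreeSet go through the comparator, while a plain List falls back to equals.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ComparatorLookupDemo {

    // Hypothetical sample data: a case-insensitive TreeSet holding a, b, c.
    static Set<String> sample() {
        Set<String> s = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        s.addAll(Arrays.asList("a", "b", "c"));
        return s;
    }

    public static void main(String[] args) {
        Set<String> s = sample();
        List<String> list = Arrays.asList("A");

        // TreeSet lookups use the comparator, so "A" matches the stored "a".
        System.out.println(s.contains("A"));     // prints true

        // List.contains uses equals(), so "a" does not match "A".
        System.out.println(list.contains("a"));  // prints false

        // remove() follows the same comparator-based lookup as contains().
        System.out.println(s.remove("A"));       // prints true
        System.out.println(s);                   // prints [b, c]
    }
}
```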
This happens because a SortedSet’s Comparator is used for sorting and lookup, but removeAll can end up relying on the equals method of each element (through the argument collection's contains). From the SortedSet documentation:
Note that the ordering maintained by a sorted set (whether or not an explicit comparator is provided) must be consistent with equals if the sorted set is to correctly implement the Set interface. (See the Comparable interface or Comparator interface for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a sorted set performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the sorted set, equal. The behavior of a sorted set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
The definition of “consistent with equals” is given in the Comparable documentation:
The natural ordering for a class C is said to be consistent with equals if and only if e1.compareTo(e2) == 0 has the same boolean value as e1.equals(e2) for every e1 and e2 of class C. Note that null is not an instance of any class, and e.compareTo(null) should throw a NullPointerException even though e.equals(null) returns false.
It is strongly recommended (though not required) that natural orderings be consistent with equals. This is so because sorted sets (and sorted maps) without explicit comparators behave "strangely" when they are used with elements (or keys) whose natural ordering is inconsistent with equals. In particular, such a sorted set (or sorted map) violates the general contract for set (or map), which is defined in terms of the equals method.
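The same pairwise test can be applied to an explicit comparator. The sketch below uses a made-up helper, agreesWithEquals, in a made-up class ConsistencyCheck, to check a single pair of values; it only illustrates the definition and does not prove consistency for a whole class.

```java
import java.util.Comparator;

class ConsistencyCheck {

    // Hypothetical helper: true if "compare(x, y) == 0" and "x.equals(y)"
    // have the same boolean value for this one pair.
    static <T> boolean agreesWithEquals(Comparator<? super T> cmp, T x, T y) {
        return (cmp.compare(x, y) == 0) == x.equals(y);
    }

    public static void main(String[] args) {
        // The comparator says "a" and "A" are the same, equals() says they differ.
        System.out.println(
            agreesWithEquals(String.CASE_INSENSITIVE_ORDER, "a", "A"));      // prints false

        // Natural String ordering and equals() both say they differ.
        System.out.println(
            agreesWithEquals(Comparator.<String>naturalOrder(), "a", "A"));  // prints true
    }
}
```

One disagreeing pair is enough to show that String.CASE_INSENSITIVE_ORDER is inconsistent with String.equals.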
In summary, your Set’s Comparator behaves differently than the elements’ equals method, causing unusual (though predictable) behavior.
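If you want the result not to depend on the relative sizes, one possible workaround (a sketch, not the only option) is to copy the argument into a TreeSet that uses the same comparator before calling removeAll. Then both code paths compare case-insensitively: remove on the target set and contains on the argument each go through the comparator. The class and method names here are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

class CaseInsensitiveRemoval {

    // Hypothetical helper: wraps the argument in a TreeSet with the same
    // comparator as the target, so contains() is case-insensitive as well.
    static Set<String> removeIgnoringCase(Set<String> target, Collection<String> toRemove) {
        Set<String> wrapped = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        wrapped.addAll(toRemove);
        target.removeAll(wrapped);
        return target;
    }

    public static void main(String[] args) {
        Set<String> s = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        s.addAll(Arrays.asList("a", "b", "c"));

        // Same sizes as the surprising test("C", "K", "A") case above.
        System.out.println(removeIgnoringCase(s, Arrays.asList("C", "K", "A")));  // prints [b]
    }
}
```

Calling s.remove(x) for each element to remove would also work, since that always goes through the comparator of s.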