Unintuitive behavior of removeAll method in sets [closed]

馋奶兔 submitted on 2021-01-29 08:19:38

Question


I discovered this weird behavior of the removeAll method of AbstractSet when working with custom Comparators.

Depending on the relative sizes of the two collections, a different notion of equality is used: either the set's comparator or the equals() method.

It is actually documented in the API, but I still cannot see the reason behind it.

Here is the code:

import java.util.Comparator;
import java.util.Set;
import java.util.Stack;
import java.util.TreeSet;

public class Test {
    public static void main(String[] args) {
        // Any comparator. For this example, the length of a string is compared
        Set<String> set = new TreeSet<String>(new Comparator<String>() {
                @Override
                public int compare(String o1, String o2) {
                        return o1.length() - o2.length();
                }
        });

        set.add("a");
        set.add("aa");
        set.add("aaa");
        set.add("aaaa");
        System.out.println(set); // output: [a, aa, aaa, aaaa]

        Stack<String> stack = new Stack<String>();
        stack.push("b");
        stack.push("bb");
        stack.push("bbb");
        stack.push("bbbb");

        set.removeAll(stack); // NO ITEMS ARE REMOVED from the set
        System.out.println(set); // output: [a, aa, aaa, aaaa]

        // Now let's see what happens if I remove an object from the stack
        stack.pop();
        // Every set element whose length matches a stack element is removed
        set.removeAll(stack);
        System.out.println(set); // output: [aaaa]

        /* Reason for this strange behaviour: depending on the size of the
         * passed Collection, TreeSet either calls its own remove() for each
         * element of that Collection, or iterates over itself and calls
         * contains() on the passed Collection. TreeSet's remove() uses the
         * comparator to determine equality, whereas the contains() method of
         * the passed Collection usually determines equality by calling
         * equals() on its elements.
         */
    }
}

Here is the JavaDoc.


Answer 1:


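You have basically created unspecified behavior, since your two collections use different criteria of equality: the TreeSet decides equality with your comparator, while the Stack decides it with equals(). Combining collections in any way can only work reliably if they agree on what equality means. It is similar in spirit to violating the contract that A.equals(B) must yield the same result as B.equals(A).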

Comparable: It is strongly recommended (though not required) that natural orderings be consistent with equals. This is so because sorted sets (and sorted maps) without explicit comparators behave "strangely" when they are used with elements (or keys) whose natural ordering is inconsistent with equals. In particular, such a sorted set (or sorted map) violates the general contract for set (or map), which is defined in terms of the equals method.
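To make the quoted warning concrete, here is a minimal sketch using the same length-based ordering as the question. The class name InconsistentWithEquals and the helper lengthOrderedSet are made up for illustration; only standard java.util calls are used.

```java
import java.util.Comparator;
import java.util.Set;
import java.util.TreeSet;

public class InconsistentWithEquals {
    // Hypothetical helper: a TreeSet ordered by string length, as in the question.
    static Set<String> lengthOrderedSet() {
        Set<String> set = new TreeSet<>(Comparator.comparingInt(String::length));
        set.add("a");
        set.add("aa");
        return set;
    }

    public static void main(String[] args) {
        Set<String> set = lengthOrderedSet();
        // The TreeSet looks elements up with the comparator, not with equals():
        System.out.println(set.contains("b")); // true, although "b".equals("a") is false
        // A string whose length is already present counts as a duplicate:
        System.out.println(set.add("bb"));     // false
        System.out.println(set);               // [a, aa]
    }
}
```

Any collection that relies on equals(), such as a Stack or an ArrayList, would give the opposite answers for "b" and "bb", which is why mixing the two kinds of collection gives surprising results.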




Answer 2:


If you're asking why they chose to implement it this way:

It is likely for performance reasons. Consider two TreeSets: the set you call removeAll on has n elements, and the one you pass in has m elements. If removeAll always iterated through the passed-in collection and called remove on this set, then for m much larger than n it would be much slower than iterating through this set and checking whether each element is contained in the passed-in collection (O(m log n) > O(n log m)). Comparing the sizes prevents this from happening.

It's not a flawless system: if you pass a Stack to a TreeSet, iterating through the TreeSet is asymptotically always a worse idea than iterating through the Stack (O(m n) > O(m log n)), because Stack.contains is a linear scan, yet the same size rule is applied. Accounting for all combinations of allowable types would have been somewhat of a hassle, though.
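The cost argument for two TreeSets can be checked with a rough experiment. The sketch below is not from the JDK: the class RemoveAllCost and its methods are hypothetical, and it counts comparator calls rather than measuring time. The two sets are kept disjoint on purpose so that nothing is actually removed and every lookup has to walk the tree.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicLong;

public class RemoveAllCost {
    // Builds a TreeSet of `size` consecutive integers starting at `first`,
    // using a comparator that counts every comparison in `counter`.
    static TreeSet<Integer> countingSet(int first, int size, AtomicLong counter) {
        Comparator<Integer> cmp = (a, b) -> {
            counter.incrementAndGet();
            return Integer.compare(a, b);
        };
        TreeSet<Integer> set = new TreeSet<>(cmp);
        for (int i = 0; i < size; i++) {
            set.add(first + i);
        }
        return set;
    }

    // Counts comparisons for one strategy; `self` has n elements, `arg` has m.
    static long comparisons(boolean iterateArgument, int n, int m) {
        AtomicLong counter = new AtomicLong();
        TreeSet<Integer> self = countingSet(-n, n, counter); // disjoint from arg
        TreeSet<Integer> arg = countingSet(0, m, counter);
        counter.set(0); // ignore the cost of building the sets
        if (iterateArgument) {
            for (Integer e : arg) {
                self.remove(e); // m lookups in the n-element set
            }
        } else {
            for (Iterator<Integer> it = self.iterator(); it.hasNext(); ) {
                if (arg.contains(it.next())) { // n lookups in the m-element set
                    it.remove();
                }
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        int n = 10, m = 100_000;
        System.out.println("iterate argument: " + comparisons(true, n, m));
        System.out.println("iterate this set: " + comparisons(false, n, m));
    }
}
```

With a small set and a large argument, iterating the argument should report far more comparisons than iterating the set, which is the case the size check in removeAll is meant to avoid.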

If you're asking why the code does what it does:

Here is the code for removeAll:

public boolean removeAll(Collection<?> c) {
    boolean modified = false;

    if (size() > c.size()) {
        for (Iterator<?> i = c.iterator(); i.hasNext(); )
            modified |= remove(i.next());
    } else {
        for (Iterator<?> i = iterator(); i.hasNext(); ) {
            if (c.contains(i.next())) {
                i.remove();
                modified = true;
            }
        }
    }
    return modified;
}

So when the Stack has at least as many elements as the TreeSet (the first case), removeAll iterates through the TreeSet and removes each element that the Stack contains. Since Stack.contains compares with String.equals, no strings match, and nothing is removed.

When the Stack has fewer elements (the second case), removeAll iterates through the Stack and calls remove on the TreeSet for each element. That call uses your Comparator, so every element with a matching length is removed, leaving only the length-4 element, which corresponds to the popped element.
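If you want the result not to depend on the sizes, one option is to skip removeAll and spell out which notion of equality you mean. The following is only a sketch: the class SizeIndependentRemoval and its helper methods are invented for this example, and a List stands in for the Stack from the question.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SizeIndependentRemoval {
    // Hypothetical helper: the length-ordered set from the question.
    static Set<String> lengthOrderedSet() {
        Set<String> set = new TreeSet<>(Comparator.comparingInt(String::length));
        set.addAll(Arrays.asList("a", "aa", "aaa", "aaaa"));
        return set;
    }

    // Always matches with the set's comparator, whatever the sizes are.
    static Set<String> removeByComparator(Set<String> set, Collection<String> toRemove) {
        for (String s : toRemove) {
            set.remove(s);
        }
        return set;
    }

    // Always matches with the argument's contains(), i.e. equals() for a List or Stack.
    static Set<String> removeByEquals(Set<String> set, Collection<String> toRemove) {
        set.removeIf(toRemove::contains);
        return set;
    }

    public static void main(String[] args) {
        List<String> four = Arrays.asList("b", "bb", "bbb", "bbbb");
        System.out.println(removeByComparator(lengthOrderedSet(), four)); // []
        System.out.println(removeByEquals(lengthOrderedSet(), four));     // [a, aa, aaa, aaaa]
    }
}
```

Both calls receive four elements to remove from a four-element set, the case where removeAll removed nothing, and each helper behaves the same way for any other combination of sizes.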



Source: https://stackoverflow.com/questions/18123178/unintuitive-behavior-of-removeall-method-in-sets
