How does a Set check for duplicates? Java HashSet

杀马特。学长 韩版系。学妹 submitted on 2019-12-14 03:29:13

Question


The code below outputs 1, and the second code outputs 2. I don't understand why this is happening. Is it because I am adding the same object? How should I achieve the desired output of 2?

import java.util.*;

public class maptest {
    public static void main(String[] args) {
        Set<Integer[]> set = new HashSet<Integer[]>();
        Integer[] t = new Integer[2];
        t[0] = t[1] = 1;
        set.add(t);
        Integer[] t1 = new Integer[2];
        t[0] = t[1] = 0;
        set.add(t);
        System.out.println(set.size());
    }
}

Second Code:

import java.util.*;

public class maptest {
    public static void main(String[] args) {
        Set<Integer[]> set = new HashSet<Integer[]>();
        Integer[] t = new Integer[2];
        t[0] = t[1] = 1;
        set.add(t);
        Integer[] t1 = new Integer[2];
        t1[0] = t1[1] = 1;
        set.add(t1);
        System.out.println(set.size());
    }
}

Answer 1:


The Set implementation calls t.hashCode(), and since arrays don't override the Object.hashCode method, the same array object always has the same hash code. Changing the array's contents therefore does not affect its hash code. To get a hash code computed from the array's contents, call Arrays.hashCode(t).
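A minimal sketch of that difference, using only java.util.Arrays (the class name HashCodeDemo is made up for illustration): the identity-based t.hashCode() stays the same after the contents change, while Arrays.hashCode(t) is recomputed from the elements.

```java
import java.util.Arrays;

public class HashCodeDemo {
    public static void main(String[] args) {
        Integer[] t = {1, 1};
        int identityBefore = t.hashCode();      // inherited from Object: based on identity
        int contentBefore = Arrays.hashCode(t); // computed from the elements

        t[0] = t[1] = 0; // mutate the same array object

        System.out.println(t.hashCode() == identityBefore);      // true
        System.out.println(Arrays.hashCode(t) == contentBefore); // false
    }
}
```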

You shouldn't really put mutable things inside sets anyway, so I would suggest putting immutable lists into sets instead. If you want to stick with arrays, just create a new array, like you did with t1, and put that into the set.
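A sketch of the question's first program rewritten with immutable lists, assuming Java 9+ for List.of (the class name ImmutableListSet is hypothetical). Unlike arrays, lists compare by content, so two separate lists holding the same values count as one element.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ImmutableListSet {
    public static void main(String[] args) {
        Set<List<Integer>> set = new HashSet<>();
        // List.of (Java 9+) returns an unmodifiable list whose equals/hashCode use its contents.
        set.add(List.of(1, 1));
        set.add(List.of(0, 0));
        System.out.println(set.size()); // 2: different contents

        set.add(List.of(1, 1));
        System.out.println(set.size()); // still 2: an equal list is already present
    }
}
```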

EDIT:

For code 2, t and t1 are two different arrays, so they are not equal to each other and their identity hash codes are almost certainly different. Again, this is because arrays don't override hashCode (or equals): the array's contents don't affect the hash code, whether or not they are the same.




Answer 2:


A Set contains only distinct elements (that is its nature). The basic implementation, HashSet, uses hashCode() to first find the bucket that may hold the value, then equals(Object) to check whether an equal element is already there.
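To make the two steps concrete, here is a hypothetical, heavily simplified hash set (TinyHashSet is an invented name, not the real java.util.HashSet source, which is backed by a HashMap). It picks a bucket with hashCode() and then compares with equals() only inside that bucket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified hash set for illustration only; it is NOT the real
// java.util.HashSet source, just the hashCode-then-equals idea it relies on.
public class TinyHashSet<E> {
    private final List<List<E>> buckets = new ArrayList<>();

    public TinyHashSet(int bucketCount) {
        if (bucketCount <= 0) {
            throw new IllegalArgumentException("bucketCount must be positive");
        }
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Step 1: hashCode() decides which bucket an element belongs to.
    private List<E> bucketFor(Object o) {
        int index = Math.floorMod(Objects.hashCode(o), buckets.size());
        return buckets.get(index);
    }

    // Step 2: equals() is checked only against the elements in that bucket.
    public boolean add(E e) {
        List<E> bucket = bucketFor(e);
        for (E existing : bucket) {
            if (Objects.equals(existing, e)) {
                return false; // duplicate found: set unchanged
            }
        }
        bucket.add(e);
        return true;
    }

    public int size() {
        int n = 0;
        for (List<E> bucket : buckets) {
            n += bucket.size();
        }
        return n;
    }

    public static void main(String[] args) {
        TinyHashSet<Integer[]> arrays = new TinyHashSet<>(16);
        arrays.add(new Integer[]{1, 1});
        arrays.add(new Integer[]{1, 1});
        System.out.println(arrays.size()); // 2: arrays only equal themselves

        TinyHashSet<String> strings = new TinyHashSet<>(16);
        strings.add("a");
        strings.add("a");
        System.out.println(strings.size()); // 1: String.equals compares characters
    }
}
```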

Arrays are simple: their hashCode() is the default one inherited from Object, and therefore depends on the reference. Their equals(Object) is also Object's: it checks only identity, that is, the references must be equal.

In Java terms, Object.equals behaves like this:

public boolean equals(Object other) {
  return other == this;
}

If you want a set of arrays that are distinct by content, you'll have to either use a TreeSet with a suitable Comparator, wrap your array, or use a List or another Set instead:

Set<List<Integer>> set = new HashSet<>();
Integer[] t = new Integer[]{1, 1};
set.add(Arrays.asList(t));
Integer[] t1 = new Integer[]{1, 1};
set.add(Arrays.asList(t1));
System.out.println(set.size()); // prints 1: both lists contain [1, 1], so they are equal
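The other two options from the paragraph above can be sketched as follows. This assumes Java 9+ for Arrays.compare; ContentSets and ArrayKey are hypothetical names, and the wrapper copies the array so later changes to the caller's array cannot affect it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class ContentSets {
    // Hypothetical wrapper: equality and hash code follow the array's contents.
    public static final class ArrayKey {
        private final Integer[] values;

        public ArrayKey(Integer[] values) {
            this.values = values.clone(); // defensive copy: later edits to the caller's array don't leak in
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof ArrayKey && Arrays.equals(values, ((ArrayKey) o).values);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(values);
        }
    }

    public static void main(String[] args) {
        Integer[] t = {1, 1};
        Integer[] t1 = {1, 1};

        // Option 1: TreeSet ordered by Arrays.compare (Java 9+), which compares element by element.
        Set<Integer[]> tree = new TreeSet<Integer[]>(Arrays::compare);
        tree.add(t);
        tree.add(t1);
        System.out.println(tree.size()); // 1

        // Option 2: HashSet of content-based wrappers.
        Set<ArrayKey> wrapped = new HashSet<>();
        wrapped.add(new ArrayKey(t));
        wrapped.add(new ArrayKey(t1));
        System.out.println(wrapped.size()); // 1
    }
}
```

With the TreeSet, mutating an array after adding it still corrupts the ordering, so the wrapper's defensive copy is the safer of the two.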

As for the mutability of objects used as Set elements or Map keys:

  • fields used by boolean equals(Object) should not be mutated, because the mutated object could then be equal to another element. The Set would no longer contain distinct values.
  • fields used by int hashCode() should not be mutated for hash-based collections (HashSet, HashMap) because, as said above, they operate by putting items in buckets. If hashCode() changes, the object will likely belong in a different bucket from the one it is stored in: the Set could then contain the same reference twice.
  • fields used by int compareTo(T) or Comparator::compare(T,T) should not be mutated, for the same reason as equals: the SortedSet would not know there was a change.

If the need arises, you have to first remove the item from the set, then mutate it, then re-add it.
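A sketch contrasting the broken pattern with remove, mutate, re-add, assuming Java 9+ for List.of (MutateInSet is a hypothetical name). A mutable ArrayList is used because its hashCode() depends on its contents. The Set contract leaves the broken case unspecified; the printed values in its comments describe OpenJDK's HashMap-backed HashSet.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MutateInSet {
    public static void main(String[] args) {
        // Broken: mutate an element while it is inside the HashSet.
        List<Integer> key = new ArrayList<>(List.of(1, 1));
        Set<List<Integer>> broken = new HashSet<>();
        broken.add(key);
        key.set(0, 0); // the list's hashCode() changes, but its entry keeps the old hash
        System.out.println(broken.contains(key)); // false on OpenJDK: lookup uses the new hash
        broken.add(key);                          // the same reference is stored a second time
        System.out.println(broken.size());        // 2 on OpenJDK

        // Safe: remove, mutate, then re-add.
        List<Integer> other = new ArrayList<>(List.of(1, 1));
        Set<List<Integer>> safe = new HashSet<>();
        safe.add(other);
        safe.remove(other);
        other.set(0, 0);
        safe.add(other);
        System.out.println(safe.contains(other)); // true
        System.out.println(safe.size());          // 1
    }
}
```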




Answer 3:


You're adding the object to a Set, which, per its documentation, "contains no duplicate elements".

You only ever add one object to the Set; you just change the values of its contents. To see what I mean, try printing the result of each call: System.out.println(set.add(t));.

As the documentation of the add() method says, it "returns true if this set did not already contain the specified element".
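Applied to the first program, that suggestion looks roughly like this (AddResultDemo is a made-up class name). The second add() returns false because the set already holds that exact array object, whatever its contents now are.

```java
import java.util.HashSet;
import java.util.Set;

public class AddResultDemo {
    public static void main(String[] args) {
        Set<Integer[]> set = new HashSet<>();
        Integer[] t = new Integer[2];
        t[0] = t[1] = 1;
        System.out.println(set.add(t)); // true: t was not in the set yet
        t[0] = t[1] = 0;                // same array object, new contents
        System.out.println(set.add(t)); // false: the set already contains this object
        System.out.println(set.size()); // 1
    }
}
```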

Also, your t1 is irrelevant in your first code snippet, since you never use it.


In your second code snippet it outputs 2 because you are adding two different Integer[] objects to the Set.

Try printing the hash codes of the objects to see how this works:

Integer[] t = new Integer[2];
t[0] = t[1] = 1;
// Before we change the values of t
System.out.println(t.hashCode());
t[0] = t[1] = 0;
// After we change the values of t
System.out.println(t.hashCode());
// Hash code of a second, different array with the same starting values
Integer[] t1 = new Integer[2];
t1[0] = t1[1] = 1;
System.out.println(t1.hashCode());

Output (the exact numbers differ between runs):

//Hashcode for t is the same before and after modifying data
366712642
366712642
//Hashcode for t1 is different from t; different object
1829164700



Answer 4:


How java.util.Set implementations check for duplicate objects depends on the implementation, but per the documentation of Set, the appropriate meaning of "duplicate" is that o1.equals(o2).

Since HashSet in particular is based on a hash table, it looks for a duplicate by computing the hashCode() of the object presented to it, and then comparing it with equals() against the objects, if any, in the corresponding hash bucket.

Arrays do not override hashCode() or equals(), so they implement instance identity, not value identity. Thus, regardless of the values of its elements, a given array always has the same hash code, and always equals() itself and only itself. Your first code adds the same array object to a set twice. Regardless of the values of its elements, it is still the same object. The second code adds two different array objects to a set. Regardless of the values of their elements, they are different objects.
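A quick way to see instance identity versus value identity (ArrayEqualsDemo is a made-up name): an array's own equals() only matches itself, while Arrays.equals compares element by element.

```java
import java.util.Arrays;

public class ArrayEqualsDemo {
    public static void main(String[] args) {
        Integer[] t = {1, 1};
        Integer[] t1 = {1, 1};

        System.out.println(t.equals(t));          // true: an array equals itself
        System.out.println(t.equals(t1));         // false: different objects, same values
        System.out.println(Arrays.equals(t, t1)); // true: element-by-element comparison
    }
}
```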

Note, too, that if you have mutable objects that implement value identity, such that their equality and hash codes depend on the values of their members, then modifying such an object while it is a member of a Set very likely breaks the Set. This is documented on a per-implementation basis.



Source: https://stackoverflow.com/questions/52339600/how-set-checks-for-duplicates-java-hashset
