HashSet order and difference with JDK 7 / 8

Submitted by 不问归期 on 2019-12-19 09:04:31

Question


This is a two-part question:

  1. Does HashSet implement some hidden ordering mechanism, or does the documentation's statement, "It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time," simply mean that the order might change at some point in the future and/or depending on memory usage?
  2. Why am I getting completely different 'ordering' (dare I say) when I switch between JDKs?

Take this example:

for (int i = 0; i < 1000; i++) {
    Set<String> stringSet = new HashSet<>();
    stringSet.add("qwe");
    stringSet.add("rtz");
    stringSet.add("123");
    stringSet.add("qwea");
    stringSet.add("12334rasefasd");
    stringSet.add("asdxasd");
    stringSet.add("arfskt6734");
    stringSet.add("123121");
    stringSet.add("");
    stringSet.add("qwr");
    stringSet.add("rtzz");
    stringSet.add("1234");
    stringSet.add("qwes");
    stringSet.add("1234rasefasd");
    stringSet.add("asdxasdq");
    stringSet.add("arfskt6743");
    stringSet.add("123121 ");
    stringSet.add(" ");
    System.out.println(stringSet);
}

will, no matter how many times I run it, produce the following output:

JDK 7: [, , 123, qwea, asdxasdq, qwe, qwr, 123121 , arfskt6743, 1234rasefasd, qwes, rtz, rtzz, 1234, 12334rasefasd, asdxasd, arfskt6734, 123121]

JDK 8: [, , qwes, arfskt6743, asdxasdq, 123121, 123121 , arfskt6734, qwr, 123, 1234, qwea, rtzz, rtz, 12334rasefasd, 1234rasefasd, qwe, asdxasd]

Obviously, the empty string and the whitespace-only-string are leading the way both times, but the rest of it is completely different.


Answer 1:


According to the update notes on the collections changes page:

The alternative String hash function added in 7u6 has been removed from JDK 8, along with the jdk.map.althashing.threshold system property. Instead, hash bins containing a large number of colliding keys improve performance by storing their entries in a balanced tree instead of a linked list. This JDK 8 change applies only to HashMap, LinkedHashMap, and ConcurrentHashMap.

In rare situations, this change could introduce a change to the iteration order of HashMap and HashSet. A particular iteration order is not specified for HashMap objects - any code that depends on iteration order should be fixed.


So, basically:

The way HashMap (and therefore HashSet, which is backed by it) handles colliding keys has changed to improve performance: a bucket with many collisions now stores its entries in a balanced tree instead of a linked list.

This kind of change can alter the iteration order of your set, and the notes make clear that any code depending on that order should be fixed.

You're seeing a different implementation of your set, which might look like it's ordered, but that is pure coincidence.

I would recommend not relying on the iteration order of sets, as the order is not guaranteed.
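If a predictable order is actually needed, the usual alternative is a Set implementation that documents its order. The following is a minimal sketch, not part of the original answer; the class and method names are made up for illustration. It relies only on the documented behavior of LinkedHashSet (insertion order) and TreeSet (natural ordering).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

// Hypothetical demo class: shows two Set implementations with a documented iteration order.
class PredictableOrderDemo {
    // LinkedHashSet iterates in insertion order.
    static List<String> insertionOrder(List<String> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    // TreeSet iterates in the elements' natural (here: lexicographic String) order.
    static List<String> sortedOrder(List<String> input) {
        return new ArrayList<>(new TreeSet<>(input));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("qwe", "rtz", "123", "qwea", "");
        System.out.println(insertionOrder(input)); // same order as the input list
        System.out.println(sortedOrder(input));    // sorted lexicographically
    }
}
```

Both variants cost a little more than a plain HashSet (extra links per entry, or O(log n) operations), which is the price of a specified order.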

Edit:

Another point is also important, as stated by user Holger:

The “alternative String hash function” of Java 7 was not used by default. Further, the balanced tree only applies to bucket collision scenarios. Still, as a consequence of this improvement, another, unmentioned change has been made. The mapping of an object’s hashcode to an array position undergoes a transformation which has been simplified from Java 7 to Java 8.




Answer 2:


The internal storage of a HashSet is defined by an algorithm. It is not randomized.

The specification (API) does not specify any particular algorithm, other than it being hash-based. The implementation can choose any algorithm it wants, and is free to choose a different one in future versions.

However, because it is algorithm-based, for any particular version of the implementation (Oracle vs IBM, 7 vs 8, ...), adding a particular sequence of values will always produce the same result, i.e. the same ordering.

The ordering is consistent for a specific version, but it is undefined and subject to change without notice in future versions and/or different implementations, so you should never rely on order.
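The repeatability described above can be checked with a small sketch like the following. It is an illustration only: the class and method names are hypothetical, and the equality it prints is an observation about a deterministic implementation running in one JVM, not something the HashSet API promises.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Hypothetical demo class: builds the same HashSet twice and compares the iteration orders.
class RepeatableOrderDemo {
    // Builds a fresh HashSet from the input and returns its iteration order as a list.
    static List<String> iterationOrder(List<String> input) {
        return new ArrayList<>(new HashSet<>(input));
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("qwe", "rtz", "123", "qwea", "asdxasd", "");
        List<String> first = iterationOrder(input);
        List<String> second = iterationOrder(input);
        System.out.println(first);
        // Same elements, same insertion sequence, same implementation: same order.
        System.out.println(first.equals(second));
    }
}
```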




Answer 3:


To have something more exciting, change your example code to

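import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashSetOrder { // class name chosen only to make the snippet compile
  public static void main(String... args) {
    System.out.println(System.getProperty("java.version"));
    List<String> strings = Arrays.asList("qwe", "rtz", "123", "qwea",
        "12334rasefasd", "asdxasd", "arfskt6734", "123121", "", "qwr",
        "rtzz", "1234", "qwes", "1234rasefasd", "asdxasdq", "arfskt6743",
        "123121 ", " ");

    // Same strings, same insertion order; only the initial capacity changes (2^5 up to 2^25).
    for (int i = 5; i < 26; i++) {
      Set<String> stringSet = new HashSet<>(1 << i);
      stringSet.addAll(strings);
      System.out.println(stringSet);
    }
  }
}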

This still adds the same strings in the same order to a HashSet, but the HashSet has been initialized with different capacities.
The result is

1.7.0_51
[,  , qwea, 123, asdxasdq, qwe, qwr, 123121 , arfskt6743, 1234rasefasd, qwes, rtzz, rtz, 1234, 12334rasefasd, arfskt6734, asdxasd, 123121]
[, qwr, arfskt6743, rtzz, 12334rasefasd,  , qwea, 123, asdxasdq, qwe, 123121 , 1234rasefasd, qwes, rtz, 1234, arfskt6734, asdxasd, 123121]
[, qwr, arfskt6743, rtzz, 12334rasefasd,  , 123, rtz, 1234, arfskt6734, asdxasd, 123121, qwea, asdxasdq, qwe, 123121 , 1234rasefasd, qwes]
[, qwr, arfskt6743, rtzz, 12334rasefasd,  , arfskt6734, asdxasd, 123121, 123121 , 1234rasefasd, 123, rtz, 1234, qwea, asdxasdq, qwe, qwes]
[, rtzz,  , 123121, 123121 , 1234rasefasd, 123, rtz, qwea, asdxasdq, qwe, qwes, qwr, arfskt6743, 12334rasefasd, arfskt6734, asdxasd, 1234]
[, rtzz,  , 123121, 123, asdxasdq, 123121 , 1234rasefasd, rtz, qwea, qwe, qwes, qwr, arfskt6743, 12334rasefasd, arfskt6734, asdxasd, 1234]
[,  , 123121, asdxasdq, 1234rasefasd, rtz, qwea, qwes, arfskt6743, 12334rasefasd, arfskt6734, asdxasd, rtzz, 123, 123121 , qwe, qwr, 1234]
[,  , 123121, 1234rasefasd, rtz, qwea, qwes, asdxasd, rtzz, 123, 1234, asdxasdq, arfskt6743, 12334rasefasd, arfskt6734, 123121 , qwe, qwr]
[,  , rtz, asdxasd, rtzz, arfskt6743, 12334rasefasd, arfskt6734, 123121 , qwe, qwr, 123121, 1234rasefasd, qwea, qwes, 123, 1234, asdxasdq]
[,  , arfskt6743, 12334rasefasd, arfskt6734, qwea, qwes, 1234, asdxasdq, rtz, asdxasd, rtzz, 123121 , qwe, qwr, 123121, 1234rasefasd, 123]
[,  , qwea, qwes, rtz, asdxasd, rtzz, 123121 , qwe, qwr, 123, arfskt6743, 12334rasefasd, arfskt6734, 1234, asdxasdq, 123121, 1234rasefasd]
[,  , qwea, qwes, asdxasd, 1234, asdxasdq, 123121, 1234rasefasd, rtz, rtzz, 123121 , qwe, qwr, 123, arfskt6743, 12334rasefasd, arfskt6734]
[,  , qwea, qwes, asdxasd, 1234, 1234rasefasd, rtzz, 123121 , 123, arfskt6743, arfskt6734, asdxasdq, 123121, rtz, qwe, qwr, 12334rasefasd]
[,  , 1234rasefasd, 123, asdxasdq, rtz, qwe, qwr, 12334rasefasd, qwea, qwes, asdxasd, 1234, rtzz, 123121 , arfskt6743, arfskt6734, 123121]
[,  , 123, asdxasdq, rtz, qwe, qwr, 12334rasefasd, 123121 , 123121, 1234rasefasd, qwea, qwes, asdxasd, 1234, rtzz, arfskt6743, arfskt6734]
[,  , 123, asdxasdq, rtz, qwe, qwr, 12334rasefasd, 1234rasefasd, qwea, qwes, asdxasd, 1234, rtzz, 123121 , 123121, arfskt6743, arfskt6734]
[,  , 123, rtz, qwe, qwr, 12334rasefasd, asdxasd, asdxasdq, 1234rasefasd, qwea, qwes, 1234, rtzz, 123121 , 123121, arfskt6743, arfskt6734]
[,  , 123, rtz, qwe, qwr, asdxasd, asdxasdq, 1234, arfskt6743, arfskt6734, 12334rasefasd, 1234rasefasd, qwea, qwes, rtzz, 123121 , 123121]
[,  , 123, rtz, qwe, qwr, asdxasdq, 1234, 12334rasefasd, 1234rasefasd, qwea, qwes, rtzz, 123121 , 123121, asdxasd, arfskt6743, arfskt6734]
[,  , 123, rtz, qwe, qwr, 1234, 12334rasefasd, qwea, qwes, rtzz, 123121 , asdxasdq, 1234rasefasd, 123121, asdxasd, arfskt6743, arfskt6734]
[,  , 123, rtz, qwe, qwr, 1234, qwea, qwes, rtzz, 12334rasefasd, 123121 , asdxasdq, 1234rasefasd, 123121, asdxasd, arfskt6743, arfskt6734]
1.8.0_111
[,  , qwes, arfskt6743, asdxasdq, 123121, 123121 , arfskt6734, qwr, 123, 1234, qwea, rtzz, rtz, 12334rasefasd, 1234rasefasd, qwe, asdxasd]
[, 123121, arfskt6734, qwr, 1234, asdxasd,  , qwes, arfskt6743, asdxasdq, 123121 , 123, qwea, rtzz, rtz, 12334rasefasd, 1234rasefasd, qwe]
[, arfskt6734, qwr,  , arfskt6743, asdxasdq, 123, rtzz, 123121, 1234, asdxasd, qwes, 123121 , qwea, rtz, 12334rasefasd, 1234rasefasd, qwe]
[, qwr,  , 123, rtzz, 1234, asdxasd, qwes, 123121 , qwea, rtz, 1234rasefasd, arfskt6734, arfskt6743, asdxasdq, 123121, 12334rasefasd, qwe]
[,  , 123, 1234, rtz, arfskt6734, arfskt6743, asdxasdq, 123121, 12334rasefasd, qwe, qwr, rtzz, asdxasd, qwes, 123121 , qwea, 1234rasefasd]
[,  , 1234, asdxasdq, 123121, 12334rasefasd, rtzz, asdxasd, qwes, 123121 , qwea, 123, rtz, arfskt6734, arfskt6743, qwe, qwr, 1234rasefasd]
[,  , 1234, 12334rasefasd, qwes, qwea, rtz, asdxasdq, 123121, rtzz, asdxasd, 123121 , 123, arfskt6734, arfskt6743, qwe, qwr, 1234rasefasd]
[,  , asdxasdq, rtzz, 123121 , arfskt6734, arfskt6743, qwe, qwr, 1234, 12334rasefasd, qwes, qwea, rtz, 123121, asdxasd, 123, 1234rasefasd]
[,  , asdxasdq, 123121 , arfskt6734, arfskt6743, 1234, 12334rasefasd, qwes, qwea, 123121, asdxasd, 1234rasefasd, rtzz, qwe, qwr, rtz, 123]
[,  , asdxasdq, 1234, rtzz, 123121 , arfskt6734, arfskt6743, 12334rasefasd, qwes, qwea, 123121, asdxasd, 1234rasefasd, qwe, qwr, rtz, 123]
[,  , 1234, rtzz, 123121 , arfskt6734, arfskt6743, 12334rasefasd, qwes, qwea, 123121, asdxasd, qwe, qwr, rtz, 123, asdxasdq, 1234rasefasd]
[,  , 1234, 123121 , arfskt6734, arfskt6743, qwes, qwea, asdxasd, rtzz, 12334rasefasd, 123121, qwe, qwr, rtz, 123, asdxasdq, 1234rasefasd]
[,  , arfskt6734, arfskt6743, asdxasd, 123, asdxasdq, 1234rasefasd, 1234, 123121 , qwes, qwea, rtzz, 12334rasefasd, 123121, qwe, qwr, rtz]
[,  , arfskt6734, arfskt6743, 123, asdxasdq, 123121 , qwes, qwea, rtzz, 12334rasefasd, 123121, qwe, qwr, rtz, asdxasd, 1234rasefasd, 1234]
[,  , 123, 123121 , qwe, qwr, rtz, asdxasd, 1234rasefasd, arfskt6734, arfskt6743, asdxasdq, qwes, qwea, rtzz, 12334rasefasd, 123121, 1234]
[,  , 123, qwe, qwr, rtz, asdxasd, 1234rasefasd, arfskt6734, arfskt6743, qwes, qwea, rtzz, 123121, 1234, 123121 , asdxasdq, 12334rasefasd]
[,  , 123, qwe, qwr, rtz, arfskt6734, arfskt6743, 123121 , asdxasdq, asdxasd, 1234rasefasd, qwes, qwea, rtzz, 123121, 1234, 12334rasefasd]
[,  , 123, qwe, qwr, rtz, arfskt6734, arfskt6743, 123121 , 1234, 12334rasefasd, asdxasdq, asdxasd, 1234rasefasd, qwes, qwea, rtzz, 123121]
[,  , 123, qwe, qwr, rtz, arfskt6734, arfskt6743, 1234, asdxasdq, asdxasd, 1234rasefasd, qwes, qwea, rtzz, 123121 , 12334rasefasd, 123121]
[,  , 123, qwe, qwr, rtz, 1234, asdxasdq, asdxasd, qwes, qwea, rtzz, 123121 , 12334rasefasd, 123121, arfskt6734, arfskt6743, 1234rasefasd]
[,  , 123, qwe, qwr, rtz, 1234, qwes, qwea, rtzz, 123121 , 123121, arfskt6734, arfskt6743, asdxasdq, asdxasd, 12334rasefasd, 1234rasefasd]

This demonstrates that the iteration order depends not only on the particular implementation, but also on the history of the HashSet. A higher capacity might also be the result of the set having been bigger previously and then having had elements removed.
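The point about history can be sketched as follows. This is an illustrative example rather than part of the original answer: the class and method names are hypothetical, and it assumes (as in the OpenJDK implementations) that the internal table does not shrink when elements are removed. The two sets are equal as sets, yet they may print in different orders; the printed orders are deliberately not stated here, because they are implementation details.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class: two equal sets with different histories.
class HistoryDemo {
    // A set that only ever held the three values.
    static Set<Integer> freshSet() {
        Set<Integer> set = new HashSet<>();
        set.add(1);
        set.add(17);
        set.add(2);
        return set;
    }

    // A set that temporarily grew by 1000 extra elements and was then shrunk back.
    static Set<Integer> shrunkSet() {
        Set<Integer> set = freshSet();
        for (int i = 100; i < 1100; i++) {
            set.add(i);     // forces the internal table to grow
        }
        for (int i = 100; i < 1100; i++) {
            set.remove(i);  // removes the elements, but (by assumption) not the capacity
        }
        return set;
    }

    public static void main(String[] args) {
        Set<Integer> fresh = freshSet();
        Set<Integer> shrunk = shrunkSet();
        System.out.println(fresh.equals(shrunk)); // true: same elements
        System.out.println(fresh);                // iteration order of the small table
        System.out.println(shrunk);               // may differ, the table is larger
    }
}
```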

While the hash code determines which array position to use for an element, collisions can cause several elements to share one position (a bucket). Within that bucket, the collision may be resolved through a linked list, in which case the order within the bucket reflects the insertion order and therefore also depends on the set’s history. Since Java 8, it may instead be resolved with a balanced tree, which reflects the order of either the hash codes or the elements’ natural order, depending on whether this is a true hash collision or just a bucket collision.

But Java 8’s HashSet only uses the tree if there is a certain number of collisions in a bucket; otherwise, it also uses a linked list. To avoid switching back and forth between these variants, it uses different thresholds for converting to a tree and for converting back to a linked list. So, if the number of collisions lies between these thresholds, the form, and hence the order, again depends on the set’s history, i.e. on whether there were more elements previously.


Note that Java 7’s “alternative String hash function” was disabled by default and the collision resolution addresses a corner case. Still, as you can see from the output, there is almost always a notable difference in the iteration order.

The reason is that now that collisions are handled more efficiently, the attempts to avoid collisions have been reduced. In Java 7, hash codes underwent the following transformation before getting mapped to array positions:

 h ^= (h >>> 20) ^ (h >>> 12);
 return h ^ (h >>> 7) ^ (h >>> 4);

In contrast, Java 8 uses the following transformation:

return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);

This has an immediate impact on the iteration order, even if no collisions occurred.
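To make the effect concrete, the two transformations quoted above can be compared side by side. The sketch below is an illustration under stated assumptions: the class and method names are hypothetical, the Java 7 variant assumes the alternative String hashing is disabled (the default), and the bucket position is computed as hash & (tableLength - 1) for a power-of-two table length.

```java
// Hypothetical demo class: compares the Java 7 and Java 8 hash spreading steps.
class HashSpreadDemo {
    // Transformation quoted for Java 7.
    static int spreadJava7(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Transformation quoted for Java 8 (for a non-null key with hash code h).
    static int spreadJava8(int h) {
        return h ^ (h >>> 16);
    }

    // Maps a spread hash to a position in a table whose length is a power of two.
    static int bucketIndex(int spreadHash, int tableLength) {
        return spreadHash & (tableLength - 1);
    }

    public static void main(String[] args) {
        int h = "qwe".hashCode();
        // The same key lands in different buckets of a 16-slot table.
        System.out.println("Java 7 style bucket: " + bucketIndex(spreadJava7(h), 16));
        System.out.println("Java 8 style bucket: " + bucketIndex(spreadJava8(h), 16));
    }
}
```

Since iteration walks the table from the first position to the last, a different bucket for the same key directly translates into a different position in the output.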




Answer 4:


Ans 1: Yes, HashSet doesn't maintain insertion order, but once the elements are inserted, iterating over the unmodified set gives the same order every time.

Ans 2: The iteration result may differ between Java versions because it depends on that version's hashing implementation. In practice, though, within a given Java version the iteration order of an unmodified HashSet does not change: if you iterate it repeatedly after inserting the elements, you get the same order every time.



Source: https://stackoverflow.com/questions/45573023/hashset-order-and-difference-with-jdk-7-8
