HashMap initialization parameters (load / initialcapacity)

[愿得一人] 2020-12-04 10:54

What values should I pass to create an efficient HashMap / HashMap based structures for N items?

In an ArrayList, the efficien

9 Answers
  • 2020-12-04 11:12

    The answer Yuval gave is only correct for Hashtable. HashMap uses power-of-two buckets, so for HashMap, Zarkonnen is actually correct. You can verify this from the source code:

      // Find a power of 2 >= initialCapacity
      int capacity = 1;
      while (capacity < initialCapacity)
          capacity <<= 1;
    

    So, although the load factor of 0.75f is still the same between Hashtable and HashMap, you should use an initial capacity n*2 where n is the number of elements you plan on storing in the HashMap. This will ensure the fastest get/put speeds.
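
    A minimal sketch of that rounding, built around the loop quoted above. The class and method names are made up for illustration, and newer JDKs compute the same value with bit operations rather than a loop:

    ```java
    public class PowerOfTwoCapacity {
        // Smallest power of two that is >= initialCapacity (same loop as quoted above).
        static int roundUpToPowerOfTwo(int initialCapacity) {
            int capacity = 1;
            while (capacity < initialCapacity) {
                capacity <<= 1;
            }
            return capacity;
        }

        public static void main(String[] args) {
            // Print the bucket count that each requested capacity turns into.
            for (int requested : new int[] {10, 16, 17, 100, 200}) {
                System.out.println(requested + " -> " + roundUpToPowerOfTwo(requested));
            }
        }
    }
    ```

    A request for 17 already lands on 32 buckets, and both 100 and 200 are rounded up (to 128 and 256), so the value you pass is only a lower bound on the real table size.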

  • 2020-12-04 11:12

    Referring to HashMap source code will help.

    If the number of entries reaches the threshold (capacity * load factor), the map is rehashed automatically. This means that too small a load factor can cause frequent rehashing as the number of entries grows.
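
    To get a feel for this, here is a rough simulation. It is a simplified model, not the JDK code: it assumes the table starts at a power of two and doubles whenever the entry count exceeds capacity * load factor. The class and method names are hypothetical.

    ```java
    public class ResizeCounter {
        // Counts how many times the modelled table doubles while adding `entries` items.
        static int countResizes(int initialCapacity, float loadFactor, int entries) {
            int capacity = 1;
            while (capacity < initialCapacity) {
                capacity <<= 1; // HashMap keeps a power-of-two table
            }
            int threshold = (int) (capacity * loadFactor);
            int resizes = 0;
            for (int size = 1; size <= entries; size++) {
                // Assumption of this model: grow once the size exceeds the threshold.
                while (size > threshold) {
                    capacity <<= 1;
                    threshold = (int) (capacity * loadFactor);
                    resizes++;
                }
            }
            return resizes;
        }

        public static void main(String[] args) {
            System.out.println(countResizes(16, 0.75f, 100));  // default capacity and load factor
            System.out.println(countResizes(16, 0.25f, 100));  // smaller load factor
            System.out.println(countResizes(256, 0.75f, 100)); // table sized up front
        }
    }
    ```

    In this model the three calls return 4, 5 and 0: lowering the load factor adds a resize, while sizing the table up front avoids resizing altogether.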

  • 2020-12-04 11:14

    In most cases it is safe to initialize a List or Map with the following size parameters:

    new ArrayList<T>(numElements + (numElements / 2));
    new HashMap<K, V>(numElements + (numElements / 2));
    

    This follows the 0.75 rule and carries a little less overhead than the * 2 approach described above.

  • 2020-12-04 11:19

    For very large HashMaps in critical systems, where getting the initial capacity wrong can be very problematic, you may need empirical information to determine how best to initialize your Map.

    CollectionSpy (collectionspy.com) is a new Java profiler which lets you see in the blink of an eye which HashMaps are close to needing rehashing, how many times they have been rehashed in the past, and more. An ideal tool to determine safe initial capacity arguments to capacity-based container constructors.

  • 2020-12-04 11:22

    I ran some unit tests to see if these answers were correct and it turned out that using:

    (int) Math.ceil(requiredCapacity / loadFactor);
    

    as the initial capacity gives what you want for either a HashMap or a Hashtable. By "what you want" I mean that adding requiredCapacity elements to the map won't cause the array it wraps to resize, and the array won't be larger than required. Since the default load factor is 0.75, initializing a HashMap like so works:

    ... = new HashMap<KeyType, ValueType>((int) Math.ceil(requiredCapacity / 0.75));
    

    Since a HashSet is effectively just a wrapper for a HashMap, the same logic also applies there, i.e. you can construct a HashSet efficiently like this:

    ... = new HashSet<TypeToStore>((int) Math.ceil(requiredCapacity / 0.75));
    

    @Yuval Adam's answer is correct in all cases except where (requiredCapacity / 0.75) is a power of 2, in which case it allocates too much memory.
    @NotEdible's answer uses too much memory in many cases, because the HashMap constructor itself already takes care of rounding the map's array up to a size that is a power of 2.
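
    For reuse, the formula can be wrapped in a small helper. This is only a sketch: `capacityFor` and the class name are hypothetical, and the result is subject to floating-point rounding in edge cases.

    ```java
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    public class PresizedCollections {
        // Hypothetical helper: initial capacity whose threshold
        // (capacity * loadFactor) covers expectedSize entries.
        static int capacityFor(int expectedSize, float loadFactor) {
            return (int) Math.ceil(expectedSize / loadFactor);
        }

        public static void main(String[] args) {
            int expectedSize = 1000;
            float loadFactor = 0.75f; // HashMap's default load factor

            Map<Integer, String> map =
                    new HashMap<>(capacityFor(expectedSize, loadFactor), loadFactor);
            Set<Integer> set =
                    new HashSet<>(capacityFor(expectedSize, loadFactor), loadFactor);

            for (int i = 0; i < expectedSize; i++) {
                map.put(i, "value-" + i);
                set.add(i);
            }
            System.out.println(map.size() + " " + set.size());
        }
    }
    ```

    On JDK 19 and later, `HashMap.newHashMap(int numMappings)` and `HashSet.newHashSet(int numElements)` take the expected number of entries directly, which makes a helper like this unnecessary.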

  • 2020-12-04 11:27

    Regarding the load factor, I'll simply quote from the HashMap javadoc:

    As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

    Meaning, the load factor should not be changed from 0.75 unless you have a specific optimization in mind. Initial capacity is the only thing you want to change; set it according to your N value, i.e. (N / 0.75) + 1 or something in that area. This will ensure that the table is always large enough and that no rehashing occurs.
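
    One caveat with the "+ 1" is worth checking numerically. The sketch below assumes HashMap rounds the requested capacity up to the next power of two (the helper names are made up) and lists the values of N for which (N / 0.75) + 1 ends up with a larger table than Math.ceil(N / 0.75):

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class PlusOneComparison {
        // Assumed rounding rule: smallest power of two >= initialCapacity.
        static int tableSize(int initialCapacity) {
            int capacity = 1;
            while (capacity < initialCapacity) {
                capacity <<= 1;
            }
            return capacity;
        }

        // Values of n in [1, max] where the "+ 1" formula yields a bigger table.
        static List<Integer> sizesWherePlusOneOvershoots(int max) {
            List<Integer> result = new ArrayList<>();
            for (int n = 1; n <= max; n++) {
                int plusOne = (int) (n / 0.75) + 1;
                int ceil = (int) Math.ceil(n / 0.75);
                if (tableSize(plusOne) > tableSize(ceil)) {
                    result.add(n);
                }
            }
            return result;
        }

        public static void main(String[] args) {
            System.out.println(sizesWherePlusOneOvershoots(100));
        }
    }
    ```

    The two formulas only disagree when N / 0.75 is itself a power of two (N = 3, 6, 12, 24, 48, 96 in this range); there the extra 1 pushes the table to twice the size it needs.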
