HashMap initialization parameters (load factor / initial capacity)

[愿得一人] 2020-12-04 10:54

What values should I pass to create an efficient HashMap / HashMap-based structure for N items?

In an ArrayList, the efficient number is N (N already assumes future growth).

9 Answers
  • 2020-12-04 11:28

    In the Guava libraries from Google there is a factory method that creates a HashMap optimized for an expected number of items: newHashMapWithExpectedSize

    from the docs:

    Creates a HashMap instance, with a high enough "initial capacity" that it should hold expectedSize elements without growth ...
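
    A minimal usage sketch, assuming Guava is on the classpath (the key/value types and the expected size below are illustrative, not from the question):

        import com.google.common.collect.Maps;
        import java.util.Map;

        // Sized so that roughly 1000 entries should fit without the table resizing.
        Map<String, Integer> wordCounts = Maps.newHashMapWithExpectedSize(1000);
        wordCounts.put("example", 1);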

  • 2020-12-04 11:31

    It's also notable that having a HashMap on the small side makes hash collisions more likely, which can slow down lookups. Hence, if you care more about the speed of the map than about its memory footprint, it can be worth making it a bit larger than the data it needs to hold. Since memory is cheap, I typically initialise HashMaps for a known number of items with

    // The key type (String here) is illustrative; HashMap needs both key and value type parameters.
    HashMap<String, Foo> myMap = new HashMap<String, Foo>(numberOfElements * 2);
    

    Feel free to disagree, in fact I'd quite like to have this idea verified or thrown out.

  • 2020-12-04 11:32

    "In an ArrayList, the efficient number is N (N already assumes future growth)."

    Erm, no it doesn't, unless I misunderstand what you're saying here. When you pass an integer into the ArrayList constructor, it will create an underlying array of exactly that size. If it turns out you need even a single extra element, the ArrayList will need to resize the underlying array when you next call add(), causing this call to take a lot longer than it usually would.

    If on the other hand you're talking about your value of N taking into account growth - then yes, if you can guarantee the value will never go above this then calling such an ArrayList constructor is appropriate. And in this case, as pointed out by Hank, the analogous constructor for a map would be N and 1.0f. This should perform reasonably even if you do happen to exceed N (though if you expect this to occur on a regular basis, you may wish to pass in a larger number for the initial size).
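
    As a concrete sketch of that analogy (the bound of 100 and the element/key/value types are purely illustrative):

        import java.util.ArrayList;
        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;

        int n = 100; // known upper bound on the number of elements

        // ArrayList: the backing array is allocated at exactly this size.
        List<String> list = new ArrayList<String>(n);

        // HashMap: initial capacity n with load factor 1.0f, so it should not
        // resize until the entry count exceeds the (power-of-two-rounded) capacity.
        Map<Integer, String> map = new HashMap<Integer, String>(n, 1.0f);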

    The load factor, in case you weren't aware, is the point at which the map will have its capacity increased, as a fraction of the total capacity.

    Edit: Yuval is probably right that it's a better idea to leave the load factor around 0.75 for a general purpose map. A load factor of 1.0 would perform brilliantly if your keys had sequential hashcodes (such as sequential integer keys), but for anything else you will likely run into collisions with the hash buckets, meaning that lookups take longer for some elements. Creating more buckets than is strictly necessary will reduce this chance of collision, meaning there's more chance of elements being in their own buckets and thus being retrievable in the shortest amount of time. As the docs say, this is a time vs space tradeoff. If either is particularly important to you (as shown by a profiler rather than prematurely optimising!) you can emphasize that; otherwise, stick with the default.
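
    If you stick with the default load factor but still want N entries to fit without a resize, a small sizing sketch (the value of N here is illustrative; 0.75 is the documented default):

        import java.util.HashMap;
        import java.util.Map;

        int n = 1000; // expected number of entries

        // To hold n entries without a resize at the default load factor of 0.75,
        // the initial capacity needs to be at least n / 0.75; HashMap rounds it up
        // to a power of two internally.
        Map<String, Object> map = new HashMap<String, Object>((int) Math.ceil(n / 0.75));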
