In HashMap, why is the threshold value (the next size value at which to resize) capacity * load factor, and not equal to the size or capacity of the map?

悲&欢浪女 2021-02-05 22:59

In HashMap, why is the threshold value (the next size value at which to resize) capacity * load factor? Why not equal to the size or capacity of the map?

4 Answers
  • 2021-02-05 23:37

    From a theory perspective, the likelihood of maintaining no collisions with a full hash table is very low, so hash tables are resized to maintain their desired O(1) lookup property: fewer collisions mean more direct access to entries and less searching.
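
    As a rough sketch of that resize rule (a hypothetical SimpleHashTable, not the real java.util.HashMap code), a threshold derived from capacity * load factor makes the table grow well before it is actually full:

        // Minimal sketch of the resize rule; SimpleHashTable is illustrative only
        // and does not mirror the real java.util.HashMap internals.
        import java.util.LinkedList;

        public class SimpleHashTable<K, V> {
            private LinkedList<Object[]>[] buckets;
            private int size = 0;
            private final float loadFactor = 0.75f;
            private int threshold;                       // capacity * loadFactor

            @SuppressWarnings("unchecked")
            public SimpleHashTable() {
                buckets = (LinkedList<Object[]>[]) new LinkedList[16];
                threshold = (int) (buckets.length * loadFactor);   // 12, not 16
            }

            // Duplicate keys are ignored here for brevity; the point is only
            // that the table grows once size exceeds capacity * loadFactor.
            public void put(K key, V value) {
                if (++size > threshold) {
                    resize();                            // grow while buckets are still short
                }
                int index = Math.floorMod(key.hashCode(), buckets.length);
                if (buckets[index] == null) buckets[index] = new LinkedList<>();
                buckets[index].add(new Object[] { key, value });
            }

            @SuppressWarnings("unchecked")
            private void resize() {
                LinkedList<Object[]>[] old = buckets;
                buckets = (LinkedList<Object[]>[]) new LinkedList[old.length * 2];
                threshold = (int) (buckets.length * loadFactor);
                for (LinkedList<Object[]> bucket : old) {          // redistribute existing entries
                    if (bucket == null) continue;
                    for (Object[] entry : bucket) {
                        int index = Math.floorMod(entry[0].hashCode(), buckets.length);
                        if (buckets[index] == null) buckets[index] = new LinkedList<>();
                        buckets[index].add(entry);
                    }
                }
            }
        }

    Resizing at capacity * load factor rather than at full capacity keeps the average bucket length below the load factor, which is what preserves the expected O(1) lookups described above.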

  • 2021-02-05 23:54

    Javadoc, Javadoc, Javadoc. That is the first place to look. For HashMap, it says:

    As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

    As for the theory of hash maps: if your map is full, you're doing something very, very wrong. By that point you're likely at O(sqrt(N)) on lookups with random data, which is bad. You never want your hash map to be full. A very sparse map, on the other hand, will waste too much space (as you've noted) and will take too long to iterate through. Hence there should be a load factor that is less than 1 for most use cases.

    Note: The "wasted space" is proportional to the size of the map, and inversely proportional to the load factor. However lookup times have a more complex expected performance function. This means that the same load factor will not work for different size hash maps - as it will mean different scale tradeoffs.


    A general overview of the trade-offs can be found in Knuth, The Art of Computer Programming, Vol. 3.
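
    Building on the Javadoc guidance quoted above, here is a small sketch of pre-sizing a HashMap so that the expected number of entries never exceeds capacity * load factor; the expectedEntries value of 10,000 is just an illustration:

        import java.util.HashMap;
        import java.util.Map;

        public class PreSizedMapExample {
            public static void main(String[] args) {
                int expectedEntries = 10_000;
                float loadFactor = 0.75f;

                // Per the Javadoc: if the initial capacity is greater than
                // expectedEntries / loadFactor, no rehash should ever occur
                // while the map holds at most expectedEntries mappings.
                int initialCapacity = (int) Math.ceil(expectedEntries / loadFactor) + 1;

                Map<Integer, String> map = new HashMap<>(initialCapacity, loadFactor);
                for (int i = 0; i < expectedEntries; i++) {
                    map.put(i, "value-" + i);   // no resize expected in this loop
                }
                System.out.println("size = " + map.size());
            }
        }

    Note that HashMap rounds the requested capacity up to the next power of two internally, and recent JDKs (19 and later) offer HashMap.newHashMap(int numMappings), which performs this calculation for you.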

  • 2021-02-05 23:55

    The HashMap implementation allows you to set the load factor. This design decision gives the user of the class some measure of control over the conditions under which the underlying data structure is resized.

    The default load factor value of 0.75 was likely chosen as a reasonable balance between memory usage and map performance (determined by collision rate and resize overhead).

    For any given instance of HashMap, you get to choose the appropriate load factor for your particular situation. You need to consider the relative importance of a small memory footprint, how performance-sensitive you are for lookups, and how performance-sensitive you are for puts (a put that causes the map to be resized and rebuilt can be very slow).

    As an aside, your concept of a "full" HashMap is a little skewed. The implementation handles an arbitrary number of collisions just fine (although there is a performance cost to collisions). You could use a HashMap with a load factor of 1 billion and it would (probably) never grow beyond a capacity of 16.

    There is no problem with setting the load factor to 1.0, which would result in a rehash operation when you add the 17th element to a default-sized HashMap. Compared to the default of 0.75, you will use a little less space, do fewer rehashes, and have a few more collisions (and thus more searching with equals() through a bucket's linked list).
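
    To make the trade-off concrete, here is a short sketch comparing the default load factor, a load factor of 1.0, and the "1 billion" extreme mentioned above; the resize points noted in the comments follow from threshold = capacity * load factor:

        import java.util.HashMap;
        import java.util.Map;

        public class LoadFactorExample {
            public static void main(String[] args) {
                // Default: capacity 16, load factor 0.75 -> threshold 12,
                // so the 13th mapping triggers the first resize.
                Map<String, Integer> defaults = new HashMap<>();

                // Load factor 1.0: capacity 16 -> threshold 16,
                // so the map is not resized until the 17th mapping is added.
                Map<String, Integer> dense = new HashMap<>(16, 1.0f);

                // An extreme load factor makes the threshold enormous, so the
                // table (probably) never grows beyond 16 buckets; entries just
                // pile up in longer buckets, trading lookup speed for space.
                Map<String, Integer> neverGrows = new HashMap<>(16, 1_000_000_000f);

                for (int i = 0; i < 20; i++) {
                    defaults.put("k" + i, i);
                    dense.put("k" + i, i);
                    neverGrows.put("k" + i, i);
                }
                System.out.println(defaults.size() + " " + dense.size() + " " + neverGrows.size());
            }
        }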

  • 2021-02-05 23:55

    In Java 8, when the number of entries in a single bucket reaches a threshold (8, provided the table has at least 64 slots), the bucket's contents switch from a linked list of entries to a balanced tree, which improves worst-case lookup in that bucket from O(n) to O(log n). This is one of the Java 8 features worth remembering.
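
    A small illustration of that behaviour: the CollidingKey class below is purely hypothetical and forces every entry into the same bucket, so in Java 8+ that bucket is converted to a balanced (red-black) tree once it grows past the treeify threshold of 8 entries (the table must also have at least 64 slots, otherwise it is resized first), keeping lookups in that bucket at O(log n) rather than O(n):

        import java.util.HashMap;
        import java.util.Map;

        public class TreeifyExample {

            // Every instance reports the same hash code, so all entries land in
            // one bucket. Keys are still distinguished by equals(); implementing
            // Comparable lets the treeified bucket order keys efficiently.
            static final class CollidingKey implements Comparable<CollidingKey> {
                final int id;
                CollidingKey(int id) { this.id = id; }

                @Override public int hashCode() { return 42; }
                @Override public boolean equals(Object o) {
                    return o instanceof CollidingKey && ((CollidingKey) o).id == id;
                }
                @Override public int compareTo(CollidingKey other) {
                    return Integer.compare(id, other.id);
                }
            }

            public static void main(String[] args) {
                Map<CollidingKey, Integer> map = new HashMap<>();
                // Well past the per-bucket treeify threshold of 8: the single
                // overloaded bucket becomes a tree instead of a long linked list.
                for (int i = 0; i < 1_000; i++) {
                    map.put(new CollidingKey(i), i);
                }
                System.out.println(map.get(new CollidingKey(500)));   // prints 500
            }
        }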
