Why does ArrayList grow at a rate of 1.5, but HashMap at a rate of 2?

暖寄归人 2020-12-28 17:28

As per the Sun Java implementation, during expansion, ArrayList grows to 3/2 of its current capacity, whereas for HashMap the expansion rate is double. What is the reason behind this?

7 Answers
  • 2020-12-28 17:52

    The accepted answer does not actually give an exact response to the question, but the comment from @user837703 on that answer clearly explains why HashMap grows by powers of two.

    I found this article, which explains it in detail: http://coding-geek.com/how-does-a-hashmap-work-in-java/

    Let me post a fragment of it, which gives a detailed answer to the question:

    // the function that returns the index of the bucket from the rehashed hash
    static int indexFor(int h, int length) {
        return h & (length-1);
    }
    

    In order to work efficiently, the size of the inner array needs to be a power of 2; let's see why.

    Imagine the array size is 17; the mask value is going to be 16 (size - 1). The binary representation of 16 is 0…010000, so for any hash value H the index generated with the bitwise formula “H AND 16” is going to be either 16 or 0. This means that the array of size 17 will only be used for 2 buckets: the one at index 0 and the one at index 16, which is not very efficient…

    But if you now take a size that is a power of 2, like 16, the bitwise index formula is “H AND 15”. The binary representation of 15 is 0…001111, so the index formula can output values from 0 to 15 and the array of size 16 is fully used. For example:

    • if H = 952, its binary representation is 0..01110111000, the associated index is 0…01000 = 8
    • if H = 1576, its binary representation is 0..011000101000, the associated index is 0…01000 = 8
    • if H = 12356146, its binary representation is 0..0101111001000101000110010, the associated index is 0…00010 = 2
    • if H = 59843, its binary representation is 0..01110100111000011, the associated index is 0…00011 = 3

    This is why the array size is a power of two. This mechanism is transparent to the developer: if he chooses a HashMap with a size of 37, the Map will automatically choose the next power of 2 after 37 (64) for the size of its inner array.
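
    As a quick cross-check of those numbers, here is a minimal sketch (my own illustration; only the indexFor logic is from the HashMap source quoted above) contrasting a power-of-two length with a length of 17:

        public class MaskIndexDemo {
            // same masking trick as the indexFor method quoted above
            static int indexFor(int h, int length) {
                return h & (length - 1);
            }

            public static void main(String[] args) {
                int[] hashes = {952, 1576, 12356146, 59843};
                for (int h : hashes) {
                    // length 16 (power of two): every bucket 0..15 is reachable
                    // length 17: "h & 16" can only ever produce 0 or 16
                    System.out.printf("h=%-8d -> %2d of 16, %2d of 17%n",
                                      h, indexFor(h, 16), indexFor(h, 17));
                }
            }
        }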

  • 2020-12-28 17:53

    I can't give you a reason why this is so (you'd have to ask the Sun developers), but to see how it happens, take a look at the sources:

    1. HashMap: take a look at how HashMap resizes to a new size (source, line 799):

           resize(2 * table.length);
      
    2. ArrayList: source, line 183:

      int newCapacity = (oldCapacity * 3)/2 + 1;
      

    Update: I mistakenly linked to the sources of the Apache Harmony JDK - changed it to Sun's JDK.
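
    To see what those two formulas actually produce, here is a small sketch (my own illustration, not JDK code) that prints the first few capacities under each policy. Note that later JDKs compute the ArrayList growth as oldCapacity + (oldCapacity >> 1), which is the same 1.5x idea without the + 1:

        public class GrowthDemo {
            public static void main(String[] args) {
                // ArrayList policy, per the Sun JDK line quoted above
                int listCap = 10; // ArrayList's default initial capacity
                System.out.print("ArrayList: " + listCap);
                for (int i = 0; i < 6; i++) {
                    listCap = (listCap * 3) / 2 + 1;
                    System.out.print(" -> " + listCap);
                }
                System.out.println();

                // HashMap policy: double the table length on each resize
                int tableLen = 16; // HashMap's default initial capacity
                System.out.print("HashMap:   " + tableLen);
                for (int i = 0; i < 6; i++) {
                    tableLen = 2 * tableLen;
                    System.out.print(" -> " + tableLen);
                }
                System.out.println();
            }
        }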

  • 2020-12-28 18:02

    For HashMap, why should the capacity always be a power of two?

    I can think of two reasons.

    1. You can quickly determine the bucket a hashcode goes into. You only need a bitwise AND and no expensive modulo: int bucket = hashcode & (size - 1);

    2. Let's say we have a growth factor of 1.7. If we start with a size of 11, the next size would be 18, then 31. No problem, right? But the hash codes of Strings in Java are calculated with a prime factor of 31, so the bucket a string goes into, hashcode % 31, is then determined only by the last character of the String. Bye bye O(1) if you store folders that all end in / (see the sketch below). If you use a size of, for example, 3^n, the distribution will not get worse as you increase n. Going from size 3 to 9, every element in bucket 2 will now go to bucket 2, 5, or 8, depending on the higher digits. It's like splitting each bucket into three pieces. So an integer growth factor is preferable. (Of course this all depends on how you calculate hashcodes, but an arbitrary growth factor doesn't feel 'stable'.)
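
    Here is the promised sketch of the second point (my own illustration): because String.hashCode() is s[0]*31^(n-1) + ... + s[n-1], every term except the last is a multiple of 31, so with 31 buckets hashCode() % 31 depends only on the last character:

        public class PrimeSizeDemo {
            public static void main(String[] args) {
                String[] folders = {"usr/", "usr/local/", "var/log/", "home/alice/"};
                for (String f : folders) {
                    // floorMod keeps the result non-negative even if hashCode() < 0
                    int bucket = Math.floorMod(f.hashCode(), 31);
                    // every name ends in '/' (= 47), so all land in bucket 47 % 31 = 16
                    System.out.printf("%-12s -> bucket %d%n", f, bucket);
                }
            }
        }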

  • 2020-12-28 18:05

    The way HashMap is designed/implemented, its underlying number of buckets must be a power of 2 (even if you give it a different initial capacity, it rounds up to a power of 2), so it grows by a factor of two each time. An ArrayList can be any size, so it can be more conservative in how it grows.
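
    The rounding-up mentioned here works along the lines of JDK 8's HashMap.tableSizeFor: smear the highest set bit of cap - 1 into every lower position, then add 1. A simplified sketch (the real method also clamps to MAXIMUM_CAPACITY and handles cap <= 0):

        public class TableSizeDemo {
            // returns the smallest power of two >= cap (for 1 <= cap <= 2^30)
            static int nextPowerOfTwo(int cap) {
                int n = cap - 1;   // so exact powers of two map to themselves
                n |= n >>> 1;      // smear the highest set bit downwards...
                n |= n >>> 2;
                n |= n >>> 4;
                n |= n >>> 8;
                n |= n >>> 16;     // ...until every lower bit is set
                return n + 1;      // e.g. 37 -> 63 -> 64
            }

            public static void main(String[] args) {
                System.out.println(nextPowerOfTwo(37)); // 64, as in the article above
                System.out.println(nextPowerOfTwo(16)); // 16, powers of two unchanged
            }
        }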

  • 2020-12-28 18:08

    The expensive part of increasing the capacity of an ArrayList is copying the contents of the backing array to a new (larger) one.

    For the HashMap, it is creating a new backing array and putting all the map entries into the new array. And the higher the capacity, the lower the risk of collisions. This is more expensive, which explains why the expansion factor is higher. The reason for 1.5 vs. 2.0? I consider it a "best practice" or a good trade-off.
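
    A minimal sketch of why the HashMap side costs more (my own simplification of a separate-chaining resize, not the JDK's actual transfer code): instead of one bulk array copy, every entry has to be re-indexed against the new length:

        import java.util.ArrayList;
        import java.util.List;

        public class ResizeSketch {
            // toy table: each bucket holds the hashes that map to it
            static List<Integer>[] resize(List<Integer>[] old) {
                @SuppressWarnings("unchecked")
                List<Integer>[] bigger = new List[old.length * 2]; // double, stays a power of two
                for (int i = 0; i < bigger.length; i++) {
                    bigger[i] = new ArrayList<>();
                }
                for (List<Integer> bucket : old) {
                    for (int h : bucket) {
                        // unlike ArrayList's single System.arraycopy,
                        // each entry is re-bucketed under the new mask
                        bigger[h & (bigger.length - 1)].add(h);
                    }
                }
                return bigger;
            }
        }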

  • 2020-12-28 18:12

    A general rule to avoid collisions in maps is to keep the maximum load factor at around 0.75. To decrease the possibility of collisions and to avoid triggering the expensive copying process too often, HashMap grows at a larger rate.

    Also, as @Peter says, it must be a power of 2.
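
    For concreteness, here is the arithmetic with HashMap's documented defaults (a sketch of the calculation, not JDK code):

        public class LoadFactorDemo {
            public static void main(String[] args) {
                int capacity = 16;        // HashMap's default initial capacity
                float loadFactor = 0.75f; // HashMap's default load factor
                // the table doubles once the number of entries exceeds this
                // threshold, i.e. a default HashMap grows on the 13th put
                int threshold = (int) (capacity * loadFactor);
                System.out.println("resize after " + threshold + " entries"); // 12
            }
        }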
