Why is HashMap's initial capacity not properly handled by the library?

Submitted by 梦想的初衷 on 2019-12-10 23:57:14

Question


To create a HashMap/HashSet for N elements, we generally do new HashMap((int)(N/0.75F)+1), which is annoying.

Why didn't the library take care of this in the first place and allow initialization like new HashMap(N) (which should not rehash until N elements are added), performing the (int)(N/0.75F)+1 calculation internally?


Answer 1:


Update

Updating to reflect the changed question. No, there is no such standard API, but Guava provides Maps.newHashMapWithExpectedSize(int):

Creates a HashMap instance, with a high enough "initial capacity" that it should hold expectedSize elements without growth.
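Without Guava, a tiny helper doing the same calculation is easy to write. This is a minimal sketch, not Guava's actual implementation; the method name is borrowed from Guava for illustration, and the formula is the one from the question:

```java
import java.util.HashMap;
import java.util.Map;

public class Maps2 {
    // The calculation the question asks the library to do: a capacity
    // large enough that expectedSize elements fit under the default
    // 0.75 load factor without triggering a resize.
    static int capacityFor(int expectedSize) {
        return (int) (expectedSize / 0.75f) + 1;
    }

    static <K, V> Map<K, V> newHashMapWithExpectedSize(int expectedSize) {
        return new HashMap<>(capacityFor(expectedSize));
    }

    public static void main(String[] args) {
        assert capacityFor(12) == 17;   // 12 / 0.75 = 16.0, truncated, plus one
        assert capacityFor(100) == 134; // 100 / 0.75 ~ 133.3, truncated, plus one
        System.out.println("ok");
    }
}
```

Note that for exact multiples of 0.75 the `+1` overshoots by one, which is harmless since the capacity is rounded to a power of two anyway.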


I have to initialize it to (int)(N/0.75F)+1

No, you don't. If you create a new HashMap from another Map, HashMap calculates the capacity itself:

public HashMap(Map<? extends K, ? extends V> m) {
    this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
                  DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);
    putAllForCreate(m);
}

If you add elements one by one, the same process happens as well:

void addEntry(int hash, K key, V value, int bucketIndex) {
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);
        //...
    }

    createEntry(hash, key, value, bucketIndex);
}

The only reason to use the HashMap(int initialCapacity, float loadFactor) constructor is when you know from the very beginning how many elements you will store in the HashMap, thus avoiding resizing and rehashing later (the map has the correct size from the start).

One interesting implementation detail is that the initial capacity is rounded up to the nearest power of two (see: Why does ArrayList grow at a rate of 1.5, but for HashMap it's 2?):

// Find a power of 2 >= initialCapacity
int capacity = 1;
while (capacity < initialCapacity)
    capacity <<= 1;

So if you want your HashMap to have exactly the capacity you define, just use powers of two.
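The rounding loop above can be reproduced outside the JDK to see the effect. A minimal sketch (the `roundUp` helper name is mine):

```java
public class PowerOfTwo {
    // Mirrors the loop in the (pre-Java-8) HashMap constructor:
    // find the smallest power of two >= initialCapacity.
    static int roundUp(int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    public static void main(String[] args) {
        assert roundUp(12) == 16; // rounded up to the next power of two
        assert roundUp(16) == 16; // already a power of two, kept as-is
        assert roundUp(17) == 32; // one past a power of two doubles the table
        System.out.println("ok");
    }
}
```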

Choosing a different loadFactor lets you trade space for performance: a smaller value means more memory but fewer collisions.
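The trade-off is visible in the resize threshold, which (in the pre-Java-8 implementation) is capacity * loadFactor: the map resizes once it holds that many entries. A small sketch of the arithmetic (helper name is mine):

```java
public class Threshold {
    // The map resizes once size reaches capacity * loadFactor, so a
    // lower load factor leaves more empty buckets (fewer collisions)
    // at the cost of a larger table for the same number of entries.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        assert threshold(16, 0.75f) == 12; // default: resize after 12 entries
        assert threshold(16, 0.50f) == 8;  // sparser table, resizes sooner
        assert threshold(16, 1.00f) == 16; // denser table, more collisions
        System.out.println("ok");
    }
}
```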




Answer 2:


I ran the following program:

import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

public static void main(String... args) throws IllegalAccessException, NoSuchFieldException {
    for (int i = 12; i < 80; i++) {
        Map<Integer, Integer> map = new HashMap<Integer, Integer>((int) Math.ceil(i / 0.75));
        int beforeAdding = Array.getLength(getField(map, "table"));
        for (int j = 0; j < i; j++) map.put(j, j);
        int afterAdding = Array.getLength(getField(map, "table"));
        map.put(i, i);
        int oneMore = Array.getLength(getField(map, "table"));
        System.out.printf("%,d: initial %,d, after N %,d, after N+1 %,d%n ",
                i, beforeAdding, afterAdding, oneMore);
    }
}

private static <T> T getField(Map<Integer, Integer> map, String fieldName) throws NoSuchFieldException, IllegalAccessException {
    Field table = map.getClass().getDeclaredField(fieldName);
    table.setAccessible(true);
    return (T) table.get(map);
}

which prints out

 12: initial 16, after N 16, after N+1 32
 13: initial 32, after N 32, after N+1 32
 .. deleted ..
 24: initial 32, after N 32, after N+1 64
 25: initial 64, after N 64, after N+1 64
 .. deleted ..
 47: initial 64, after N 64, after N+1 64
 48: initial 64, after N 64, after N+1 128
 49: initial 128, after N 128, after N+1 128
 .. deleted ..
 79: initial 128, after N 128, after N+1 128

This shows that in the default initialiser the initial capacity is rounded up to the next power of two. The problem with this value is that if you want it to be the eventual size, you have to take the load factor into account to avoid resizing. Ideally you shouldn't have to, in the way the Map copy constructor does it for you.
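The pattern in the output above can be captured in one expression: the table ends up as the smallest power of two that is at least ceil(N / 0.75). A sketch that predicts the observed sizes without reflection (the `tableSizeFor` helper name is mine):

```java
public class PredictTableSize {
    // Predicts the internal table length the test program observed:
    // round the requested capacity ceil(n / 0.75) up to a power of two.
    static int tableSizeFor(int n) {
        int needed = (int) Math.ceil(n / 0.75);
        int capacity = 1;
        while (capacity < needed)
            capacity <<= 1;
        return capacity;
    }

    public static void main(String[] args) {
        // Matches the "initial" column of the program's output.
        assert tableSizeFor(12) == 16;
        assert tableSizeFor(13) == 32;
        assert tableSizeFor(24) == 32;
        assert tableSizeFor(25) == 64;
        assert tableSizeFor(48) == 64;
        assert tableSizeFor(49) == 128;
        System.out.println("ok");
    }
}
```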




Answer 3:


Most implementations grow automatically as you add more elements. The performance of most implementations also tends to decrease when the containers get fuller. That's why there is a load factor in the first place: to leave some empty space available.



Source: https://stackoverflow.com/questions/13329819/why-hashmap-initial-capacity-is-not-properly-handled-by-the-library
