How does a Java HashMap handle different objects with the same hash code?

余生分开走 2020-11-22 02:27

As I understand it:

  1. It is perfectly legal for two objects to have the same hashcode.
  2. If two objects are equal (using the equals() method)
  • 2020-11-22 03:08

    Each Entry object represents a key-value pair. The field next refers to another Entry object if a bucket has more than one Entry.

    Sometimes the hash codes of two different objects are the same. In this case both objects are saved in one bucket and presented as a linked list. The entry point is the most recently added object (Java 7 and earlier insert new entries at the head of the list). This object refers to the next object through its next field, and so on; the last entry refers to null. When you create a HashMap with the default constructor,

    the array is created with size 16 and the default load factor of 0.75.


    (Source)

  • 2020-11-22 03:09

    It's going to be a long answer, so grab a drink and read on…

    Hashing is all about storing key-value pairs in memory so that they can be read and written quickly. It stores keys in an array and values in a LinkedList.

    Let's say I want to store 4 key-value pairs:

    {
    “girl” => “ahhan” , 
    “misused” => “Manmohan Singh” , 
    “horsemints” => “guess what”, 
    “no” => “way”
    }
    

    So to store the keys we need an array of 4 elements. Now how do I map each of these 4 keys to one of the 4 array indexes (0, 1, 2, 3)?

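    So Java computes the hashCode of each key and maps it to a particular array index. For a String, the hashCode formula is:

    1) Reverse the string.
    
    2) Multiply the character code of each character by increasing powers of 31, then add up the terms. (This is the same as the documented String.hashCode() formula s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], evaluated with int arithmetic that wraps around on overflow.)
    
    3) So hashCode() of "girl" is computed like this (the character codes of l, r, i, g are 108, 114, 105 and 103):
    
    e.g. girl = 108 * 31^0 + 114 * 31^1 + 105 * 31^2 + 103 * 31^3 = 3173020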
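    A small Java sketch can check the steps above. `reversedPowerHash` is a hypothetical helper (not a JDK method) that follows the "reverse, then multiply by increasing powers of 31" description literally, and it is compared with the real `String.hashCode()`. Because Java int arithmetic wraps on overflow, the two agree even for longer strings whose sum no longer fits in an int.

```java
// Sketch: recompute String.hashCode() the way the steps above describe.
// reversedPowerHash is a hypothetical helper, not a JDK method.
final class StringHashDemo {

    static int reversedPowerHash(String s) {
        int hash = 0;
        int power = 1; // 31^0; int multiplication wraps around on overflow
        // Walk from the last character to the first ("reverse the string")
        for (int i = s.length() - 1; i >= 0; i--) {
            hash += s.charAt(i) * power;
            power *= 31;
        }
        return hash;
    }

    public static void main(String[] args) {
        String key = "girl";
        System.out.println(reversedPowerHash(key)); // 3173020
        System.out.println(key.hashCode());         // 3173020
    }
}
```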
    

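    Hash and girl!! I know what you are thinking. Your fascination with that wild duet might have made you miss an important thing.

    Why does Java multiply by 31?

    It's because 31 is an odd prime of the form 2^5 - 1. Because of that form, 31 * h can be computed cheaply as (h << 5) - h. Because it is odd, no information is lost when the multiplication overflows (multiplying by an even number is partly a left shift, which pushes high bits out of the int). Odd prime multipliers are also commonly credited with spreading hash values more evenly, which reduces the chance of hash collisions.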
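    The 2^5 - 1 point can be seen directly in code: `31 * h` equals `(h << 5) - h` for every int, overflow included. A minimal sketch (the class name is made up for illustration); whether a compiler or JIT actually emits the shift is up to the runtime.

```java
// Sketch: 31 * h and (h << 5) - h agree for every int, because both are
// computed modulo 2^32 (Java int arithmetic wraps around on overflow).
final class ThirtyOneDemo {

    static int timesThirtyOne(int h) {
        return (h << 5) - h; // 32 * h - h == 31 * h
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -7, 3173020, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int h : samples) {
            System.out.println(h + ": " + (31 * h) + " == " + timesThirtyOne(h));
        }
    }
}
```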

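    Now, how is this hash code mapped to an array index?

    In this simplified illustration, the answer is Hash Code % (Array length - 1). So “girl” is mapped to 3173020 % 3 = 1 in our case, which is the second element of the array. (The real java.util.HashMap keeps its table length a power of two and uses (length - 1) & hash, which is the same as hash % length for a non-negative hash, so every slot can be used; it also mixes the high bits of the hash into the low bits first.)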
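    A sketch contrasting the simplified index formula used in this answer with the power-of-two mask that java.util.HashMap uses, `(length - 1) & hash`. Both helper names are hypothetical, and HashMap's extra bit-mixing step is left out. `Math.floorMod` is used for the toy formula because Java's `%` returns a negative result for a negative hash code, which would not be a valid array index.

```java
// Sketch: the toy index formula from this answer next to the power-of-two mask.
// Helper names are hypothetical; HashMap's extra bit-mixing step is left out.
final class IndexDemo {

    // Toy formula: hash % (length - 1). With length 4 only slots 0..2 can ever be chosen.
    static int toyIndex(int hash, int length) {
        return Math.floorMod(hash, length - 1);
    }

    // Power-of-two table: the mask keeps the low bits, same result as floorMod(hash, length)
    static int maskIndex(int hash, int length) {
        return (length - 1) & hash;
    }

    public static void main(String[] args) {
        int h = "girl".hashCode();               // 3173020
        System.out.println(toyIndex(h, 4));      // 1, as in the example
        System.out.println(maskIndex(h, 4));     // 0
        System.out.println(Math.floorMod(h, 4)); // 0, same as the mask
    }
}
```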

    and the value “ahhan” is stored in a LinkedList associated with array index 1.

    Hash collision: if you compute the hashCode of the keys “misused” and “horsemints” with the formula described above, you'll see that both give the same value, 1069518484. Whoa!! Lesson learnt:

    Two equal objects must have the same hashCode, but matching hashCodes do not guarantee that the objects are equal. So the map has to store both values, for “misused” and “horsemints”, in bucket 1 (1069518484 % 3 = 1).
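    The collision is easy to reproduce with the real `java.util.HashMap`. A minimal demo (the class name is made up):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: two unequal String keys with the same hash code can live in one HashMap.
final class CollisionDemo {
    public static void main(String[] args) {
        String a = "misused";
        String b = "horsemints";
        System.out.println(a.hashCode()); // 1069518484
        System.out.println(b.hashCode()); // 1069518484
        System.out.println(a.equals(b));  // false

        Map<String, String> map = new HashMap<>();
        map.put(a, "Manmohan Singh");
        map.put(b, "guess what");
        // Both entries survive: the map uses equals() to tell the keys apart
        System.out.println(map.size());   // 2
        System.out.println(map.get(b));   // guess what
    }
}
```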

    Now the hash map looks like this:

    Array Index 0 -
    Array Index 1 - LinkedList (“ahhan”, “Manmohan Singh”, “guess what”)
    Array Index 2 - LinkedList (“way”)
    Array Index 3 -
    

    Now if somebody tries to find the value for the key “horsemints”, Java quickly computes its hashCode, reduces it to an index and starts searching for its value in the LinkedList at index 1. This way we do not need to search all 4 array indexes, which makes data access faster.

    But wait, one sec. There are 3 values in the LinkedList at array index 1; how does it find out which one is the value for the key “horsemints”?

    Actually, I lied when I said HashMap just stores values in a LinkedList.

    It stores each key-value pair together as a map entry. So the map actually looks like this:

    Array Index 0 -
    Array Index 1 - LinkedList (<“girl” => “ahhan”>, <“misused” => “Manmohan Singh”>, <“horsemints” => “guess what”>)
    Array Index 2 - LinkedList (<“no” => “way”>)
    Array Index 3 -
    

    Now you can see that while traversing the LinkedList at array index 1, it compares the key of each entry in that list with “horsemints” (using equals()), and when it finds a match it returns that entry's value.
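    The whole simplified model from this answer (4 slots, index = hashCode % (length - 1), one LinkedList of key/value entries per slot) fits in a short sketch. `ToyChainedMap` and its methods are hypothetical and only mirror the explanation; `java.util.HashMap` itself is organized differently (see the later answers). Running it reproduces the table above.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

// Sketch of the simplified model in this answer (hypothetical class, not java.util.HashMap):
// a fixed array of slots, index = hashCode % (length - 1), a LinkedList of entries per slot.
final class ToyChainedMap {

    // One stored key/value pair
    static final class Entry {
        final String key;
        final String value;

        Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return "<" + key + " => " + value + ">";
        }
    }

    private final List<LinkedList<Entry>> slots = new ArrayList<>();

    ToyChainedMap(int length) {
        for (int i = 0; i < length; i++) {
            slots.add(new LinkedList<>());
        }
    }

    // The answer's formula; floorMod keeps the index non-negative for negative hash codes
    int indexFor(String key) {
        return Math.floorMod(key.hashCode(), slots.size() - 1);
    }

    void put(String key, String value) {
        LinkedList<Entry> chain = slots.get(indexFor(key));
        ListIterator<Entry> it = chain.listIterator();
        while (it.hasNext()) {
            if (it.next().key.equals(key)) {
                it.set(new Entry(key, value)); // same key: overwrite the old value
                return;
            }
        }
        chain.add(new Entry(key, value)); // new key: append at the end of the chain
    }

    String get(String key) {
        // Only one slot is searched; equals() picks the matching entry inside it
        for (Entry e : slots.get(indexFor(key))) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    List<Entry> slot(int index) {
        return slots.get(index);
    }

    public static void main(String[] args) {
        ToyChainedMap map = new ToyChainedMap(4);
        map.put("girl", "ahhan");
        map.put("misused", "Manmohan Singh");
        map.put("horsemints", "guess what");
        map.put("no", "way");
        for (int i = 0; i < 4; i++) {
            System.out.println("Array Index " + i + " - " + map.slot(i));
        }
        System.out.println(map.get("horsemints")); // guess what
    }
}
```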

    Hope you had fun while reading it :)

  • 2020-11-22 03:12

    Your third assertion is incorrect.

    It's perfectly legal for two unequal objects to have the same hash code. It's used by HashMap as a "first pass filter" so that the map can quickly find possible entries with the specified key. The keys with the same hash code are then tested for equality with the specified key.

    You wouldn't want a requirement that two unequal objects couldn't have the same hash code, as otherwise that would limit you to 2^32 possible objects. (It would also mean that different types couldn't even use an object's fields to generate hash codes, as other classes could generate the same hash.)
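    To illustrate that shared hash codes are legal, here is a sketch of a hypothetical key type whose `hashCode()` always returns the same number. `HashMap` still works correctly, because it falls back to `equals()`; it just loses its speed advantage, since every key lands in the same bucket.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a key type whose hashCode() is deliberately constant. This is legal
// (equal keys still get equal hash codes); it just puts every key in one bucket.
final class ConstantHashDemo {

    static final class Key {
        private final String name;

        Key(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return 42; // every Key collides with every other Key
        }
    }

    public static void main(String[] args) {
        Map<Key, Integer> map = new HashMap<>();
        map.put(new Key("a"), 1);
        map.put(new Key("b"), 2);
        map.put(new Key("c"), 3);
        System.out.println(map.size());            // 3
        System.out.println(map.get(new Key("b"))); // 2
    }
}
```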

  • 2020-11-22 03:14

    A hashmap works like this (this is a little bit simplified, but it illustrates the basic mechanism):

    It has a number of "buckets" which it uses to store key-value pairs. Each bucket has a unique number, which identifies the bucket. When you put a key-value pair into the map, the hashmap looks at the hash code of the key and stores the pair in the bucket whose identifier is the hash code of the key. For example: the hash code of the key is 235 -> the pair is stored in bucket number 235. (Note that one bucket can store more than one key-value pair.)

    When you look up a value in the hashmap by giving it a key, it first looks at the hash code of the key you gave. The hashmap then looks into the corresponding bucket and compares the key you gave with the keys of all pairs in that bucket, using equals().

    Now you can see how this is very efficient for looking up key-value pairs in a map: by the hash code of the key the hashmap immediately knows in which bucket to look, so that it only has to test against what's in that bucket.

    Looking at the above mechanism, you can also see what requirements are necessary on the hashCode() and equals() methods of keys:

    • If two keys are the same (equals() returns true when you compare them), their hashCode() methods must return the same number. If keys violate this, equal keys might be stored in different buckets, and the hashmap would not be able to find key-value pairs (because it looks only in the bucket for the hash code of the key you pass in, not in the bucket where the equal key was stored).

    • If two keys are different, then it doesn't matter if their hash codes are the same or not. They will be stored in the same bucket if their hash codes are the same, and in this case, the hashmap will use equals() to tell them apart.
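    The first rule can be demonstrated with a deliberately broken, hypothetical key class whose `equals()` compares only `name` while its `hashCode()` also mixes in `id`. Two keys can then be equal but have different hash codes, and `HashMap` misses the lookup. It compares stored hash codes before calling `equals()`, so the miss happens even if both keys land in the same bucket.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Sketch of a broken key: equals() compares only `name`, but hashCode() also
// mixes in `id`, so two equal keys can have different hash codes.
final class BrokenContractDemo {

    static final class BadKey {
        final String name;
        final int id;

        BadKey(String name, int id) {
            this.name = name;
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, id); // violates the contract: id is not part of equals()
        }
    }

    public static void main(String[] args) {
        Map<BadKey, String> map = new HashMap<>();
        map.put(new BadKey("alice", 1), "stored");

        BadKey probe = new BadKey("alice", 2);
        System.out.println(probe.equals(new BadKey("alice", 1))); // true
        System.out.println(map.get(probe));                       // null: hash codes differ
        System.out.println(map.containsKey(probe));               // false
    }
}
```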

  • 2020-11-22 03:16

    HashMap structure diagram

    HashMap (in Java 7 and earlier) is an array of Entry objects.

    Consider HashMap as just an array of objects.

    Have a look at what this Object is:

    static class Entry<K,V> implements Map.Entry<K,V> {
            final K key;
            V value;
            Entry<K,V> next;
            final int hash;
    … 
    }
    

    Each Entry object represents a key-value pair. The field next refers to another Entry object if a bucket has more than one Entry.

    Sometimes the hash codes of two different objects are the same. In this case, both objects are saved in one bucket and presented as a linked list. The entry point is the most recently added object (that is how Java 7 and earlier behave; Java 8 appends new entries at the tail). This object refers to the next object through its next field, and so on. The last entry refers to null.

    When you create a HashMap with the default constructor

    Map<String, String> hashMap = new HashMap<>();
    

    the array is created with size 16 and a default load factor of 0.75. (Since Java 8, the array itself is only allocated on the first put.)

    Adding a new key-value pair

    1. Calculate the hash code for the key.
    2. Calculate the position (bucket number) where the element should be placed: hash & (arrayLength - 1), which is the same as a non-negative hash % arrayLength because arrayLength is a power of two.
    3. If you add a value with a key that is already stored in the HashMap, the old value is overwritten.
    4. Otherwise the element is added to the bucket.

    If the bucket already has at least one element, the new one is placed in the first position of the bucket and its next field refers to the old first element. (That is the Java 7 behavior; Java 8 appends the new node to the end of the bucket's list.)
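    The adding steps above, in the style this answer describes (entries chained through `next`, new entries inserted at the head, as Java 7 did), can be sketched as follows. `HeadInsertMap` is a hypothetical class: it has a fixed table of 16 buckets, no resizing, no null keys, and it skips the JDK's extra hash-spreading step.

```java
// Sketch of the "adding" steps with head insertion (hypothetical class, not the JDK source).
// Fixed 16-bucket table, no resizing, no null keys, no extra hash spreading.
final class HeadInsertMap<K, V> {

    static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;
        final int hash;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];

    int bucketFor(int hash) {
        return hash & (table.length - 1); // table.length is a power of two
    }

    V put(K key, V value) {
        int hash = key.hashCode();                          // 1. hash code of the key
        int i = bucketFor(hash);                            // 2. bucket number
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                V old = e.value;                            // 3. key already present: overwrite
                e.value = value;
                return old;
            }
        }
        table[i] = new Entry<>(hash, key, value, table[i]); // 4. new head; next -> old head
        return null;
    }

    V get(K key) {
        int hash = key.hashCode();
        for (Entry<K, V> e = table[bucketFor(hash)]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        HeadInsertMap<String, String> map = new HeadInsertMap<>();
        map.put("misused", "Manmohan Singh");
        map.put("horsemints", "guess what");
        // Both keys share a hash code, so they share a bucket; the later one is the head
        Entry<String, String> head = map.table[map.bucketFor("misused".hashCode())];
        System.out.println(head.key + " -> " + head.next.key); // horsemints -> misused
    }
}
```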

    Deletion

    1. Calculate the hash code for the given key.
    2. Calculate the bucket number: hash & (arrayLength - 1).
    3. Get a reference to the first Entry object in the bucket and iterate over all entries in that bucket, comparing keys with the equals method. When the matching Entry is found, it is unlinked from the chain (the previous entry's next, or the bucket's head, is pointed at the entry after it) and its value is returned. If no matching entry is found, return null.
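    A matching sketch of the deletion steps, again with a hypothetical class (`ChainRemoveMap`), a fixed 16-bucket table, no resizing and no null keys. The interesting part is the unlinking: removing the head means updating the table slot, while removing any other entry means updating the previous entry's `next`.

```java
// Sketch of the deletion steps (hypothetical class, not the JDK source).
// Fixed 16-bucket table, head insertion, no resizing, no null keys.
final class ChainRemoveMap<K, V> {

    static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;

        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];

    private int bucketFor(Object key) {
        return key.hashCode() & (table.length - 1);
    }

    void put(K key, V value) {
        int i = bucketFor(key);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.key.equals(key)) {
                e.value = value;
                return;
            }
        }
        table[i] = new Entry<>(key, value, table[i]);
    }

    V remove(Object key) {
        int i = bucketFor(key);                        // steps 1-2: hash code -> bucket
        Entry<K, V> prev = null;
        for (Entry<K, V> e = table[i]; e != null; prev = e, e = e.next) {
            if (e.key.equals(key)) {                   // step 3: compare with equals()
                if (prev == null) {
                    table[i] = e.next;                 // removing the head of the chain
                } else {
                    prev.next = e.next;                // bypass e further down the chain
                }
                return e.value;
            }
        }
        return null;                                   // no matching entry
    }

    public static void main(String[] args) {
        ChainRemoveMap<String, String> map = new ChainRemoveMap<>();
        map.put("misused", "Manmohan Singh");
        map.put("horsemints", "guess what");
        System.out.println(map.remove("misused"));    // Manmohan Singh
        System.out.println(map.remove("misused"));    // null
        System.out.println(map.remove("horsemints")); // guess what
    }
}
```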
  • 2020-11-22 03:17

    Here is a rough description of HashMap's mechanism in Java 8 (it may be slightly different in Java 6).


    Data structures

    • Hash table
      The hash value is calculated via hash() on the key, and it decides which bucket of the hash table is used for that key.
    • Linked list (singly)
      When the number of elements in a bucket is small, a singly linked list is used.
    • Red-Black tree
      When the number of elements in a bucket grows large (8, TREEIFY_THRESHOLD, as long as the table has at least 64 buckets, MIN_TREEIFY_CAPACITY; a smaller table is resized instead), a red-black tree is used.

    Classes (internal)

    • Map.Entry
      Represents a single key/value entry in the map.
    • HashMap.Node
      Linked list version of a node.

      It can represent:

      • A hash bucket, since the table is an array of Node and each Node stores the hash of its key.
      • A node in a singly linked list (so the node in a table slot is also the head of that bucket's list).
    • HashMap.TreeNode
      Tree version of node.

    Fields (internal)

    • Node[] table
      The bucket table (the heads of the linked lists).
      If a bucket contains no elements, its slot is null, so it takes up only the space of a reference.
    • Set<Map.Entry<K,V>> entrySet
      A cached Set view of the entries.
    • int size
      Number of entries.
    • float loadFactor
      Indicates how full the hash table is allowed to get before it is resized.
    • int threshold
      The next size at which to resize.
      Formula: threshold = capacity * loadFactor
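    The threshold formula with the default values (capacity 16, load factor 0.75) works out as below as the capacity doubles. This is plain arithmetic with a hypothetical helper, not a call into HashMap; in Java 8 the table is resized once `size` exceeds `threshold`, i.e. when the 13th key is added with the defaults.

```java
// Sketch: threshold = capacity * loadFactor for the default load factor,
// following the capacity as it doubles. Plain arithmetic, not a JDK API.
final class ThresholdDemo {

    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        float loadFactor = 0.75f;
        for (int capacity = 16; capacity <= 128; capacity *= 2) {
            System.out.println("capacity " + capacity + " -> threshold " + threshold(capacity, loadFactor));
        }
        // capacity 16 -> 12, 32 -> 24, 64 -> 48, 128 -> 96
    }
}
```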

    Methods (internal)

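    • int hash(key)
      Calculates the hash from the key. In Java 8 this is (h = key.hashCode()) ^ (h >>> 16), which XORs the high 16 bits into the low 16 bits; a null key hashes to 0.
    • How to map a hash to a bucket?
      Use the following logic:

      // tableSize is always a power of two, so the mask keeps only the low bits of hash
      static int hashToBucket(int tableSize, int hash) {
          return (tableSize - 1) & hash;
      }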
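    Why `hash(key)` mixes the high bits in: with a 16-bucket table only the low 4 bits of the hash choose the bucket, so keys that differ only in their high bits would all collide. The sketch below copies the Java 8 formula into a local `spread` method (a hypothetical name; the JDK's own version is the package-private static `HashMap.hash`) and uses `Integer` keys, whose `hashCode()` is the value itself.

```java
// Sketch: Java 8's bit-spreading formula (copied into a local, hypothetical
// spread method) combined with the hashToBucket mask from the answer.
final class SpreadDemo {

    // XOR the high 16 bits of the hash code into the low 16 bits; null maps to 0
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    static int hashToBucket(int tableSize, int hash) {
        return (tableSize - 1) & hash;
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the value itself; these keys differ only above bit 15
        Integer a = 1 << 16;
        Integer b = 2 << 16;
        System.out.println(hashToBucket(16, a.hashCode()) + " " + hashToBucket(16, b.hashCode())); // 0 0
        System.out.println(hashToBucket(16, spread(a)) + " " + hashToBucket(16, spread(b)));       // 1 2
    }
}
```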
      

    About capacity

    In a hash table, capacity means the bucket count; it can be obtained from table.length.
    It can also be derived from threshold and loadFactor, so it does not need to be a class field.

    The effective capacity is available via the internal, package-private capacity() method.


    Operations

    • Find entity by key.
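      First find the bucket by hash value, then walk the linked list or search the tree.
    • Add entity with key.
      First find the bucket according to the hash value of the key.
      Then try to find an existing entry with an equal key:
      • If found, replace the value.
      • Otherwise, append a new node to the end of the linked list (converting the bucket to a tree if it has grown too long), or insert it into the tree.
    • Resize
      When the size exceeds the threshold, the hash table's capacity (table.length) is doubled and all elements are redistributed into the new table. Because the capacity is a power of two, each element either keeps its index or moves to index + old capacity, decided by one bit of its cached hash, so hashCode() does not have to be called again.
      This can be an expensive operation, since every entry is visited.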
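    When a power-of-two table doubles, each entry's new bucket can be computed from one extra bit of its stored hash: it either keeps its index or moves up by exactly the old capacity. A sketch with a hypothetical helper, `newIndex`, mirroring the idea behind Java 8's resize split; it checks the shortcut against recomputing the index from scratch.

```java
// Sketch: after doubling a power-of-two table from oldCap to 2 * oldCap,
// the new index is the old index plus (hash & oldCap), i.e. either unchanged
// or shifted by exactly oldCap. newIndex is a hypothetical helper.
final class ResizeSplitDemo {

    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int[] hashes = {5, 21, 37, 53, -11};
        for (int h : hashes) {
            int direct = h & (2 * oldCap - 1); // index recomputed from scratch
            System.out.println(h + ": old " + (h & (oldCap - 1))
                    + ", new " + newIndex(h, oldCap) + " (direct " + direct + ")");
        }
    }
}
```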

    Performance

    • get & put
      Average time complexity is O(1), because:
      • A bucket is accessed via an array index, thus O(1).
      • The linked list in each bucket is short on average (given a reasonable hashCode() and the load factor limit), so it can be treated as O(1).
      • Trees only appear in buckets where many keys collide; for keys that implement Comparable they bound the search in that bucket at O(log n) instead of O(n) for a long list.