Overriding Python's Hashing Function in Dictionary

后端 未结 2 1649
后悔当初
后悔当初 2020-12-17 00:59

I am trying to create a custom hash function for some object that I\'ll be hashing into a dictionary. The hashing function is unique (not the standard Python one). This is v

相关标签:
2条回答
  • 2020-12-17 01:26

    The python dict implementation uses the hash value to both sparsely store values based on the key and to avoid collisions in that storage. It uses the result of hash() as a starting point, it is not the definitive position.

    Thus, although hash(4) returns 4, the exact 'position' in the underlying C structure is also based on what other keys are already there, and how large the current table is. The python hash table is resized as needed (items added), for example.

    Since a dict has no ordering, this is not something you need to worry about, or can hope to influence. If you need ordering in a dict, use the collections.OrderedDict() implementation instead, which keeps track of ordering separately.

    The details of the python hash table implementation

    You may want to read up on how hash tables work on Wikipedia; Python uses open addressing for it's implementation.

    When selecting a slot in the table, the modulus of the hash value (an integer) and the current table size is taken, thus on a table of size 32, so the key 45, hash value 45 would initially be stored in slot 14.

    If there is a collision (there already is something else stored in slot 14 and it's not the integer 45), then the slot value is perturbed until an empty slot is found or the same key is found instead. Perturbing is done with the formula:

    perturb = slot = hash
    while slot_is_full and item_in_slot_is_not_equal_to_key:
        slot = (5*slot) + 1 + perturb
        perturb >>= 5
    

    So, when there is a collision, another slot is picked at progressively smaller steps until it scans the whole table. Note that the table will already have been resized to make space if necessary.

    In order for this to work correctly, custom types need both a __hash__() method and need to implement __eq__() to determine if two instances represent the same key. Matching hash values is not enough. For the dict implementation to consider two instances to represent the exact same key, both their hash values must match, and they must return True for the == equality operator. Such objects are considered hashable.

    (For Python 2.x, implementing the __cmp__() hook would do instead of implementing __eq__(); support for this has been removed in Python 3).

    0 讨论(0)
  • 2020-12-17 01:27

    Martijn Pieters gave an excellent answer. I just wanted to add that the CPython source is well documented and also an excellent place to look into the details of the Python Dictionary implementation. Last I checked, the pertinent comments were on lines 33-135.

    0 讨论(0)
提交回复
热议问题