Why are tuples constructed from differently initialized sets equal?

无人共我 2021-02-05 00:48

I expected the following two tuples

>>> x = tuple(set([1, "a", "b", "c", "z", "f"]))
>>> y = tuple(set(["a", "b", "c", "z", […]


        
4 answers
  •  灰色年华
    2021-02-05 01:28

    At first glance, it appears that x should always equal y, because two sets constructed from the same elements are always equal:

    >>> x = set([1, "a", "b", "c", "z", "f"])
    >>> y = set(["a", "b", "c", "z", "f", 1])
    >>> x
    {1, 'z', 'a', 'b', 'c', 'f'}
    >>> y
    {1, 'z', 'a', 'b', 'c', 'f'}
    >>> x == y
    True
    

    However, it is not always the case that tuples (or other ordered collections) constructed from two equal sets are equal.
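The distinction can be seen without any sets at all: tuples compare element by element, in order, while sets compare by membership only. A minimal sketch with hand-written tuples (the values and the `key=repr` normalization are illustrative choices, not part of the original question):

```python
# Two tuples holding the same elements in a different order.
a = (1, "a", "b")
b = ("b", 1, "a")

# Tuple equality is positional, so a different order means "not equal".
tuples_equal = a == b

# Set equality ignores order entirely.
sets_equal = set(a) == set(b)

# Sorting both sides gives an order-independent comparison. key=repr is used
# because Python 3 cannot order int and str against each other directly.
normalized_equal = sorted(a, key=repr) == sorted(b, key=repr)

print(tuples_equal, sets_equal, normalized_equal)
```

If the order of the elements does not matter for your purposes, comparing the sets themselves (or a sorted form) avoids the problem.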

    In fact, the result of your comparison is sometimes True and sometimes False, at least in Python >= 3.3. Testing the following code:

    # compare.py
    x = tuple(set([1, "a", "b", "c", "z", "f"]))
    y = tuple(set(["a", "b", "c", "z", "f", 1]))
    print(x == y)
    

    ... a thousand times:

    $ for x in {1..1000}
    > do
    >   python3.3 compare.py
    > done | sort | uniq -c
    147 False
    853 True
    

    This is because, since Python 3.3, the hash values of strings, bytes and datetimes are randomized as a result of a security fix. Depending on what the hashes are, "collisions" may occur, in which case the order in which items are stored in the underlying array (and therefore the iteration order) depends on the insertion order.

    Here's the relevant bit from the docs:

    Security improvements:

    • Hash randomization is switched on by default.

    — https://docs.python.org/3/whatsnew/3.3.html
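One way to see that the seed is what varies between runs is to pin it with the `PYTHONHASHSEED` environment variable and rerun the comparison in a child interpreter. The sketch below assumes CPython and that `sys.executable` can be launched as a subprocess; the helper name `compare_with_seed` is made up for this example. Which seeds print `True` or `False` depends on the interpreter build, so no particular output is claimed.

```python
import os
import subprocess
import sys

# The comparison from compare.py, passed to a child interpreter via -c.
SNIPPET = (
    'x = tuple(set([1, "a", "b", "c", "z", "f"]))\n'
    'y = tuple(set(["a", "b", "c", "z", "f", 1]))\n'
    'print(x == y)'
)


def compare_with_seed(seed):
    """Run the comparison in a fresh interpreter with a fixed hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Each seed gives a reproducible answer; different seeds may disagree.
results = {seed: compare_with_seed(seed) for seed in range(5)}
for seed, outcome in results.items():
    print(seed, outcome)
```

With a fixed seed the string hashes are the same on every run, so repeating the call for the same seed should always give the same answer.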

    EDIT: Since it's mentioned in the comments that the True/False ratio above is superficially surprising ...

    Sets, like dictionaries, are implemented as hash tables - so if there's a collision, the order of items in the table (and so the order of iteration) will depend both on which item was added first (different in x and y in this case) and the seed used for hashing (different across Python invocations since 3.3). Since collisions are rare by design, and the examples in this question are smallish sets, the issue doesn't arise as often as one might initially suppose.
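The role of collisions can be illustrated with a toy open-addressing table. This is a deliberately simplified model, not CPython's actual set implementation: it uses plain linear probing, a fixed table of 8 slots, and hand-picked hypothetical hash values, and it does not handle duplicate keys.

```python
def iteration_order(keys, hashes, size=8):
    """Insert keys into a toy hash table and return them in slot order."""
    slots = [None] * size
    for key in keys:
        i = hashes[key] % size
        # On a collision, step to the next free slot (linear probing).
        while slots[i] is not None:
            i = (i + 1) % size
        slots[i] = key
    # Iterating the table walks the slots from first to last.
    return [k for k in slots if k is not None]


# Hypothetical hashes: "a" and "b" collide, since 3 % 8 == 11 % 8.
colliding = {"a": 3, "b": 11, "c": 6}
order_ab = iteration_order(["a", "b", "c"], colliding)
order_ba = iteration_order(["b", "a", "c"], colliding)

# Hypothetical hashes with no collision: every key gets its own slot.
distinct = {"a": 3, "b": 4, "c": 6}
same_1 = iteration_order(["a", "b", "c"], distinct)
same_2 = iteration_order(["b", "a", "c"], distinct)

print(order_ab, order_ba)  # insertion order leaks through the collision
print(same_1, same_2)      # identical when nothing collides
```

In this model, whichever colliding key is inserted first takes the preferred slot, so the iteration order differs only when a collision actually happens. A randomized seed changes which keys collide, which is why the result varies from one run to the next.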

    For a thorough explanation of Python's implementation of dictionaries and sets, see Brandon Rhodes's talk The Mighty Dictionary.
