Why do dict keys support list subtraction but not tuple subtraction?

一个人的身影 2021-01-07 23:34

Presumably dict_keys are supposed to behave as a set-like object, but they are lacking the difference method and the subtraction behaviour seems to diverge.
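
For reference, the behaviour in question can be probed with a short script. This is only a minimal sketch: the dict `d` and its contents are made up for illustration, and on a Python release that already contains the fix, the list and tuple cases give the same result.

```python
# Hypothetical sample data; any dict with hashable keys would do.
d = {1: "a", 2: "b", 3: "c"}
keys = d.keys()

# dict_keys implements the set operators ...
list_result = keys - [1, 2]
tuple_result = keys - (1, 2)  # the case that misbehaved on affected versions

# ... but not the named set methods such as .difference().
has_difference_method = hasattr(keys, "difference")

print(list_result, tuple_result, has_difference_method)
```

On an affected interpreter the tuple line is the one that would be expected to fail or give a surprising result.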

1 Answer
  • 2021-01-08 00:00

    This looks to be a bug. The implementation is to convert the dict_keys to a set, then call .difference_update(arg) on it.

    It looks like they misused _PyObject_CallMethodId (an optimized variant of PyObject_CallMethod) by passing a format string of just "O". The thing is, PyObject_CallMethod and friends are documented to require a Py_BuildValue format string that "should produce a tuple". With more than one format code, Py_BuildValue wraps the values in a tuple automatically, but with only one format code it doesn't build a tuple; it just creates the single value (in this case, because the argument is already a PyObject*, all it does is increment the reference count).

    While I haven't tracked down where it might be doing this, I suspect that somewhere in the internals it identifies CallMethod calls that don't produce a tuple and wraps the value in a one-element tuple so the called function can actually receive its arguments in the expected format. When subtracting a tuple, the value is already a tuple, so this fix-up code never activates; when subtracting a list, it does activate, and the argument becomes a one-element tuple containing the list.

    difference_update takes varargs (as if it were declared def difference_update(self, *args)). So when it receives the unwrapped tuple, it thinks it's supposed to subtract away the elements from each entry in the tuple, not treat said entries as values to subtract away themselves. To illustrate, when you do:

    mydict.keys() - (1, 2)
    

    the bug is causing it to do (roughly):

    result = set(mydict)
    # We've got a tuple to pass, so all's well...
    result.difference_update(*(1, 2)) # Unpack behaves like difference_update(1, 2)
    # OH NO! 1 and 2 aren't iterable, so this raises TypeError
    

    While:

    mydict.keys() - [1, 2]
    

    does:

    result = set(mydict)
    # [1, 2] isn't a tuple, so wrap
    result.difference_update(*([1, 2],)) # Behaves like difference_update([1, 2])
    # All's well
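
The two cases above can be modelled in pure Python. This is a sketch of the suspected behaviour, not the actual C code: `buggy_keys_sub` is a hypothetical helper, and the `isinstance` check stands in for the fix-up step that is only assumed to exist in the internals.

```python
def buggy_keys_sub(mapping, other):
    """Hypothetical pure-Python model of the buggy dict_keys subtraction."""
    result = set(mapping)
    # Model of the assumed fix-up: a tuple is taken to be the argument
    # tuple itself, anything else is wrapped in a one-element tuple.
    args = other if isinstance(other, tuple) else (other,)
    result.difference_update(*args)
    return result


mydict = {1: "a", 2: "b", 3: "c"}  # made-up sample data

# A list gets wrapped, so this behaves like difference_update([1, 2]).
list_case = buggy_keys_sub(mydict, [1, 2])

# A tuple is unpacked, so this behaves like difference_update(1, 2).
try:
    buggy_keys_sub(mydict, (1, 2))
    tuple_case = "no error"
except TypeError as exc:
    tuple_case = f"TypeError: {exc}"

print(list_case)
print(tuple_case)
```

In this model the list case removes the keys as intended, while the tuple case fails because the integers are handed to difference_update as if they were iterables.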
    

    That's why subtracting a tuple of str works (incorrectly): - ('abc', '123') performs a call equivalent to:

    result.difference_update(*('abc', '123'))
    # or without unpacking:
    result.difference_update('abc', '123')
    

    and since strs are iterables of their characters, it just blithely removes entries for 'a', 'b', 'c', etc. instead of 'abc' and '123' like you expected.
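
The difference between the two call shapes can be checked directly on a plain set, which behaves the same way on any Python version. This is a small sketch; the key names are made up for illustration.

```python
keys = {"abc", "123", "a", "b", "1"}  # made-up keys for illustration

# What the buggy call amounted to: each tuple entry is its own iterable,
# so the individual characters are removed.
unpacked = set(keys)
unpacked.difference_update(*("abc", "123"))  # same as difference_update("abc", "123")

# What was intended: the tuple as a whole is the iterable of keys to remove.
intended = set(keys)
intended.difference_update(("abc", "123"))

print(sorted(unpacked))
print(sorted(intended))
```

The unpacked call leaves 'abc' and '123' in place and drops the single-character keys, which is the opposite of what the subtraction was meant to do.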

    Basically, this is a bug; it was reported to the CPython developers and fixed in 3.6.0 (as well as in later releases of 2.7, 3.4, and 3.5).

    The correct behavior probably should have been to call the Id variant of PyObject_CallMethodObjArgs (spelled _PyObject_CallMethodIdObjArgs in CPython's private API):

    _PyObject_CallMethodIdObjArgs(result, &PyId_difference_update, other, NULL);
    

    which wouldn't have the packing issues at all, and would run faster to boot. The smallest change would be to change the format string to "(O)" to force tuple creation even for a single item, but since the format string gains nothing here, _PyObject_CallMethodIdObjArgs is better.
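
In Python terms, the fix amounts to always passing the right-hand operand as exactly one argument, whatever its type. The sketch below models that; `fixed_keys_sub` is a hypothetical helper and the sample dict is made up.

```python
def fixed_keys_sub(mapping, other):
    """Hypothetical model of the corrected dict_keys subtraction."""
    result = set(mapping)
    # Exactly one positional argument, whatever its type: the Python
    # counterpart of passing `other` straight through on the C side.
    result.difference_update(other)
    return result


mydict = {"abc": 1, "123": 2, "a": 3}  # made-up sample data

from_tuple = fixed_keys_sub(mydict, ("abc", "123"))
from_list = fixed_keys_sub(mydict, ["abc", "123"])

print(from_tuple, from_list)
```

With a single argument there is nothing to unpack, so tuples and lists are treated alike.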
