Presumably dict_keys objects are supposed to behave as set-like objects, but they lack the difference method, and their subtraction behaviour seems to diverge from that of sets.
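On an interpreter that already includes the fix, the divergence no longer shows up. The snippet below is a quick check, assuming Python 3.6 or later: it confirms that dict_keys has no named difference method and that subtracting a tuple or a list now gives the same result as converting to a real set first.

```python
# Assumes a CPython release that contains the fix (3.6+).
d = {'abc': 1, '123': 2, 'a': 3, 'b': 4}
keys = d.keys()

# dict_keys supports the set operators, but has no named difference() method.
has_difference = hasattr(keys, 'difference')

# On fixed releases, a tuple operand and a list operand behave the same.
minus_tuple = keys - ('abc', '123')
minus_list = keys - ['abc', '123']

# The explicit workaround: convert to a real set, then use difference().
via_set = set(keys).difference(('abc', '123'))

print(has_difference)                         # False
print(minus_tuple == minus_list == via_set)   # True
print(sorted(minus_tuple))                    # ['a', 'b']
```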
This looks to be a bug. The implementation converts the dict_keys to a set, then calls .difference_update(arg) on it.
It looks like they misused _PyObject_CallMethodId (an optimized variant of PyObject_CallMethod) by passing a format string of just "O". The thing is, PyObject_CallMethod and friends are documented to require a Py_BuildValue format string that "should produce a tuple". With more than one format code, Py_BuildValue wraps the values in a tuple automatically, but with only one format code it doesn't create a tuple; it just creates the value (in this case, because the value is already a PyObject*, all it does is increment the reference count).
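To make the packing rule concrete, here is a deliberately simplified Python model of Py_BuildValue that only understands "O" codes plus an optional pair of outer parentheses. build_value is a hypothetical name; this is not CPython code and it ignores every other format code.

```python
def build_value(fmt, *objs):
    """Hypothetical, simplified model of Py_BuildValue: supports only one 'O'
    code per object, optionally wrapped in a single pair of parentheses."""
    parenthesized = fmt.startswith("(") and fmt.endswith(")")
    codes = fmt[1:-1] if parenthesized else fmt
    if not objs or codes != "O" * len(objs):
        raise ValueError("this model only handles one 'O' code per object")
    if parenthesized or len(objs) > 1:
        # Explicit parentheses, or several codes: the values are packed in a tuple.
        return tuple(objs)
    # Exactly one bare code: the value itself comes back, not a 1-tuple.
    return objs[0]

print(build_value("O", [1, 2]))    # [1, 2]
print(build_value("O", (1, 2)))    # (1, 2)
print(build_value("OO", 1, 2))     # (1, 2)
print(build_value("(O)", (1, 2)))  # ((1, 2),)
```

Note that "O" applied to the tuple (1, 2) and "OO" applied to 1 and 2 produce indistinguishable results, which is the root of the problem.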
While I haven't tracked down where it might be doing this, I suspect that somewhere in the internals it identifies CallMethod calls whose format string doesn't produce a tuple and wraps the result in a one-element tuple, so the called function actually receives its arguments in the expected format. When subtracting a tuple, the operand is already a tuple, so this fix-up code never activates; when passing a list, it does, and the list becomes a one-element tuple containing the list.
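A one-line Python model of the suspected fix-up shows why a list and a tuple end up as different argument lists. as_call_args is a hypothetical name, and the rule it encodes is the assumption stated in the paragraph above.

```python
def as_call_args(built):
    """Hypothetical model of the suspected fix-up: a value that is already a
    tuple is used as the argument tuple as-is; anything else is wrapped."""
    return built if isinstance(built, tuple) else (built,)

# With the "O" format, the built value is just the operand itself.
print(as_call_args([1, 2]))  # ([1, 2],)  -> one argument: the list
print(as_call_args((1, 2)))  # (1, 2)     -> two arguments: 1 and 2
```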
difference_update takes varargs (as if it were declared def difference_update(self, *args)). So when it receives the unwrapped tuple, it thinks it's supposed to subtract away the elements of each entry in the tuple, rather than treating those entries as the values to subtract. To illustrate, when you do:
    mydict.keys() - (1, 2)
the bug causes it to do (roughly):
    result = set(mydict)
    # We've got a tuple to pass, so all's well...
    result.difference_update(*(1, 2))  # Unpack behaves like difference_update(1, 2)
    # OH NO! (ints aren't iterable, so this raises TypeError)
While:
    mydict.keys() - [1, 2]
does:
    result = set(mydict)
    # [1, 2] isn't a tuple, so wrap
    result.difference_update(*([1, 2],))  # Behaves like difference_update([1, 2])
    # All's well
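The varargs behaviour of set.difference_update can be checked directly on a plain set, independent of dict_keys. This sketch contrasts several iterable arguments with a single iterable argument, and shows that unpacking (1, 2) into the call fails outright.

```python
s = {'abc', '123', 'a', 'b', 'c'}

# Several positional arguments: each is an iterable whose elements are removed.
by_chars = set(s)
by_chars.difference_update('abc', '123')  # removes 'a', 'b', 'c', '1', '2', '3'

# One positional argument: its elements (the two strings) are removed.
by_strings = set(s)
by_strings.difference_update(['abc', '123'])

print(sorted(by_chars))    # ['123', 'abc']
print(sorted(by_strings))  # ['a', 'b', 'c']

# Ints are not iterable, so difference_update(1, 2) raises.
try:
    set(s).difference_update(1, 2)
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```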
That's why a tuple of strs "works" (incorrectly): - ('abc', '123') performs a call equivalent to:
    result.difference_update(*('abc', '123'))
    # or without unpacking:
    result.difference_update('abc', '123')
and since strs are iterables of their characters, it just blithely removes the entries 'a', 'b', 'c', etc. instead of 'abc' and '123' as you expected.
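The buggy code path can be emulated in pure Python to reproduce the old behaviour on a modern interpreter. buggy_keys_sub is a hypothetical helper that mimics the pre-fix steps described above (copy to a set, pass the operand through the "O" format and the tuple fix-up, then call difference_update); the last line assumes a fixed interpreter (3.6+).

```python
def buggy_keys_sub(keys, other):
    """Hypothetical emulation of the pre-fix dict_keys subtraction."""
    result = set(keys)
    # "O" format plus the fix-up: tuples pass through, anything else is wrapped.
    args = other if isinstance(other, tuple) else (other,)
    result.difference_update(*args)
    return result

d = dict.fromkeys(['abc', '123', 'a', 'b', 'c', '1'])

print(sorted(buggy_keys_sub(d.keys(), ('abc', '123'))))  # ['123', 'abc']
print(sorted(buggy_keys_sub(d.keys(), ['abc', '123'])))  # ['1', 'a', 'b', 'c']
# A fixed interpreter removes the two strings for a tuple operand as well.
print(sorted(d.keys() - ('abc', '123')))                 # ['1', 'a', 'b', 'c']
```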
Basically, this is a bug; it was reported to the CPython developers and fixed in 3.6.0 (as well as in later releases of 2.7, 3.4, and 3.5).
The correct behavior probably should have been to call _PyObject_CallMethodIdObjArgs (the _Py_Identifier variant of PyObject_CallMethodObjArgs):
    _PyObject_CallMethodIdObjArgs(result, &PyId_difference_update, other, NULL);
which wouldn't have the packing issues at all, and would run faster to boot. The smallest fix would be to change the format string to "(O)" to force tuple creation even for a single item, but since the format string buys nothing here, _PyObject_CallMethodIdObjArgs is better.
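The point of an ObjArgs-style call is that every object handed over becomes exactly one positional argument, with no format string and no fix-up in between. The Python model below is a sketch of that idea; call_method_objargs and keys_sub are hypothetical names, not CPython APIs.

```python
def call_method_objargs(obj, name, *objs):
    """Hypothetical model of an ObjArgs-style method call: each object passed
    becomes one positional argument; nothing is packed or unpacked."""
    return getattr(obj, name)(*objs)

def keys_sub(keys, other):
    # Mirrors the corrected call: difference_update(other), always one argument.
    result = set(keys)
    call_method_objargs(result, 'difference_update', other)
    return result

d = dict.fromkeys(['abc', '123', 'a', 'b', 'c', '1'])
from_tuple = keys_sub(d.keys(), ('abc', '123'))
from_list = keys_sub(d.keys(), ['abc', '123'])

print(from_tuple == from_list)  # True
print(sorted(from_tuple))       # ['1', 'a', 'b', 'c']
```

Because the operand is never routed through a format string, a tuple and a list can no longer be treated differently.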