Why are __getitem__(key) and get(key) significantly slower than [key]?

失恋的感觉 2021-02-13 19:30

It was my understanding that brackets were nothing more than a wrapper for __getitem__. Here is how I benchmarked this:

First, I generated a semi-large dic

1 answer
  • 2021-02-13 20:05

    First, the disassembly posted by Not_a_Golfer:

    >>> import dis
    >>> d = {1:2}
    >>> dis.dis(lambda: d[1])
      1           0 LOAD_GLOBAL              0 (d)
                  3 LOAD_CONST               1 (1)
                  6 BINARY_SUBSCR       
                  7 RETURN_VALUE   
    
    >>> dis.dis(lambda: d.get(1))
      1           0 LOAD_GLOBAL              0 (d)
                  3 LOAD_ATTR                1 (get)
                  6 LOAD_CONST               1 (1)
                  9 CALL_FUNCTION            1
                 12 RETURN_VALUE  
    
    >>> dis.dis(lambda: d.__getitem__(1))
      1           0 LOAD_GLOBAL              0 (d)
                  3 LOAD_ATTR                1 (__getitem__)
                  6 LOAD_CONST               1 (1)
                  9 CALL_FUNCTION            1
                 12 RETURN_VALUE
    

    Now, getting the benchmarking right is obviously important if you want to read anything into the results, and I don't know enough to help much there. But assuming there really is a difference (which makes sense to me), here are my guesses as to why:

    1. dict.get simply "does more": it has to check whether the key is present and, if not, return its second argument (which defaults to None). This means there's some form of conditional or exception-catching, so I am completely unsurprised that it would have different timing characteristics from the more basic operation of retrieving the value associated with a key.

    2. Python has a specific bytecode for the "subscription" operation (as demonstrated in the disassembly). The builtin types, including dict, are implemented primarily in C and their implementations do not necessarily play by the normal Python rules (only their interfaces are required to, and there are plenty of corner cases even there). So my guess would be that the implementation of the BINARY_SUBSCR opcode goes more-or-less directly to the underlying C implementations of builtin types that support this operation. For these types, I expect that it is actually __getitem__ that exists as a Python-level method to wrap the C implementation, rather than that the bracket syntax invokes the Python-level method.
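The two guesses above can be checked with a small experiment. The following is a minimal sketch, not the original poster's benchmark: the dict contents, the key, and the helper name `best_time` are all assumptions made for illustration. It first shows the extra behaviour of dict.get for a missing key, then times the three spellings of a lookup with `timeit`.

```python
import timeit

# Hypothetical test data (not from the original question).
d = {i: i * 2 for i in range(1000)}
key = 500
missing = -1

# Guess 1: dict.get "does more". It returns a default instead of raising.
print(d.get(missing))              # default of None
print(d.get(missing, "fallback"))  # explicit default
try:
    d[missing]
except KeyError as exc:
    print("d[missing] raised KeyError:", exc)

def best_time(stmt, number=200_000, repeat=5):
    """Best-of-`repeat` wall time in seconds for `number` runs of `stmt`."""
    return min(
        timeit.repeat(stmt, globals={"d": d, "key": key},
                      number=number, repeat=repeat)
    )

# Guess 2: the bracket syntax has its own opcode, while the other two
# spellings pay for an attribute lookup plus a call.
results = {
    "d[key]": best_time("d[key]"),
    "d.get(key)": best_time("d.get(key)"),
    "d.__getitem__(key)": best_time("d.__getitem__(key)"),
}

for stmt, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{stmt:<20} {seconds:.4f} s")
```

Taking the minimum over several repeats reduces noise from other processes. The absolute numbers, and possibly the ordering, depend on the interpreter version and machine, so treat any single run as indicative only.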

    It might be interesting to benchmark thing.__getitem__(key) against thing[key] for an instance of a custom class that implements __getitem__; you might actually see the opposite result there, since the BINARY_SUBSCR opcode would internally have to fall back to doing work equivalent to looking up the method and invoking it.
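A sketch of that experiment might look like the following. The class `Thing`, its backing dict, and the helper `best_time` are hypothetical names chosen for illustration. The code makes no claim about which spelling wins, since that can differ between interpreter versions.

```python
import timeit

class Thing:
    """Hypothetical container whose __getitem__ is written in Python."""

    def __init__(self, data):
        self._data = data

    def __getitem__(self, key):
        return self._data[key]

thing = Thing({i: i * 2 for i in range(1000)})
key = 500

def best_time(stmt, number=200_000, repeat=5):
    """Best-of-`repeat` wall time in seconds for `number` runs of `stmt`."""
    return min(
        timeit.repeat(stmt, globals={"thing": thing, "key": key},
                      number=number, repeat=repeat)
    )

custom_results = {
    "thing[key]": best_time("thing[key]"),
    "thing.__getitem__(key)": best_time("thing.__getitem__(key)"),
}

for stmt, seconds in custom_results.items():
    print(f"{stmt:<24} {seconds:.4f} s")
```

Both spellings end up running the same Python-level method here, so any gap between them reflects only how the interpreter reaches that method.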
