Why is 'x' in ('x',) faster than 'x' == 'x'?

后端 未结 2 1699
误落风尘
误落风尘 2021-01-29 18:15
>>> timeit.timeit(\"\'x\' in (\'x\',)\")
0.04869917374131205
>>> timeit.timeit(\"\'x\' == \'x\'\")
0.06144205736110564

Also works for

2条回答
  •  余生分开走
    2021-01-29 19:09

    As I mentioned to David Wolever, there's more to this than meets the eye; both methods dispatch to is; you can prove this by doing

    min(Timer("x == x", setup="x = 'a' * 1000000").repeat(10, 10000))
    #>>> 0.00045456900261342525
    
    min(Timer("x == y", setup="x = 'a' * 1000000; y = 'a' * 1000000").repeat(10, 10000))
    #>>> 0.5256857610074803
    

    The first can only be so fast because it checks by identity.

    To find out why one would take longer than the other, let's trace through execution.

    They both start in ceval.c, from COMPARE_OP since that is the bytecode involved

    TARGET(COMPARE_OP) {
        PyObject *right = POP();
        PyObject *left = TOP();
        PyObject *res = cmp_outcome(oparg, left, right);
        Py_DECREF(left);
        Py_DECREF(right);
        SET_TOP(res);
        if (res == NULL)
            goto error;
        PREDICT(POP_JUMP_IF_FALSE);
        PREDICT(POP_JUMP_IF_TRUE);
        DISPATCH();
    }
    

    This pops the values from the stack (technically it only pops one)

    PyObject *right = POP();
    PyObject *left = TOP();
    

    and runs the compare:

    PyObject *res = cmp_outcome(oparg, left, right);
    

    cmp_outcome is this:

    static PyObject *
    cmp_outcome(int op, PyObject *v, PyObject *w)
    {
        int res = 0;
        switch (op) {
        case PyCmp_IS: ...
        case PyCmp_IS_NOT: ...
        case PyCmp_IN:
            res = PySequence_Contains(w, v);
            if (res < 0)
                return NULL;
            break;
        case PyCmp_NOT_IN: ...
        case PyCmp_EXC_MATCH: ...
        default:
            return PyObject_RichCompare(v, w, op);
        }
        v = res ? Py_True : Py_False;
        Py_INCREF(v);
        return v;
    }
    

    This is where the paths split. The PyCmp_IN branch does

    int
    PySequence_Contains(PyObject *seq, PyObject *ob)
    {
        Py_ssize_t result;
        PySequenceMethods *sqm = seq->ob_type->tp_as_sequence;
        if (sqm != NULL && sqm->sq_contains != NULL)
            return (*sqm->sq_contains)(seq, ob);
        result = _PySequence_IterSearch(seq, ob, PY_ITERSEARCH_CONTAINS);
        return Py_SAFE_DOWNCAST(result, Py_ssize_t, int);
    }
    

    Note that a tuple is defined as

    static PySequenceMethods tuple_as_sequence = {
        ...
        (objobjproc)tuplecontains,                  /* sq_contains */
    };
    
    PyTypeObject PyTuple_Type = {
        ...
        &tuple_as_sequence,                         /* tp_as_sequence */
        ...
    };
    

    So the branch

    if (sqm != NULL && sqm->sq_contains != NULL)
    

    will be taken and *sqm->sq_contains, which is the function (objobjproc)tuplecontains, will be taken.

    This does

    static int
    tuplecontains(PyTupleObject *a, PyObject *el)
    {
        Py_ssize_t i;
        int cmp;
    
        for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
            cmp = PyObject_RichCompareBool(el, PyTuple_GET_ITEM(a, i),
                                               Py_EQ);
        return cmp;
    }
    

    ...Wait, wasn't that PyObject_RichCompareBool what the other branch took? Nope, that was PyObject_RichCompare.

    That code path was short so it likely just comes down to the speed of these two. Let's compare.

    int
    PyObject_RichCompareBool(PyObject *v, PyObject *w, int op)
    {
        PyObject *res;
        int ok;
    
        /* Quick result when objects are the same.
           Guarantees that identity implies equality. */
        if (v == w) {
            if (op == Py_EQ)
                return 1;
            else if (op == Py_NE)
                return 0;
        }
    
        ...
    }
    

    The code path in PyObject_RichCompareBool pretty much immediately terminates. For PyObject_RichCompare, it does

    PyObject *
    PyObject_RichCompare(PyObject *v, PyObject *w, int op)
    {
        PyObject *res;
    
        assert(Py_LT <= op && op <= Py_GE);
        if (v == NULL || w == NULL) { ... }
        if (Py_EnterRecursiveCall(" in comparison"))
            return NULL;
        res = do_richcompare(v, w, op);
        Py_LeaveRecursiveCall();
        return res;
    }
    

    The Py_EnterRecursiveCall/Py_LeaveRecursiveCall combo are not taken in the previous path, but these are relatively quick macros that'll short-circuit after incrementing and decrementing some globals.

    do_richcompare does:

    static PyObject *
    do_richcompare(PyObject *v, PyObject *w, int op)
    {
        richcmpfunc f;
        PyObject *res;
        int checked_reverse_op = 0;
    
        if (v->ob_type != w->ob_type && ...) { ... }
        if ((f = v->ob_type->tp_richcompare) != NULL) {
            res = (*f)(v, w, op);
            if (res != Py_NotImplemented)
                return res;
            ...
        }
        ...
    }
    

    This does some quick checks to call v->ob_type->tp_richcompare which is

    PyTypeObject PyUnicode_Type = {
        ...
        PyUnicode_RichCompare,      /* tp_richcompare */
        ...
    };
    

    which does

    PyObject *
    PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)
    {
        int result;
        PyObject *v;
    
        if (!PyUnicode_Check(left) || !PyUnicode_Check(right))
            Py_RETURN_NOTIMPLEMENTED;
    
        if (PyUnicode_READY(left) == -1 ||
            PyUnicode_READY(right) == -1)
            return NULL;
    
        if (left == right) {
            switch (op) {
            case Py_EQ:
            case Py_LE:
            case Py_GE:
                /* a string is equal to itself */
                v = Py_True;
                break;
            case Py_NE:
            case Py_LT:
            case Py_GT:
                v = Py_False;
                break;
            default:
                ...
            }
        }
        else if (...) { ... }
        else { ...}
        Py_INCREF(v);
        return v;
    }
    

    Namely, this shortcuts on left == right... but only after doing

        if (!PyUnicode_Check(left) || !PyUnicode_Check(right))
    
        if (PyUnicode_READY(left) == -1 ||
            PyUnicode_READY(right) == -1)
    

    All in all the paths then look something like this (manually recursively inlining, unrolling and pruning known branches)

    POP()                           # Stack stuff
    TOP()                           #
                                    #
    case PyCmp_IN:                  # Dispatch on operation
                                    #
    sqm != NULL                     # Dispatch to builtin op
    sqm->sq_contains != NULL        #
    *sqm->sq_contains               #
                                    #
    cmp == 0                        # Do comparison in loop
    i < Py_SIZE(a)                  #
    v == w                          #
    op == Py_EQ                     #
    ++i                             # 
    cmp == 0                        #
                                    #
    res < 0                         # Convert to Python-space
    res ? Py_True : Py_False        #
    Py_INCREF(v)                    #
                                    #
    Py_DECREF(left)                 # Stack stuff
    Py_DECREF(right)                #
    SET_TOP(res)                    #
    res == NULL                     #
    DISPATCH()                      #
    

    vs

    POP()                           # Stack stuff
    TOP()                           #
                                    #
    default:                        # Dispatch on operation
                                    #
    Py_LT <= op                     # Checking operation
    op <= Py_GE                     #
    v == NULL                       #
    w == NULL                       #
    Py_EnterRecursiveCall(...)      # Recursive check
                                    #
    v->ob_type != w->ob_type        # More operation checks
    f = v->ob_type->tp_richcompare  # Dispatch to builtin op
    f != NULL                       #
                                    #
    !PyUnicode_Check(left)          # ...More checks
    !PyUnicode_Check(right))        #
    PyUnicode_READY(left) == -1     #
    PyUnicode_READY(right) == -1    #
    left == right                   # Finally, doing comparison
    case Py_EQ:                     # Immediately short circuit
    Py_INCREF(v);                   #
                                    #
    res != Py_NotImplemented        #
                                    #
    Py_LeaveRecursiveCall()         # Recursive check
                                    #
    Py_DECREF(left)                 # Stack stuff
    Py_DECREF(right)                #
    SET_TOP(res)                    #
    res == NULL                     #
    DISPATCH()                      #
    

    Now, PyUnicode_Check and PyUnicode_READY are pretty cheap since they only check a couple of fields, but it should be obvious that the top one is a smaller code path, it has fewer function calls, only one switch statement and is just a bit thinner.

    TL;DR:

    Both dispatch to if (left_pointer == right_pointer); the difference is just how much work they do to get there. in just does less.

提交回复
热议问题