How and when does Python determine the data type of a variable?

后端 未结 3 863
执念已碎
执念已碎 2021-02-20 12:39

I was trying to figure out exactly how Python 3 (using CPython as an interpreter) executes its program. I found out that the steps are:

  1. Compilation of Python so

3条回答
  •  野性不改
    2021-02-20 13:14

    The dynamic, run-time dispatch of CPython (compared to static, compile-time dispatch of Java) is only one of the reasons, why Java is faster than pure CPython: there are jit-compilation in Java, different garbage collection strategies, presence of native types like int, double vs. immutable data structures in CPython and so on.

    My earlier superficial experiments have shown, that the dynamical dispatch is only responsible for about 30% of running - you cannot explain speed differences of some factors of magnitude with that.

    To make this answer less abstract, let's take a look at an example:

    def add(x,y):
       return x+y
    

    Looking at the bytecode:

    import dis
    dis.dis(add)
    

    which gives:

    2         0 LOAD_FAST                0 (x)
              2 LOAD_FAST                1 (y)
              4 BINARY_ADD
              6 RETURN_VALUE
    

    We can see on the level of bytecode there is no difference whether x and y are integers or floats or something else - the interpreter doesn't care.

    The situation is completely different in Java:

    int add(int x, int y) {return x+y;}
    

    and

    float add(float x, float y) {return x+y;}
    

    would result in completely different opcodes and the call-dispatch would happen at compile time - the right version is picked depending on the static types which are known at the compile time.

    Pretty often CPython-interpreter doesn't have to know the exact type of arguments: Internally there is a base "class/interface" (obviously there are no classes in C, so it is called "protocol", but for somebody who knows C++/Java "interface" is probably the right mental model), from which all other "classes" are derived. This base "class" is called PyObject and here is the description of its protocol.. So as long as the function is a part of this protocol/interface CPython interpreter can call it, without knowing the exact type and the call will be dispatched to the right implementation (a lot like "virtual" functions in C++).

    On the pure Python side, it seems as if variables don't have types:

    a=1
    a="1"
    

    however, internally a has a type - it is PyObject* and this reference can be bound to an integer (1) and to an unicode-string ("1") - because they both "inherit" from PyObject.

    From time to time the CPython interpreter tries to find out the right type of the reference, also for the above example - when it sees BINARY_ADD-opcode, the following C-code is executed:

        case TARGET(BINARY_ADD): {
            PyObject *right = POP();
            PyObject *left = TOP();
            PyObject *sum;
            ...
            if (PyUnicode_CheckExact(left) &&
                     PyUnicode_CheckExact(right)) {
                sum = unicode_concatenate(left, right, f, next_instr);
                /* unicode_concatenate consumed the ref to left */
            }
            else {
                sum = PyNumber_Add(left, right);
                Py_DECREF(left);
            }
            Py_DECREF(right);
            SET_TOP(sum);
            if (sum == NULL)
                goto error;
            DISPATCH();
        }
    

    Here the interpreter queries, whether both objects are unicode strings and if this is the case a special method (maybe more efficient, as matter of fact it tries to change the immutable unicode-object in-place, see this SO-answer) is used, otherwise the work is dispatched to PyNumber-protocol.

    Obviously, the interpreter also has to know the exact type when an object is created, for example for a="1" or a=1 different "classes" are used - but as we have seen it is not the only one place.

    So the interpreter interfers the types during the run-time, but most of the time it doesn't have to do it - the goal can be reached via dynamic dispatch.

提交回复
热议问题