Functions as objects in Python: what exactly is stored in memory?

我寻月下人不归 2021-02-08 02:33

I've been using Python for a while now to solve practical problems, but I still don't have a proper theoretical understanding of what's going on under the hood. For example,

2 answers
  •  悲&欢浪女
    2021-02-08 02:45

    A function object's data is divided into two primary parts. The parts that would be the same for all functions created by the same function definition are stored in the function's code object, while the parts that can change even between functions created from the same function definition are stored in the function object.
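
    A minimal sketch of that split, using a hypothetical make_adder factory that is not part of the question. Both inner functions come from one definition, so they should share a single code object, while each function object carries its own closure:

```python
def make_adder(n):
    # One definition of add: compiled once into a single code object.
    def add(x):
        return x + n
    return add

add_one = make_adder(1)
add_two = make_adder(2)

# Two distinct function objects...
print(add_one is add_two)                      # False
# ...that point at the very same code object...
print(add_one.__code__ is add_two.__code__)    # True
# ...but hold different closure cells for the free variable n.
print(add_one.__closure__[0].cell_contents)    # 1
print(add_two.__closure__[0].cell_contents)    # 2
```

    The shared part (bytecode, constants, variable names) lives in __code__; the per-function part (here, the value bound to n) lives on the function object.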

    The most interesting part of a function is probably its bytecode. This is the core data structure that says what to actually do to execute a function. It's stored as a bytestring in the function's code object, and you can examine it directly:

    >>> def fib(i):
    ...     x, y = 0, 1
    ...     for _ in range(i):
    ...         x, y = y, x+y
    ...     return x
    ... 
    >>> fib.__code__.co_code
    b'd\x03\\\x02}\x01}\x02x\x1et\x00|\x00\x83\x01D\x00]\x12}\x03|\x02|\x01|\x02\x17\x00\x02\x00}\x01}\x02q\x12W\x00|\x01S\x00'
    

    ...but it's not designed to be human-readable.

    With enough knowledge of the implementation details of Python bytecode, you could parse that yourself, but describing all that would take way too long. Instead, we'll use the dis module to disassemble the bytecode for us:

    >>> import dis
    >>> dis.dis(fib)
      2           0 LOAD_CONST               3 ((0, 1))
                  2 UNPACK_SEQUENCE          2
                  4 STORE_FAST               1 (x)
                  6 STORE_FAST               2 (y)
    
      3           8 SETUP_LOOP              30 (to 40)
                 10 LOAD_GLOBAL              0 (range)
                 12 LOAD_FAST                0 (i)
                 14 CALL_FUNCTION            1
                 16 GET_ITER
            >>   18 FOR_ITER                18 (to 38)
                 20 STORE_FAST               3 (_)
      4          22 LOAD_FAST                2 (y)
                 24 LOAD_FAST                1 (x)
                 26 LOAD_FAST                2 (y)
                 28 BINARY_ADD
                 30 ROT_TWO
                 32 STORE_FAST               1 (x)
                 34 STORE_FAST               2 (y)
                 36 JUMP_ABSOLUTE           18
            >>   38 POP_BLOCK
      5     >>   40 LOAD_FAST                1 (x)
                 42 RETURN_VALUE
    

    There are a number of columns in the output here, but we're mostly interested in the column of ALL_CAPS names and the columns to the right of it.

    The ALL_CAPS column shows the function's bytecode instructions. For example, LOAD_CONST loads a constant value, and BINARY_ADD is the instruction to add two objects with +. The next column, with the numbers, holds the bytecode arguments. For example, LOAD_CONST 3 says to load the constant at index 3 in the code object's constants. Bytecode arguments are always integers, and they're packed into the bytecode string along with the bytecode instructions. The last column mostly provides human-readable explanations of the bytecode arguments, for example, saying that the 3 in LOAD_CONST 3 corresponds to the constant (0, 1), or that the 1 in STORE_FAST 1 corresponds to local variable x. The information in this column doesn't actually come from the bytecode string; it's resolved by examining other parts of the code object.
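
    As a rough sketch of where those columns come from, the snippet below splits the raw bytestring into (instruction, argument) pairs and then prints the lookup tables the arguments index into. It assumes CPython 3.6 or later, where each instruction occupies two bytes. The exact opcode names and table contents depend on the interpreter version (SETUP_LOOP and BINARY_ADD, for instance, no longer exist in recent releases, and newer versions interleave cache entries), so no output is shown:

```python
import dis

def fib(i):
    x, y = 0, 1
    for _ in range(i):
        x, y = y, x + y
    return x

code = fib.__code__
raw = code.co_code

# Assumption: CPython 3.6+ "wordcode", one opcode byte followed by one
# argument byte. dis.opname maps an opcode number to its ALL_CAPS name.
pairs = [(dis.opname[raw[n]], raw[n + 1]) for n in range(0, len(raw), 2)]
print(pairs[:4])

# The arguments are indexes into tables stored elsewhere in the code object.
print("constants:       ", code.co_consts)
print("local variables: ", code.co_varnames)
print("global names:    ", code.co_names)

# dis.get_instructions performs the same resolution that dis.dis prints
# in its last column.
for ins in dis.get_instructions(fib):
    print(ins.offset, ins.opname, ins.arg, repr(ins.argval))
```

    The first list is all the bytestring contains; every readable name in the last loop comes from co_consts, co_varnames or co_names.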


    The rest of a function object's data is primarily stuff needed to resolve bytecode arguments, like the function's closure or its global variable dict, and stuff that just exists because it's handy for introspection, like the function's __name__.
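
    One way to see that the global variable dict belongs to the function object rather than the code object is to build two functions around the same code with types.FunctionType. This is only an illustration: scaled, doubled, tenfold and FACTOR are made-up names.

```python
import types

def scaled(x):
    # FACTOR is not a local: the bytecode only records the name, and the
    # lookup goes through the function object's __globals__ dict at call time.
    return x * FACTOR

# Hypothetical helpers: same code object, different globals dicts and names.
doubled = types.FunctionType(scaled.__code__, {"FACTOR": 2}, "doubled")
tenfold = types.FunctionType(scaled.__code__, {"FACTOR": 10}, "tenfold")

print(doubled(3), tenfold(3))                  # 6 30
print(doubled.__code__ is tenfold.__code__)    # True
print(doubled.__name__, tenfold.__name__)      # doubled tenfold
```

    The bytecode is identical in both cases; only the data used to resolve the global name, and the introspection-only __name__, differ.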

    If we take a look at the Python 3.6 function object struct definition at C level:

    typedef struct {
        PyObject_HEAD
        PyObject *func_code;    /* A code object, the __code__ attribute */
        PyObject *func_globals; /* A dictionary (other mappings won't do) */
        PyObject *func_defaults;    /* NULL or a tuple */
        PyObject *func_kwdefaults;  /* NULL or a dict */
        PyObject *func_closure; /* NULL or a tuple of cell objects */
        PyObject *func_doc;     /* The __doc__ attribute, can be anything */
        PyObject *func_name;    /* The __name__ attribute, a string object */
        PyObject *func_dict;    /* The __dict__ attribute, a dict or NULL */
        PyObject *func_weakreflist; /* List of weak references */
        PyObject *func_module;  /* The __module__ attribute, can be anything */
        PyObject *func_annotations; /* Annotations, a dict or NULL */
        PyObject *func_qualname;    /* The qualified name */
    
        /* Invariant:
         *     func_closure contains the bindings for func_code->co_freevars, so
         *     PyTuple_Size(func_closure) == PyCode_GetNumFree(func_code)
         *     (func_closure may be NULL if PyCode_GetNumFree(func_code) == 0).
         */
    } PyFunctionObject;
    

    we can see that there's the code object, and then

    • the global variable dict,
    • the default argument values,
    • the keyword-only argument default values,
    • the function's closure cells,
    • the docstring,
    • the name,
    • the __dict__,
    • the list of weak references to the function,
    • the __module__,
    • the annotations, and
    • the __qualname__ (the fully qualified name).

    Inside the PyObject_HEAD macro, there's also the type pointer and some refcount/GC metadata.

    We didn't have to go straight to C to examine most of that, since most of this data is available at Python level: we could have looked at the output of dir() on the function and filtered out the non-instance attributes. The struct definition just provides a nice, commented, uncluttered list.
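
    For reference, here is a sketch of the Python-level route. The greet function and the FIELD_TO_ATTRIBUTE table are invented for this example; the attribute names themselves are the standard function attributes. func_weakreflist is left out because it has no directly corresponding attribute.

```python
def greet(name, greeting="hello", *, punctuation="!") -> str:
    """Return a short greeting."""
    return f"{greeting}, {name}{punctuation}"

greet.calls = 0  # arbitrary attributes land in the function's __dict__

# Struct field -> Python-level attribute that exposes it.
FIELD_TO_ATTRIBUTE = {
    "func_code": "__code__",
    "func_globals": "__globals__",
    "func_defaults": "__defaults__",
    "func_kwdefaults": "__kwdefaults__",
    "func_closure": "__closure__",
    "func_doc": "__doc__",
    "func_name": "__name__",
    "func_dict": "__dict__",
    "func_module": "__module__",
    "func_annotations": "__annotations__",
    "func_qualname": "__qualname__",
}

for field, attribute in FIELD_TO_ATTRIBUTE.items():
    value = getattr(greet, attribute)
    # __globals__ is usually large, so only its type is printed.
    shown = type(value).__name__ if attribute == "__globals__" else repr(value)
    print(f"{field:18} {attribute:16} {shown}")
```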

    You can examine the code object struct definition too, but the contents aren't as clear if you're not already familiar with code objects, so I'm not going to embed it in the post. I'll just explain code objects.

    The core component of a code object is a bytestring of Python bytecode instructions and arguments. We examined one of those earlier. In addition, the code object contains things like a tuple of the constants the function refers to, and a lot of other internal metadata required to figure out how to actually execute each instruction. Not all the metadata - some of it comes from the function object - but a lot of it. Some of it, like that tuple of constants, is fairly easily understandable, and some of it, like co_flags (a bunch of internal flags) or co_stacksize (the size of the stack used for temporary values) is more esoteric.
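
    A short sketch for poking at that metadata from Python. The two functions are made up for the example, and the printed constants and stack sizes vary between CPython versions, so treat the output as something to explore rather than rely on:

```python
import inspect

def plain(n):
    return n + 1

def countdown(n):
    while n > 0:
        yield n
        n -= 1

for func in (plain, countdown):
    code = func.__code__
    print(func.__name__)
    print("  co_consts:   ", code.co_consts)
    print("  co_varnames: ", code.co_varnames)
    print("  co_argcount: ", code.co_argcount)
    print("  co_stacksize:", code.co_stacksize)
    # co_flags is a bit field; the inspect module names the individual bits.
    print("  generator?   ", bool(code.co_flags & inspect.CO_GENERATOR))
```

    The generator check shows what one of the co_flags bits is for: the interpreter consults it to decide that calling countdown should create a generator instead of running the body.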
