Functions as objects in Python: what exactly is stored in memory?

我寻月下人不归 2021-02-08 02:33

I've been using Python for a while now to solve practical problems, but I still don't have a proper theoretical understanding of what's going on under the hood. For example,

2 answers
  • 2021-02-08 02:45

    A function object's data is divided into two primary parts. The parts that would be the same for all functions created by the same function definition are stored in the function's code object, while the parts that can change even between functions created from the same function definition are stored in the function object.
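    A minimal sketch of that split, using a hypothetical make_adder helper that is not part of the original question: every call re-runs the inner def, so you get a new function object each time, but all of them point at the same code object.

```python
def make_adder(n):
    # Each call to make_adder executes this def again,
    # building a fresh function object around the same code object.
    def add(x):
        return x + n
    return add

add1 = make_adder(1)
add10 = make_adder(10)

# Two distinct function objects...
print(add1 is add10)                        # False
# ...that share one code object (the per-definition part)...
print(add1.__code__ is add10.__code__)      # True
# ...while each has its own closure cell for n (the per-function part).
print(add1.__closure__[0].cell_contents)    # 1
print(add10.__closure__[0].cell_contents)   # 10
```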

    The most interesting part of a function is probably its bytecode. This is the core data structure that says what to actually do to execute a function. It's stored as a bytestring in the function's code object, and you can examine it directly:

    >>> def fib(i):
    ...     x, y = 0, 1
    ...     for _ in range(i):
    ...         x, y = y, x+y
    ...     return x
    ... 
    >>> fib.__code__.co_code
    b'd\x03\\\x02}\x01}\x02x\x1et\x00|\x00\x83\x01D\x00]\x12}\x03|\x02|\x01|\x02\x17\x00\x02\x00}\x01}\x02q\x12W\x00|\x01S\x00'
    

    ...but it's not designed to be human-readable.

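    With enough knowledge of the implementation details of Python bytecode, you could parse that yourself, but describing all that would take way too long. Instead, we'll use the dis module to disassemble the bytecode for us. The exact instructions depend on the CPython version (SETUP_LOOP, for instance, was removed in Python 3.8), so newer interpreters will print something different from this listing: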

    >>> import dis
    >>> dis.dis(fib)
      2           0 LOAD_CONST               3 ((0, 1))
                  2 UNPACK_SEQUENCE          2
                  4 STORE_FAST               1 (x)
                  6 STORE_FAST               2 (y)
    
      3           8 SETUP_LOOP              30 (to 40)
                 10 LOAD_GLOBAL              0 (range)
                 12 LOAD_FAST                0 (i)
                 14 CALL_FUNCTION            1
                 16 GET_ITER
            >>   18 FOR_ITER                18 (to 38)
                 20 STORE_FAST               3 (_)
      4          22 LOAD_FAST                2 (y)
                 24 LOAD_FAST                1 (x)
                 26 LOAD_FAST                2 (y)
                 28 BINARY_ADD
                 30 ROT_TWO
                 32 STORE_FAST               1 (x)
                 34 STORE_FAST               2 (y)
                 36 JUMP_ABSOLUTE           18
            >>   38 POP_BLOCK
      5     >>   40 LOAD_FAST                1 (x)
                 42 RETURN_VALUE
    

    There are a number of columns in the output here, but we're mostly interested in the column of ALL_CAPS names and the columns to the right of it.

    The ALL_CAPS column shows the function's bytecode instructions. For example, LOAD_CONST loads a constant value, and BINARY_ADD is the instruction to add two objects with +.

    The next column, with the numbers, holds the bytecode arguments. For example, LOAD_CONST 3 says to load the constant at index 3 in the code object's constants. The arguments are always integers, and they're packed into the bytecode string along with the instructions.

    The last column mostly provides human-readable explanations of the bytecode arguments, for example, saying that the 3 in LOAD_CONST 3 corresponds to the constant (0, 1), or that the 1 in STORE_FAST 1 corresponds to local variable x. The information in this column doesn't actually come from the bytecode string; it's resolved by examining other parts of the code object.
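    To see where that last column comes from, here is a rough sketch that resolves STORE_FAST arguments by hand against the code object's co_varnames tuple. It assumes a plain function with no closure variables. The constants tuple and the exact instruction stream differ between CPython versions, and newer releases may fuse some instructions, so fewer STORE_FAST lines may be printed than in the listing above.

```python
import dis

def fib(i):
    x, y = 0, 1
    for _ in range(i):
        x, y = y, x + y
    return x

code = fib.__code__

# The lookup tables that bytecode arguments index into.
print(code.co_varnames)  # local variable names: ('i', 'x', 'y', '_')
print(code.co_names)     # global/attribute names: ('range',)
print(code.co_consts)    # constants; contents vary by Python version

# Resolve each STORE_FAST argument ourselves instead of relying on dis.
for ins in dis.get_instructions(fib):
    if ins.opname == "STORE_FAST":
        print(ins.offset, ins.opname, ins.arg, "->", code.co_varnames[ins.arg])
```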


    The rest of a function object's data is primarily stuff needed to resolve bytecode arguments, like the function's closure or its global variable dict, and stuff that just exists because it's handy for introspection, like the function's __name__.

    If we take a look at the Python 3.6 function object struct definition at C level:

    typedef struct {
        PyObject_HEAD
        PyObject *func_code;    /* A code object, the __code__ attribute */
        PyObject *func_globals; /* A dictionary (other mappings won't do) */
        PyObject *func_defaults;    /* NULL or a tuple */
        PyObject *func_kwdefaults;  /* NULL or a dict */
        PyObject *func_closure; /* NULL or a tuple of cell objects */
        PyObject *func_doc;     /* The __doc__ attribute, can be anything */
        PyObject *func_name;    /* The __name__ attribute, a string object */
        PyObject *func_dict;    /* The __dict__ attribute, a dict or NULL */
        PyObject *func_weakreflist; /* List of weak references */
        PyObject *func_module;  /* The __module__ attribute, can be anything */
        PyObject *func_annotations; /* Annotations, a dict or NULL */
        PyObject *func_qualname;    /* The qualified name */
    
        /* Invariant:
         *     func_closure contains the bindings for func_code->co_freevars, so
         *     PyTuple_Size(func_closure) == PyCode_GetNumFree(func_code)
         *     (func_closure may be NULL if PyCode_GetNumFree(func_code) == 0).
         */
    } PyFunctionObject;
    

    we can see that there's the code object, and then

    • the global variable dict,
    • the default argument values,
    • the keyword-only argument default values,
    • the function's closure cells,
    • the docstring,
    • the name,
    • the __dict__,
    • the list of weak references to the function,
    • the __module__,
    • the annotations, and
    • the __qualname__, the fully qualified name

    Inside the PyObject_HEAD macro, there's also the type pointer and some refcount/GC metadata.

    We didn't have to go straight to C to examine most of that, since most of this data is available at Python level: we could have looked at the function's dir() output and filtered out the non-instance attributes. But the struct definition provides a nice, commented, uncluttered list.
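    As a sketch of the Python-level view, the snippet below reads the attribute that corresponds to each struct field. The greet function and its note attribute are made up for illustration.

```python
def greet(name: str, punctuation: str = "!", *, loud: bool = False) -> str:
    """Return a greeting."""
    text = "Hello, " + name + punctuation
    return text.upper() if loud else text

# Arbitrary attributes set on a function land in its __dict__ (func_dict).
greet.note = "example attribute"

print(greet.__code__)                   # func_code
print(greet.__globals__ is globals())   # func_globals
print(greet.__defaults__)               # func_defaults: ('!',)
print(greet.__kwdefaults__)             # func_kwdefaults: {'loud': False}
print(greet.__closure__)                # func_closure: None (no free variables)
print(greet.__doc__)                    # func_doc
print(greet.__name__)                   # func_name
print(greet.__dict__)                   # func_dict: {'note': 'example attribute'}
print(greet.__module__)                 # func_module
print(greet.__annotations__)            # func_annotations
print(greet.__qualname__)               # func_qualname
```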

    You can examine the code object struct definition too, but the contents aren't as clear if you're not already familiar with code objects, so I'm not going to embed it in the post. I'll just explain code objects.

    The core component of a code object is a bytestring of Python bytecode instructions and arguments. We examined one of those earlier. In addition, the code object contains things like a tuple of the constants the function refers to, and a lot of other internal metadata required to figure out how to actually execute each instruction. Not all the metadata - some of it comes from the function object - but a lot of it. Some of it, like that tuple of constants, is fairly easily understandable, and some of it, like co_flags (a bunch of internal flags) or co_stacksize (the size of the stack used for temporary values) is more esoteric.
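    For a feel of that metadata, here is a small sketch that pokes at a few code object fields on a made-up generator function. The co_stacksize value is left without an expected output because it depends on the interpreter version.

```python
import inspect

def countdown(n, step=1):
    while n > 0:
        yield n
        n -= step

code = countdown.__code__

print(code.co_name)       # 'countdown'
print(code.co_argcount)   # 2 positional parameters
print(code.co_varnames)   # ('n', 'step')
print(code.co_stacksize)  # temporary-value stack depth; version dependent

# co_flags is a bit field; inspect exposes the individual flag constants.
print(bool(code.co_flags & inspect.CO_GENERATOR))  # True: the body contains yield
```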

  • 2021-02-08 02:59

    Functions are objects just like any other: they are instances of a type (or class). You can get the type of a function using type(f), where f is a function, or use the types module (types.FunctionType).
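    A quick check of that claim (the function f here is just a placeholder):

```python
import types

def f():
    return 0

print(type(f))                                     # <class 'function'>
print(type(f) is types.FunctionType)               # True
print(isinstance(lambda: 0, types.FunctionType))   # True: lambdas are the same type
print(type(len) is types.FunctionType)             # False: built-ins have their own type
```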

    When you define a function, Python builds a function object and assigns a name to it. This machinery is hidden behind the def statement, but it works the same as the instantiation of any other type.

    This means that in Python, unlike in some other languages, function definitions are executed. Among other things, a function doesn't exist until the flow of code reaches its def statement, so you can't call a function before its definition has run.
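    A small illustration, with a hypothetical shout function: calling the name before its def has executed fails, and running a second def simply rebinds the name to a new function object.

```python
early_call_failed = False
try:
    shout()                  # the def below has not been executed yet
except NameError as exc:
    early_call_failed = True
    print("before def:", exc)

def shout():                 # executing this statement creates the function
    return "HELLO"

first = shout()

def shout():                 # a second def rebinds the same name
    return "GOODBYE"

second = shout()
print(first, second)         # HELLO GOODBYE
```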

    The inspect module lets you snoop around inside various kinds of objects. The table of types and members in its documentation is useful for seeing what kinds of components functions and related types of objects (such as methods) are made from, and how to get to them.

    The actual code inside a function becomes a code object, which contains the byte code that is executed by the Python interpreter. You can see this using the dis module.

    Looking at the help() of the types for functions and code objects is interesting, as it shows what arguments you need to pass in to build these objects. It is possible to make new functions from raw byte code, to copy byte code from one function to another but use a different closure, and so on.

    help(type(lambda: 0))
    help(type((lambda: 0).__code__))
    

    You can also build code objects using the compile() function and then build functions out of them.
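    One way this could look, as a sketch rather than a recommended pattern: compile a module's worth of source, pull the nested code object for the def out of the module code's constants, and wrap it in types.FunctionType with whatever globals you like. The names double, triple and factor are invented for the example.

```python
import types

source = "def double(x):\n    return x * factor\n"
module_code = compile(source, "<example>", "exec")

# The def's body is compiled to its own code object, stored as a
# constant of the enclosing module code object.
func_code = next(c for c in module_code.co_consts
                 if isinstance(c, types.CodeType))

# Build two functions from the same code object with different globals.
double = types.FunctionType(func_code, {"factor": 2})
triple = types.FunctionType(func_code, {"factor": 3}, "triple")

print(double(10))                           # 20
print(triple(10))                           # 30
print(double.__code__ is triple.__code__)   # True
print(triple.__name__)                      # triple
```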

    Fun Fact

    Any object whose type has a __call__() method is callable. Functions are callable, and their type has a __call__() method. Which is callable. Which means it, too, has a __call__() method, which has a __call__() method, ad nauseam, ad infinitum.

    How does a function actually get called, then? Python actually bypasses __call__ for objects with __call__ implemented in C, such as a Python function's __call__ method. Indeed, (lambda: 0).__call__ is a method-wrapper, which is used to wrap a C function.
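    You can watch this happen in CPython with the sketch below. The Greeter class is a made-up example showing that the call syntax consults the type, not an instance attribute named __call__.

```python
f = lambda: 0

# The function's __call__ is a wrapper around a C-level slot.
print(type(f.__call__).__name__)        # method-wrapper
# The chain can be followed as far as you like; it still calls f.
print(f.__call__.__call__.__call__())   # 0

class Greeter:
    def __call__(self):
        return "hi"

g = Greeter()
g.__call__ = lambda: "instance attribute"

# g() looks __call__ up on type(g), so the instance attribute is ignored.
print(g())            # hi
print(g.__call__())   # instance attribute
```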
