I've been using Python for a while now to solve practical problems, but I still don't have a proper theoretical understanding of what's going on under the hood. For example,
A function object's data is divided into two primary parts. The parts that would be the same for all functions created by the same function definition are stored in the function's code object, while the parts that can change even between functions created from the same function definition are stored in the function object.
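As a small illustration of that split, here is a sketch in which one function definition is executed twice. The `make_adder` helper is a hypothetical name used only for this example. The two resulting functions share a single code object but each carries its own per-function data:

```python
def make_adder(n):
    # Each call to make_adder executes the inner "def" again, creating a new
    # function object from the same function definition.
    def add(x):
        return x + n
    return add

add_one = make_adder(1)
add_ten = make_adder(10)

# Two distinct function objects...
print(add_one is add_ten)                    # False
# ...that share one code object...
print(add_one.__code__ is add_ten.__code__)  # True
# ...but hold different closure contents.
print(add_one.__closure__[0].cell_contents)  # 1
print(add_ten.__closure__[0].cell_contents)  # 10
```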
The most interesting part of a function is probably its bytecode. This is the core data structure that says what to actually do to execute a function. It's stored as a bytestring in the function's code object, and you can examine it directly:
>>> def fib(i):
...     x, y = 0, 1
...     for _ in range(i):
...         x, y = y, x+y
...     return x
...
>>> fib.__code__.co_code
b'd\x03\\\x02}\x01}\x02x\x1et\x00|\x00\x83\x01D\x00]\x12}\x03|\x02|\x01|\x02\x17\x00\x02\x00}\x01}\x02q\x12W\x00|\x01S\x00'
...but it's not designed to be human-readable.
With enough knowledge of the implementation details of Python bytecode, you could parse that yourself, but describing all that would take way too long. Instead, we'll use the dis module to disassemble the bytecode for us:
>>> import dis
>>> dis.dis(fib)
  2           0 LOAD_CONST               3 ((0, 1))
              2 UNPACK_SEQUENCE          2
              4 STORE_FAST               1 (x)
              6 STORE_FAST               2 (y)

  3           8 SETUP_LOOP              30 (to 40)
             10 LOAD_GLOBAL              0 (range)
             12 LOAD_FAST                0 (i)
             14 CALL_FUNCTION            1
             16 GET_ITER
        >>   18 FOR_ITER                18 (to 38)
             20 STORE_FAST               3 (_)

  4          22 LOAD_FAST                2 (y)
             24 LOAD_FAST                1 (x)
             26 LOAD_FAST                2 (y)
             28 BINARY_ADD
             30 ROT_TWO
             32 STORE_FAST               1 (x)
             34 STORE_FAST               2 (y)
             36 JUMP_ABSOLUTE           18
        >>   38 POP_BLOCK

  5     >>   40 LOAD_FAST                1 (x)
             42 RETURN_VALUE
There are a number of columns in the output here, but we're mostly interested in the ALL_CAPS column and the columns to the right of it.
The ALL_CAPS column shows the function's bytecode instructions. For example, LOAD_CONST loads a constant value, and BINARY_ADD is the instruction to add two objects with +. The next column, with the numbers, is for bytecode arguments. For example, LOAD_CONST 3 says to load the constant at index 3 in the code object's constants. These are always integers, and they're packed into the bytecode string along with the bytecode instructions. The last column mostly provides human-readable explanations of the bytecode arguments, for example, saying that the 3 in LOAD_CONST 3 corresponds to the constant (0, 1), or that the 1 in STORE_FAST 1 corresponds to local variable x. The information in this column doesn't actually come from the bytecode string; it's resolved by examining other parts of the code object.
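To see the same three pieces of information programmatically, a sketch using `dis.get_instructions` might look like the following. The exact instruction names and argument values depend on the Python version, so the loop's output is not shown; only the local-variable tuple is stable enough to annotate.

```python
import dis

def fib(i):
    x, y = 0, 1
    for _ in range(i):
        x, y = y, x + y
    return x

code = fib.__code__

# Local-variable arguments (such as the 1 in STORE_FAST 1) are indices
# into this tuple, which lives on the code object, not in the bytecode.
print(code.co_varnames)  # ('i', 'x', 'y', '_')

for instr in dis.get_instructions(fib):
    # instr.opname  -> the ALL_CAPS instruction name
    # instr.arg     -> the raw integer argument (None if there is none)
    # instr.argrepr -> the human-readable description dis resolves
    #                  from other parts of the code object
    print(instr.offset, instr.opname, instr.arg, instr.argrepr)
```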
The rest of a function object's data is primarily stuff needed to resolve bytecode arguments, like the function's closure or its global variable dict, and stuff that just exists because it's handy for introspection, like the function's __name__.
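A rough sketch of that per-function data, using a hypothetical `outer`/`counter` pair invented for this example: the code object names the free variables, while the function object holds the cells that supply their values.

```python
def outer():
    count = 0
    def counter(step=1):
        nonlocal count
        count += step
        return count
    return counter

counter = outer()

# Introspection data stored on the function object.
print(counter.__name__)      # counter
print(counter.__defaults__)  # (1,)

# The code object lists the free variable names; the function object's
# closure holds one cell per name, in the same order.
print(counter.__code__.co_freevars)          # ('count',)
print(counter.__closure__[0].cell_contents)  # 0

counter()
print(counter.__closure__[0].cell_contents)  # 1

# Global names are resolved through this dict (the defining module's namespace).
print(type(counter.__globals__).__name__)    # dict
```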
If we take a look at the Python 3.6 function object struct definition at C level:
typedef struct {
    PyObject_HEAD
    PyObject *func_code;        /* A code object, the __code__ attribute */
    PyObject *func_globals;     /* A dictionary (other mappings won't do) */
    PyObject *func_defaults;    /* NULL or a tuple */
    PyObject *func_kwdefaults;  /* NULL or a dict */
    PyObject *func_closure;     /* NULL or a tuple of cell objects */
    PyObject *func_doc;         /* The __doc__ attribute, can be anything */
    PyObject *func_name;        /* The __name__ attribute, a string object */
    PyObject *func_dict;        /* The __dict__ attribute, a dict or NULL */
    PyObject *func_weakreflist; /* List of weak references */
    PyObject *func_module;      /* The __module__ attribute, can be anything */
    PyObject *func_annotations; /* Annotations, a dict or NULL */
    PyObject *func_qualname;    /* The qualified name */

    /* Invariant:
     *     func_closure contains the bindings for func_code->co_freevars, so
     *     PyTuple_Size(func_closure) == PyCode_GetNumFree(func_code)
     *     (func_closure may be NULL if PyCode_GetNumFree(func_code) == 0).
     */
} PyFunctionObject;
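we can see that there's the code object, and then the global variable dict, the default argument values, the keyword-only default argument values, the closure cells, the docstring, the name, the __dict__, the list of weak references, the __module__, the annotations, and the __qualname__, the fully qualified name. Inside the PyObject_HEAD macro, there's also the type pointer and some refcount/GC metadata.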
We didn't have to go straight to C to examine most of that - we could have looked at the function's dir() and filtered out non-instance attributes, since most of that data is available at Python level - but the struct definition provides a nice, commented, uncluttered list.
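For reference, here is a sketch of the Python-level view. The mapping from struct fields to attribute names below is written out by hand from the struct comments and the usual dunder names, so treat it as an illustration rather than something generated from the C source; the weak reference list has no direct attribute and is left out.

```python
def fib(i):
    x, y = 0, 1
    for _ in range(i):
        x, y = y, x + y
    return x

# Hand-written mapping: C struct field -> Python-level attribute.
struct_fields = {
    "func_code": "__code__",
    "func_globals": "__globals__",
    "func_defaults": "__defaults__",
    "func_kwdefaults": "__kwdefaults__",
    "func_closure": "__closure__",
    "func_doc": "__doc__",
    "func_name": "__name__",
    "func_dict": "__dict__",
    "func_module": "__module__",
    "func_annotations": "__annotations__",
    "func_qualname": "__qualname__",
}

for field, attr in struct_fields.items():
    value = getattr(fib, attr)
    # Show only the type of each value; __globals__ in particular is large.
    print(f"{field:18} {attr:16} {type(value).__name__}")
```

Fields that are NULL at C level, such as the defaults or closure of this function, show up as None (NoneType) at Python level.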
You can examine the code object struct definition too, but the contents aren't as clear if you're not already familiar with code objects, so I'm not going to embed it in the post. I'll just explain code objects.
The core component of a code object is a bytestring of Python bytecode instructions and arguments. We examined one of those earlier. In addition, the code object contains things like a tuple of the constants the function refers to, and a lot of other internal metadata required to figure out how to actually execute each instruction. Not all the metadata - some of it comes from the function object - but a lot of it. Some of it, like that tuple of constants, is fairly easily understandable, and some of it, like co_flags (a bunch of internal flags) or co_stacksize (the size of the stack used for temporary values), is more esoteric.
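A few of these fields can be inspected directly. This is a sketch rather than a complete tour, and several values (the constants tuple, the stack size, the flags integer) differ between Python versions, so they are printed without expected output:

```python
import inspect

def fib(i):
    x, y = 0, 1
    for _ in range(i):
        x, y = y, x + y
    return x

code = fib.__code__

print(code.co_argcount)   # 1
print(code.co_names)      # ('range',)  global/attribute names used
print(code.co_varnames)   # ('i', 'x', 'y', '_')
print(code.co_consts)     # constants tuple; contents are version-dependent
print(code.co_stacksize)  # stack depth needed for temporaries; version-dependent
print(code.co_flags)      # integer bit field of internal flags

# Individual flags can be tested against constants from the inspect module.
print(bool(code.co_flags & inspect.CO_GENERATOR))  # False
```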