Python integer objects in the range [1, 2^30) need 28 bytes, as provided by sys.getsizeof() and explained, for example, in this SO-post.
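On a 64-bit CPython build this can be checked directly (the exact numbers are implementation details and could differ on other builds):

```python
import sys

# PyLongObject layout on 64-bit CPython: refcount (8) + type pointer (8)
# + size field (8) = 24-byte header, plus 4 bytes per 30-bit "digit".
print(sys.getsizeof(1))        # 28: one digit
print(sys.getsizeof(1 << 30))  # 32: two digits
print(sys.getsizeof(1 << 60))  # 36: three digits
```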
@Nathan's suggestion is surprisingly not the solution, due to some subtle details of CPython's longint implementation. With his explanation, the memory footprint for
```python
...
lst[i] = (1<<30)+i
```
should still be 40.52, because sys.getsizeof(1<<30) is 32, but the measurements show it to be 48.56. On the other hand, for
```python
...
lst[i] = (1<<60)+i
```
the footprint is still 48.56, despite the fact that sys.getsizeof(1<<60) is 36.
The reason: sys.getsizeof() doesn't tell us the real memory footprint for the result of a summation a+b, as in

- 1000+i
- (1<<30)+i
- (1<<60)+i

That happens because when two integers are added in x_add, the resulting integer at first has one "digit", i.e. 4 bytes, more than the maximum of a and b:
```c
static PyLongObject *
x_add(PyLongObject *a, PyLongObject *b)
{
    Py_ssize_t size_a = Py_ABS(Py_SIZE(a)), size_b = Py_ABS(Py_SIZE(b));
    PyLongObject *z;
    ...
    /* Ensure a is the larger of the two: */
    ...
    z = _PyLong_New(size_a+1);
    ...
```

After the addition, the result is normalized:

```c
    ...
    return long_normalize(z);
}
```
i.e. the possible leading zeros are discarded, but the memory isn't released - 4 bytes aren't worth it. The source of the function can be found here.
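This over-allocation is invisible from the Python side, because sys.getsizeof() is computed from the normalized digit count, not from the size of the block that was actually allocated - a quick check on 64-bit CPython:

```python
import sys

a = 1 << 30   # two 30-bit digits, getsizeof() reports 32
b = a + 1     # x_add allocates three digits, long_normalize trims back to two
# Both report 32, yet the block behind b is 36 bytes (3 digits) large,
# which the allocator rounds up to 40.
print(sys.getsizeof(a), sys.getsizeof(b))
```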
Now we can use @Nathan's insight to explain why the footprint of (1<<30)+i is 48.56 and not 44.xy: the used pymalloc allocator uses memory blocks with an alignment of 8 bytes, which means the 36 bytes will be stored in a block of size 40 - the same as the result of (1<<60)+i (keep the additional 8 bytes for pointers in mind).
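The rounding to 8-byte size classes can be sketched in a few lines (the helper name is mine, not pymalloc's):

```python
# A sketch of pymalloc's size classes: a request of n bytes is served from
# a block rounded up to the next multiple of 8 (ALIGNMENT in obmalloc.c).
def pymalloc_block_size(nbytes: int, alignment: int = 8) -> int:
    return -(-nbytes // alignment) * alignment  # ceiling division, then scale

print(pymalloc_block_size(32))  # 32: 1000+i fits a 32-byte block
print(pymalloc_block_size(36))  # 40: (1<<30)+i lands in the same class
print(pymalloc_block_size(40))  # 40:     as (1<<60)+i
```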
To explain the remaining 0.5 bytes we need to dive deeper into the details of the pymalloc allocator. A good overview is the source code itself; my last try to describe it can be found in this SO-post.

In a nutshell, the allocator manages memory in arenas, each 256KB. When an arena is allocated, the memory is reserved but not committed. We see memory as "used" only when a so-called pool is touched. A pool is 4KB big (POOL_SIZE) and is used only for memory blocks of the same size - in our case 32 bytes. That means the resolution of peak_used_memory is 4KB and cannot be responsible for those 0.5 bytes.
However, these pools must be managed, which leads to an additional overhead: pymalloc needs a pool_header per pool:
```c
/* Pool for small blocks. */
struct pool_header {
    union { block *_padding;
            uint count; } ref;    /* number of allocated blocks    */
    block *freeblock;             /* pool's free list head         */
    struct pool_header *nextpool; /* next pool of this size class  */
    struct pool_header *prevpool; /* previous pool       ""        */
    uint arenaindex;              /* index into arenas of base adr */
    uint szidx;                   /* block size class index        */
    uint nextoffset;              /* bytes to virgin block         */
    uint maxnextoffset;           /* largest valid nextoffset      */
};
```
The size of this struct is 48 bytes (called POOL_OVERHEAD) on my Linux_64 machine. This pool_header is part of the pool (a quite smart way to avoid an additional allocation via the C-runtime memory allocator) and takes the place of two 32-byte blocks, which means a pool has place for only 126 32-byte integers:
```c
/* Return total number of blocks in pool of size index I, as a uint. */
#define NUMBLOCKS(I) ((uint)(POOL_SIZE - POOL_OVERHEAD) / INDEX2SIZE(I))
```
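The same arithmetic can be redone in a couple of lines of Python (constants as on the 64-bit Linux build discussed above):

```python
POOL_SIZE = 4 * 1024  # 4KB pool
POOL_OVERHEAD = 48    # sizeof(struct pool_header) on 64-bit Linux

def numblocks(block_size: int) -> int:
    # mirrors the NUMBLOCKS(I) macro: usable blocks per pool
    return (POOL_SIZE - POOL_OVERHEAD) // block_size

print(numblocks(32))  # 126 blocks for the 32-byte ints of 1000+i
print(numblocks(40))  # 101 blocks for the 40-byte ints of (1<<30)+i
```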
Which leads to:

- 4KB/126 = 32.51 bytes footprint for 1000+i, plus an additional 8 bytes for the pointer.
- (1<<30)+i needs 40 bytes, which means that 4KB has place for 102 blocks, of which one is used for the pool_header (there are 16 bytes remaining when the pool is divided into 40-byte blocks, and together with one block they can hold the pool_header), which leads to 4KB/101 = 40.55 bytes (plus the 8-byte pointer).

We can also see that there is some additional overhead, responsible for ca. 0.01 bytes per integer - not big enough for me to care.
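Putting the numbers together (126 and 101 are the usable block counts derived above; the 8 bytes are the per-element pointer in the list itself):

```python
POOL_SIZE = 4 * 1024
PTR = 8  # pointer stored in the list object for every element

per_int_small = POOL_SIZE / 126 + PTR  # 32-byte blocks: 1000+i
per_int_big   = POOL_SIZE / 101 + PTR  # 40-byte blocks: (1<<30)+i, (1<<60)+i

print(round(per_int_small, 2))  # 40.51 - close to the measured 40.52
print(round(per_int_big, 2))    # 48.55 - close to the measured 48.56
```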