Question
Python integer objects in the range [1, 2^30) need 28 bytes, as reported by sys.getsizeof() and explained, for example, in this SO-post.
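As a quick check of the 28-byte figure, the following sketch prints what sys.getsizeof() reports for a few values around the digit boundaries. It assumes a CPython build with 30-bit digits (the usual case on 64-bit platforms); the helper name digit_count is made up for this illustration and is not part of any library.

```python
import sys

# 30 on typical 64-bit CPython builds; other builds may report 15.
BITS_PER_DIGIT = sys.int_info.bits_per_digit


def digit_count(n):
    """Number of internal "digits" needed for abs(n) (hypothetical helper)."""
    bits = abs(n).bit_length()
    # Ceiling division; zero is treated as needing one digit for simplicity.
    return max(1, -(-bits // BITS_PER_DIGIT))


for value in (1, 2**30 - 1, 2**30, 2**60):
    # Columns: value, number of digits, size reported by sys.getsizeof()
    print(value, digit_count(value), sys.getsizeof(value))
```

On the setup described in the question, the last column should read 28 for the first two values; other interpreters or builds may report different sizes.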
However, when I measure the memory footprint with the following script:
# int_list.py:
import sys
N = int(sys.argv[1])
lst = [0] * N            # no overallocation
for i in range(N):
    lst[i] = 1000 + i    # ints not from the small-integer pool
via
/usr/bin/time -fpeak_used_memory:%M python3 int_list.py <N>
I get the following peak memory values (Linux-x64, Python 3.6.2):
N      Peak memory in KB    bytes/integer
-------------------------------------------
1      9220
1e7    404712               40.50
2e7    800612               40.52
3e7    1196204              40.52
4e7    1591948              40.52
So it looks as if 40.5 bytes are needed per integer object, i.e. 12.5 bytes more than reported by sys.getsizeof().
An additional 8 bytes are easy to explain: the list lst doesn't hold the integer objects but references to them, so an additional pointer, i.e. 8 bytes, is needed per element.
However, what about the other 4.5 bytes? What are they used for?
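The bytes/integer column can be reproduced from the peak values. The sketch below assumes that the N=1 run is subtracted as a baseline and that %M reports units of 1024 bytes; both are assumptions, chosen because they reproduce the numbers in the table. The function name is illustrative only.

```python
BASELINE_KB = 9220  # peak memory of the N=1 run, taken from the table

# N -> peak memory in KB, copied from the table above
MEASUREMENTS = {
    10_000_000: 404712,
    20_000_000: 800612,
    30_000_000: 1196204,
    40_000_000: 1591948,
}


def bytes_per_integer(n, peak_kb, baseline_kb=BASELINE_KB):
    """Memory attributable to one list element, in bytes (assumed formula)."""
    return (peak_kb - baseline_kb) * 1024 / n


for n, peak_kb in MEASUREMENTS.items():
    per_int = bytes_per_integer(n, peak_kb)
    # 28 bytes for the int object and 8 bytes for the pointer are accounted for.
    unexplained = per_int - 28 - 8
    print(f"N={n}: {per_int:.2f} bytes/integer, {unexplained:.2f} unexplained")
```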
The following causes can be ruled out:
- The size of integer objects is variable, but 10^7 is smaller than 2^30, and thus all integers will be 28 bytes large.
- There is no overallocation in the list lst, which can easily be checked via sys.getsizeof(lst), which yields 8 times the number of elements plus a very small overhead.
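The second point can be checked with a few lines. This is a minimal sketch that compares the size of a list built with [0]*N against an empty list; the helper name list_payload_bytes is hypothetical, and the pointer size is taken from struct.calcsize("P") rather than hard-coded to 8.

```python
import struct
import sys

POINTER_BYTES = struct.calcsize("P")  # 8 on a 64-bit build


def list_payload_bytes(n):
    """Bytes a list of n elements uses beyond an empty list (hypothetical helper)."""
    return sys.getsizeof([0] * n) - sys.getsizeof([])


n = 1000
print(list_payload_bytes(n), n * POINTER_BYTES)
```

If the two printed numbers agree, the list holds exactly one pointer per element and nothing was overallocated.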
Answer 1:
The int object only requires 28 bytes, but Python uses 8-byte alignment: memory is allocated in blocks whose sizes are multiples of 8 bytes. So the actual memory used by each int object is 32 bytes. See this excellent article on Python memory management for more details.
I don't yet have an explanation for the remaining half byte, but I'll update this if I find one.
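The rounding described in this answer can be written down in a couple of lines. This is only a sketch of the arithmetic, using the 8-byte alignment and 8-byte pointer stated above; the function name is made up for the example.

```python
ALIGNMENT = 8      # alignment stated in the answer
POINTER_BYTES = 8  # one reference per list element on a 64-bit build


def round_up_to_alignment(nbytes, alignment=ALIGNMENT):
    """Smallest multiple of `alignment` that is >= nbytes."""
    return (nbytes + alignment - 1) // alignment * alignment


block = round_up_to_alignment(28)
print(block)                  # size of the block holding one 28-byte int
print(block + POINTER_BYTES)  # block plus the list's pointer to it
```

With these assumptions the per-element cost comes to 40 bytes, which leaves the roughly 0.5 bytes mentioned above unexplained.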
Answer 2:
@Nathan's suggestion is, surprisingly, not the solution, due to some subtle details of CPython's longint implementation. With his explanation, the memory footprint for
...
    lst[i] = (1<<30)+i
should still be 40.52, because sys.getsizeof(1<<30) is 32, but the measurements show it to be 48.56. On the other hand, for
...
    lst[i] = (1<<60)+i
the footprint is still 48.56, despite the fact that sys.getsizeof(1<<60) is 36.
The reason: sys.getsizeof() doesn't tell us the real memory footprint of the result of a summation, i.e. a+b, which is
- 32 bytes for 1000+i
- 36 bytes for (1<<30)+i
- 40 bytes for (1<<60)+i
That happens because when two integers are added in x_add, the resulting integer at first has one "digit", i.e. 4 bytes, more than the larger of a and b:
static PyLongObject *
x_add(PyLongObject *a, PyLongObject *b)
{
    Py_ssize_t size_a = Py_ABS(Py_SIZE(a)), size_b = Py_ABS(Py_SIZE(b));
    PyLongObject *z;
    ...
    /* Ensure a is the larger of the two: */
    ...
    z = _PyLong_New(size_a+1);
    ...
After the addition, the result is normalized:
    ...
    return long_normalize(z);
}
That is, possible leading zeros are discarded, but the memory isn't released, because 4 bytes aren't worth it. The source of the function can be found here.
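The three sizes listed above follow from the size_a+1 allocation. The sketch below models that rule in Python under stated assumptions: 30-bit digits of 4 bytes each and a 24-byte object header (28 bytes for a one-digit int minus its 4-byte digit). The constants and function names are illustrative, not CPython API.

```python
BITS_PER_DIGIT = 30  # assumed digit width (typical 64-bit build)
DIGIT_BYTES = 4      # assumed size of one digit
HEADER_BYTES = 24    # assumed: 28-byte one-digit int minus its 4-byte digit


def digit_count(n):
    """Digits needed for abs(n); at least one for this model."""
    return max(1, -(-abs(n).bit_length() // BITS_PER_DIGIT))


def x_add_allocated_bytes(a, b):
    """Bytes allocated for a+b if the result gets max(digits)+1 digits."""
    size = max(digit_count(a), digit_count(b)) + 1
    return HEADER_BYTES + DIGIT_BYTES * size


for base in (1000, 1 << 30, 1 << 60):
    print(base, x_add_allocated_bytes(base, 5))
```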
Now we can use @Nathan's insight to explain why the footprint of (1<<30)+i is 48.56 and not 44.xy: the py_malloc allocator uses memory blocks with an alignment of 8 bytes, which means 36 bytes will be stored in a block of size 40, the same as the result of (1<<60)+i (keep the additional 8 bytes for the pointers in mind).
To explain the remaining 0.5 bytes, we need to dive deeper into the details of the py_malloc allocator. A good overview is the source code itself; my latest attempt to describe it can be found in this SO-post.
In a nutshell, the allocator manages memory in arenas, each 256KB in size. When an arena is allocated, the memory is reserved, but not committed. We see memory as "used" only when a so-called pool is touched. A pool is 4KB big (POOL_SIZE) and is used only for memory blocks of the same size, in our case 32 bytes. That means the resolution of peak_used_memory is 4KB and cannot be responsible for those 0.5 bytes.
However, these pools must be managed, which leads to an additional overhead: py_malloc needs a pool_header per pool:
/* Pool for small blocks. */
struct pool_header {
    union { block *_padding;
            uint count; } ref;          /* number of allocated blocks    */
    block *freeblock;                   /* pool's free list head         */
    struct pool_header *nextpool;       /* next pool of this size class  */
    struct pool_header *prevpool;       /* previous pool       ""        */
    uint arenaindex;                    /* index into arenas of base adr */
    uint szidx;                         /* block size class index        */
    uint nextoffset;                    /* bytes to virgin block         */
    uint maxnextoffset;                 /* largest valid nextoffset      */
};
The size of this struct is 48 bytes (called POOL_OVERHEAD) on my Linux_64 machine. This pool_header is part of the pool (a quite smart way to avoid an additional allocation via the C runtime memory allocator) and takes the place of two 32-byte blocks, which means a pool has room for 126 32-byte integers:
/* Return total number of blocks in pool of size index I, as a uint. */
#define NUMBLOCKS(I) ((uint)(POOL_SIZE - POOL_OVERHEAD) / INDEX2SIZE(I))
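The 48-byte header size quoted above can be cross-checked without a C compiler by mirroring the struct with ctypes. This is a sketch only: block pointers are modelled as c_void_p, the class names are made up, and the result depends on the platform's pointer and int sizes.

```python
import ctypes


class _Ref(ctypes.Union):
    # Mirrors: union { block *_padding; uint count; } ref;
    _fields_ = [("_padding", ctypes.c_void_p), ("count", ctypes.c_uint)]


class PoolHeader(ctypes.Structure):
    # Field order follows struct pool_header as quoted above.
    _fields_ = [
        ("ref", _Ref),
        ("freeblock", ctypes.c_void_p),
        ("nextpool", ctypes.c_void_p),
        ("prevpool", ctypes.c_void_p),
        ("arenaindex", ctypes.c_uint),
        ("szidx", ctypes.c_uint),
        ("nextoffset", ctypes.c_uint),
        ("maxnextoffset", ctypes.c_uint),
    ]


def pool_overhead(alignment=8):
    """Header size rounded up to the block alignment (assumed to be 8)."""
    size = ctypes.sizeof(PoolHeader)
    return (size + alignment - 1) // alignment * alignment


print(ctypes.sizeof(PoolHeader), pool_overhead())
```

On a platform with 8-byte pointers and 4-byte unsigned ints, the four pointer-sized members and four uints add up to 32 + 16 bytes.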
Which leads to:
- 4KB/126 = 32.51 bytes footprint for 1000+i, plus an additional 8 bytes for the pointer.
- (1<<30)+i needs 40 bytes, which means 4KB has room for 102 blocks, of which one is used for the pool_header (16 bytes remain when the pool is divided into 40-byte blocks, and they can be used for the pool_header as well), which leads to 4KB/101 = 40.55 bytes (plus the 8-byte pointer).
We can also see that there is some additional overhead, responsible for ca. 0.01 byte per integer, which is not big enough for me to care about.
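Putting the pieces together, the per-integer footprint can be estimated from the pool layout. The sketch below uses the values given in this answer (4KB pools, 48 bytes of POOL_OVERHEAD, 8-byte pointers) as constants; the function names are illustrative and the model ignores everything outside the pools.

```python
POOL_SIZE = 4096      # bytes per pool, as stated above
POOL_OVERHEAD = 48    # size of pool_header, as stated above
POINTER_BYTES = 8     # one list slot per integer on a 64-bit build


def blocks_per_pool(block_size):
    """Python version of the NUMBLOCKS macro for a given block size."""
    return (POOL_SIZE - POOL_OVERHEAD) // block_size


def footprint_per_integer(block_size):
    """Pool bytes per integer plus the list's pointer to it."""
    return POOL_SIZE / blocks_per_pool(block_size) + POINTER_BYTES


for block_size in (32, 40):
    print(block_size, blocks_per_pool(block_size),
          round(footprint_per_integer(block_size), 2))
```

Under these assumptions the estimates land about 0.01 byte below the measured 40.52 and 48.56, which matches the small residual mentioned above.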
Source: https://stackoverflow.com/questions/55595549/large-memory-footprint-of-integers-compared-with-result-of-sys-getsizeof