Large memory footprint of integers compared with result of sys.getsizeof()

面向向阳花 2021-01-13 13:04

Python integer objects in the range [1, 2^30) need 28 bytes, as reported by sys.getsizeof() and explained, for example, in this SO-post.
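
For reference, the reported size can be reproduced with a minimal snippet (the numbers assume a 64-bit CPython and may differ on other builds):

    import sys

    # sys.getsizeof reports a 24-byte object header plus 4 bytes per 30-bit "digit"
    for n in (1, 2**30 - 1, 2**30, 2**60):
        print(n.bit_length(), sys.getsizeof(n))
    # typically: 1 bit -> 28, 30 bits -> 28, 31 bits -> 32, 61 bits -> 36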

2 Answers
  •  有刺的猬
    2021-01-13 13:35

    @Nathan's suggestion is surprisingly not the solution, due to some subtle details of CPython's longint implementation. With his explanation, the memory footprint for

    ...
    lst[i] = (1<<30)+i
    

    should still be 40.52, because sys.getsizeof(1<<30) is 32, but the measurements show it to be 48.56. On the other hand, for

    ...
    lst[i] = (1<<60)+i
    

    the footprint is still 48.56, despite the fact that sys.getsizeof(1<<60) is 36.

    The reason: sys.getsizeof() doesn't tell us the real memory footprint of the result of a summation a+b (see the sketch after this list), which is

    • 32 bytes for 1000+i
    • 36 bytes for (1<<30)+i
    • 40 bytes for (1<<60)+i
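
    These numbers can be reproduced with a bit of arithmetic, assuming a 64-bit CPython where an int consists of a 24-byte header plus 4 bytes per 30-bit "digit", and where the sum gets one extra digit allocated (as explained below). The helpers are only an illustration of that reasoning, not CPython API:

    import sys

    HEADER = 24   # assumed header size of a CPython int on a 64-bit build
    DIGIT = 4     # bytes per 30-bit digit

    def n_digits(x):
        """Number of 30-bit digits needed for abs(x), at least 1."""
        return max(1, -(-abs(x).bit_length() // 30))

    def x_add_alloc_size(a, b):
        """Bytes allocated for a+b: one digit more than the larger operand."""
        return HEADER + (max(n_digits(a), n_digits(b)) + 1) * DIGIT

    for a in (1000, 1 << 30, 1 << 60):
        print(x_add_alloc_size(a, 5), sys.getsizeof(a + 5))
    # expected: 32 28, 36 32, 40 36 - the real allocation is always
    # 4 bytes larger than what sys.getsizeof reports for the result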

    That happens because when two integers are added in x_add, the resulting integer gets, at first, one "digit" (i.e. 4 bytes) more than the maximum of a and b:

    static PyLongObject *
    x_add(PyLongObject *a, PyLongObject *b)
    {
        Py_ssize_t size_a = Py_ABS(Py_SIZE(a)), size_b = Py_ABS(Py_SIZE(b));
        PyLongObject *z;
        ...
        /* Ensure a is the larger of the two: */
        ...
        z = _PyLong_New(size_a+1);  
        ...
    

    after the addition the result is normalized:

     ...
     return long_normalize(z);
     }
    

    i.e. possible leading zero digits are discarded, but the memory isn't released - freeing 4 bytes isn't worth it. The source of the function can be found here.


    Now we can use @Nathan's insight to explain why the footprint of (1<<30)+i is 48.56 and not 44.xy: the py_malloc allocator hands out memory blocks aligned to 8 bytes, so the 36 bytes are stored in a block of size 40 - the same block size as for the result of (1<<60)+i (keep the additional 8 bytes for the list's pointers in mind).
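
    A small sketch of this rounding, using the 8-byte alignment just mentioned (the function name is made up for illustration):

    ALIGNMENT = 8  # pymalloc aligns small requests to multiples of 8 bytes

    def pymalloc_block_size(nbytes):
        """Round a request up to the next multiple of ALIGNMENT."""
        return (nbytes + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT

    print(pymalloc_block_size(36))  # 40 - same block size as a 40-byte request
    print(pymalloc_block_size(40))  # 40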


    To explain the remaining 0.5 bytes we need to dive deeper into the details of the py_malloc allocator. A good overview is the source code itself; my last attempt to describe it can be found in this SO-post.

    In a nutshell, the allocator manages memory in arenas of 256 KB each. When an arena is allocated, the memory is reserved but not committed; we see memory as "used" only when a so-called pool is touched. A pool is 4 KB big (POOL_SIZE) and is used only for memory blocks of the same size - in our case 32 bytes. That means the resolution of peak_used_memory is 4 KB and cannot be responsible for those 0.5 bytes.

    However, these pools must be managed, which leads to additional overhead: py_malloc needs a pool_header per pool:

    /* Pool for small blocks. */
    struct pool_header {
        union { block *_padding;
                uint count; } ref;          /* number of allocated blocks    */
        block *freeblock;                   /* pool's free list head         */
        struct pool_header *nextpool;       /* next pool of this size class  */
        struct pool_header *prevpool;       /* previous pool       ""        */
        uint arenaindex;                    /* index into arenas of base adr */
        uint szidx;                         /* block size class index        */
        uint nextoffset;                    /* bytes to virgin block         */
        uint maxnextoffset;                 /* largest valid nextoffset      */
    };
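
    As a cross-check, this struct can be mirrored with ctypes to confirm its size from Python (a sketch, assuming a typical 64-bit build where block* is a plain pointer):

    import ctypes

    class PoolHeaderRef(ctypes.Union):
        # union { block *_padding; uint count; } ref;
        _fields_ = [("_padding", ctypes.c_void_p),
                    ("count", ctypes.c_uint)]

    class PoolHeader(ctypes.Structure):
        # mirrors struct pool_header from Objects/obmalloc.c
        _fields_ = [("ref", PoolHeaderRef),
                    ("freeblock", ctypes.c_void_p),
                    ("nextpool", ctypes.c_void_p),
                    ("prevpool", ctypes.c_void_p),
                    ("arenaindex", ctypes.c_uint),
                    ("szidx", ctypes.c_uint),
                    ("nextoffset", ctypes.c_uint),
                    ("maxnextoffset", ctypes.c_uint)]

    print(ctypes.sizeof(PoolHeader))  # 48 on a typical 64-bit build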
    

    The size of this struct is 48 bytes (called POOL_OVERHEAD) on my Linux_64 machine. This pool_header is part of the pool itself (a quite smart way to avoid an additional allocation via the C-runtime memory allocator) and takes the place of two 32-byte blocks, which means a pool has room for 126 32-byte integers:

    /* Return total number of blocks in pool of size index I, as a uint. */
    #define NUMBLOCKS(I) ((uint)(POOL_SIZE - POOL_OVERHEAD) / INDEX2SIZE(I))
    

    Which leads to:

    • 4 KB/126 = 32.51 bytes footprint for 1000+i, plus an additional 8 bytes for the pointer.
    • (1<<30)+i needs 40 bytes, so 4 KB has room for 102 blocks, of which one is used for the pool_header (16 bytes remain when the pool is divided into 40-byte blocks, and they can be used for the pool_header), which leads to 4 KB/101 = 40.55 bytes (plus the 8-byte pointer). The same arithmetic is spelled out in the sketch below.
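
    The arithmetic from these two bullet points as a small script (POOL_SIZE and POOL_OVERHEAD as on my machine; just a sketch mirroring the NUMBLOCKS macro quoted above):

    POOL_SIZE = 4096      # 4 KB pools
    POOL_OVERHEAD = 48    # sizeof(struct pool_header) on this Linux_64 build
    PTR = 8               # 8-byte list slot per integer

    def numblocks(block_size):
        """Python version of the NUMBLOCKS macro."""
        return (POOL_SIZE - POOL_OVERHEAD) // block_size

    def footprint(block_size):
        """Average bytes per integer: the pool spread over its usable blocks, plus the list pointer."""
        return POOL_SIZE / numblocks(block_size) + PTR

    print(numblocks(32), footprint(32))  # 126, ~40.51 for 1000+i
    print(numblocks(40), footprint(40))  # 101, ~48.55 for (1<<30)+i and (1<<60)+i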

    We can also see that there is some additional overhead, responsible for ca. 0.01 bytes per integer - not big enough for me to care.
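
    For completeness, a rough sketch of how such per-integer footprints can be measured on Linux (this is only an approximation, not necessarily the exact setup behind the numbers above; ru_maxrss is reported in kilobytes there and only ever grows, so each variant should run in a fresh process):

    import resource

    N = 10**7

    def peak_rss_kb():
        # peak resident set size of this process, in KB on Linux
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

    before = peak_rss_kb()
    lst = [None] * N
    for i in range(N):
        lst[i] = (1 << 30) + i   # or 1000 + i, (1 << 60) + i, ...
    after = peak_rss_kb()

    # average bytes per element, the 8-byte list slot included
    print((after - before) * 1024 / N)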
