Large memory footprint of integers compared with result of sys.getsizeof()

后端 未结 2 436
面向向阳花
面向向阳花 2021-01-13 13:04

Python-Integer-objects in the range [1,2^30) need 28 byte, as provided by sys.getsizeof() and explained for example in this SO-post.

2条回答
  •  有刺的猬
    2021-01-13 13:35

    @Nathan's suggestion is surprisingly not the solution, due to some subtly details of CPython's longint-implementation. With his explanation, the memory footprint for

    ...
    lst[i] = (1<<30)+i
    

    should still be 40.52, because sys.sizeof(1<<30) is 32, but the measurements show it to be 48.56. On the other hand, for

    ...
    lst[i] = (1<<60)+i
    

    the footprint is still 48.56, despite the fact, that sys.sizeof(1<<60) is 36.

    The reason: the sys.getsizeof() doesn't tell us the real memory footprint for the result of a summation, i.e. a+b which is

    • 32 bytes for 1000+i
    • 36 bytes for (1<<30)+i
    • 40 bytes for (1<<60)+i

    That happens because when two integers are added in x_add, the resulting integer has at first one "digit", i.e. 4 bytes, more than the maximum of a and b:

    static PyLongObject *
    x_add(PyLongObject *a, PyLongObject *b)
    {
        Py_ssize_t size_a = Py_ABS(Py_SIZE(a)), size_b = Py_ABS(Py_SIZE(b));
        PyLongObject *z;
        ...
        /* Ensure a is the larger of the two: */
        ...
        z = _PyLong_New(size_a+1);  
        ...
    

    after the addition the result is normalized:

     ...
     return long_normalize(z);
    

    };

    i.e. the possible leading zeros are discarded, but the memory isn't released - 4 bytes aren't worth it, the source of the function can be found here.


    Now, we can use @Nathans insight to explain, why the footprint of (1<<30)+i is 48.56 and not 44.xy: The used py_malloc-allocator uses memory-blocks with alignment of 8 bytes, that means 36 bytes will be stored in a block of size 40 - the same as the result of (1<<60)+i (keep the additional 8-bytes for pointers in mind).


    To explain the remaining 0.5 bytes we need to dive deeper into details of py_malloc-allocator. A good overview is the source-code itself, my last try to describe it can be found in this SO-post.

    In a nutshell, the allocator manages memory in arenas, each 256MB. When an arena is allocated, the memory is reserved, but not commited. We see memory as "used", only when a so called pool is touched. A pool is 4Kb big (POOL_SIZE) and is used only for memory-blocks with same size - in our case 32 byte. That means the resolution of peak_used_memory is 4Kb and cannot be responsible for those 0.5 bytes.

    However, these pools must be managed, which leads to an additional overhead: py_malloc needs a pool_header per pool:

    /* Pool for small blocks. */
    struct pool_header {
        union { block *_padding;
                uint count; } ref;          /* number of allocated blocks    */
        block *freeblock;                   /* pool's free list head         */
        struct pool_header *nextpool;       /* next pool of this size class  */
        struct pool_header *prevpool;       /* previous pool       ""        */
        uint arenaindex;                    /* index into arenas of base adr */
        uint szidx;                         /* block size class index        */
        uint nextoffset;                    /* bytes to virgin block         */
        uint maxnextoffset;                 /* largest valid nextoffset      */
    };
    

    The size of this struct is 48 (called POOL_OVERHEAD) bytes on my Linux_64 machine. This pool_header is a part of the pool (a quite smart way to avoid additional allocation via cruntime-memory-allocator) and will take place of two 32-byte-blocks, that means a pool has place for 126 32 byte integers:

    /* Return total number of blocks in pool of size index I, as a uint. */
    #define NUMBLOCKS(I) ((uint)(POOL_SIZE - POOL_OVERHEAD) / INDEX2SIZE(I))
    

    Which leads to:

    • 4Kb/126 = 32.51 bytes footprint for 1000+i, plus additional 8 bytes for the pointer.
    • (30<<1)+i needs 40 bytes, that means 4Kb has place for 102 blocks, of which one (there are remaining 16 bytes when pool is divided in 40-bytes block, and they can be used for the pool_header) is used forpool_header, which leads to 4Kb/101=40.55 bytes (plus 8 byte pointer).

    We can also see, that there are some additional overhead, responsible for ca. 0.01 byte per integer - not big enough for me to care.

提交回复
热议问题