Measure Object Size Accurately in Python - sys.getsizeof not functioning

一向 2020-11-30 12:31

I am trying to accurately/definitively find the size differences between two different classes in Python. They are both new style classes, save for one not having sl

6 Answers
  • 2020-11-30 13:07

    As others have stated, sys.getsizeof only returns the size of the object structure that represents your data. So if, for instance, you have a dynamic array that you keep adding elements to, sys.getsizeof(my_array) will only ever show the size of the base DynamicArray object, not the growing size of memory that its elements take up.
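    A minimal sketch of that shallow behaviour, using a plain list as the container (the variable names are illustrative, not from the question). Two lists of equal length report the same size no matter how large their elements are:

```python
import sys

# Two lists of the same length, built the same way, with very different payloads.
small = [ch for ch in "abc"]
large = [ch * 10_000 for ch in "abc"]

# sys.getsizeof counts only the list header and its array of pointers.
print(sys.getsizeof(small), sys.getsizeof(large))

# The strings the list points to are separate objects with their own sizes.
print(sys.getsizeof(large[0]))
```

    The exact byte counts depend on the Python version and platform, so none are shown here.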

    pympler.asizeof.asizeof() gives an approximate complete size of objects and may be more accurate for you.

    from pympler import asizeof
    asizeof.asizeof(my_object)  # should give you the full object size
    
  • 2020-11-30 13:12

    sys.getsizeof returns a number which is more specialized and less useful than people think. In fact, if you increase the number of attributes to six, your test3_obj remains at 32, but test4_obj jumps to 48 bytes. This is because getsizeof is returning the size of the PyObject structure implementing the type, which for test3_obj doesn't include the dict holding the attributes, but for test4_obj, the attributes aren't stored in a dict, they are stored in slots, so they are accounted for in the size.

    But a class defined with __slots__ takes less memory than a class without, precisely because there is no dict to hold the attributes.
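    A rough way to see this for yourself, assuming two stand-in classes (WithDict and WithSlots are hypothetical names, not the classes from the question): the regular instance carries a separate __dict__ with its own size, while the slotted instance has no __dict__ at all.

```python
import sys


class WithDict:  # hypothetical stand-in for a regular class
    def __init__(self):
        self.one = 1
        self.two = "two variable"


class WithSlots:  # hypothetical stand-in for a slotted class
    __slots__ = ("one", "two")

    def __init__(self):
        self.one = 1
        self.two = "two variable"


d = WithDict()
s = WithSlots()

# The instance itself and its attribute dict are reported separately.
print("instance:", sys.getsizeof(d), "its __dict__:", sys.getsizeof(d.__dict__))

# The slotted instance stores its attributes inline and has no __dict__.
print("slotted instance:", sys.getsizeof(s), "has __dict__:", hasattr(s, "__dict__"))
```

    The numbers printed vary between Python versions, so compare them on your own interpreter rather than relying on figures quoted elsewhere.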

    Why override __sizeof__? What are you really trying to accomplish?

  • 2020-11-30 13:13

    I ran into a similar problem and ended up writing my own helper to do the dirty work. Check it out here

  • 2020-11-30 13:21

    The following function has been tested in Python 3.6 on a 64-bit system. It has been very useful to me. (I picked it up off the internet, tweaked it to my style, and added support for __slots__. I am unable to find the original source again.)

    import sys
    from typing import Optional, Set
    
    def getSize(obj, seen: Optional[Set[int]] = None) -> int:
      """Recursively finds size of objects."""
      seen = set() if seen is None else seen
    
      if id(obj) in seen: return 0  # to handle self-referential objects
      seen.add(id(obj))
    
      size = sys.getsizeof(obj, 0) # pypy3 always returns default (necessary)
      if isinstance(obj, dict):
        size += sum(getSize(v, seen) + getSize(k, seen) for k, v in obj.items())
      elif hasattr(obj, '__dict__'):
        size += getSize(obj.__dict__, seen)
      elif hasattr(obj, '__slots__'): # in case slots are in use
        slotList = [getattr(C, "__slots__", []) for C in obj.__class__.__mro__]
        slotList = [[slot] if isinstance(slot, str) else slot for slot in slotList]
        size += sum(getSize(getattr(obj, a, None), seen) for slot in slotList for a in slot)
      elif hasattr(obj, '__iter__') and not isinstance(obj, (str, bytes, bytearray)):
        size += sum(getSize(i, seen) for i in obj)
      return size
    

    Now for the objects of the following classes,

    class test3(object):
        def __init__(self):
            self.one = 1
            self.two = "two variable"
    
    class test4(object):
        __slots__ = ('one', 'two')
        def __init__(self):
            self.one = 1
            self.two = "two variable"
    

    the following results are obtained,

    In [21]: t3 = test3()
    
    In [22]: getSize(t3)
    Out[22]: 361
    
    In [23]: t4 = test4()
    
    In [25]: getSize(t4)
    Out[25]: 145
    

    Feedback to improve the function is most welcome.
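    One possible refinement, offered as a sketch rather than a definitive version: getSize checks __dict__ and __slots__ in an elif chain, so an instance that has both (for example, a subclass without __slots__ of a slotted base class) never has its slot values counted. The variant below checks the two independently. The names get_size, Base and Child are hypothetical, and only the built-in containers are recursed into.

```python
import sys


def get_size(obj, seen=None):
    """Recursively estimate the size of obj, counting both __dict__ and slot values."""
    seen = set() if seen is None else seen
    if id(obj) in seen:  # already counted (shared or self-referential object)
        return 0
    seen.add(id(obj))

    size = sys.getsizeof(obj, 0)
    if isinstance(obj, dict):
        size += sum(get_size(k, seen) + get_size(v, seen) for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(get_size(item, seen) for item in obj)

    # Checked independently of the slots below: an instance can have both.
    if hasattr(obj, "__dict__"):
        size += get_size(obj.__dict__, seen)

    # Visit the slots declared by each class in the MRO.
    for cls in type(obj).__mro__:
        slots = cls.__dict__.get("__slots__", ())
        if isinstance(slots, str):
            slots = (slots,)
        for name in slots:
            if name in ("__dict__", "__weakref__"):
                continue
            try:
                value = getattr(obj, name)
            except AttributeError:  # slot declared but never assigned
                continue
            size += get_size(value, seen)
    return size


class Base:  # hypothetical slotted base class
    __slots__ = ("payload",)


class Child(Base):  # no __slots__ here, so instances also get a __dict__
    pass


c = Child()
c.payload = "x" * 10_000  # stored in the inherited slot
c.extra = "y" * 10_000    # stored in the instance __dict__
print(get_size(c))
```

    Both 10,000-character strings contribute to the total here, whereas an elif chain that stops at __dict__ would miss the one held in the slot.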

  • 2020-11-30 13:22

    You might want to use a different implementation for getting the size of your objects in memory:

    >>> import sys, array
    >>> sizeof = lambda obj: sum(map(sys.getsizeof, explore(obj, set())))
    >>> def explore(obj, memo):
        loc = id(obj)
        if loc not in memo:
            memo.add(loc)
            yield obj
            if isinstance(obj, memoryview):
                yield from explore(obj.obj, memo)
            elif not isinstance(obj, (range, str, bytes, bytearray, array.array)):
                # Handle instances with slots.
                try:
                    slots = obj.__slots__
                except AttributeError:
                    pass
                else:
                    for name in slots:
                        try:
                            attr = getattr(obj, name)
                        except AttributeError:
                            pass
                        else:
                            yield from explore(attr, memo)
                # Handle instances with dict.
                try:
                    attrs = obj.__dict__
                except AttributeError:
                    pass
                else:
                    yield from explore(attrs, memo)
                # Handle dicts or iterables.
                for name in 'keys', 'values', '__iter__':
                    try:
                        attr = getattr(obj, name)
                    except AttributeError:
                        pass
                    else:
                        for item in attr():
                            yield from explore(item, memo)
    
    
    >>> class Test1:
        def __init__(self):
            self.one = 1
            self.two = 'two variable'
    
    
    >>> class Test2:
        __slots__ = 'one', 'two'
        def __init__(self):
            self.one = 1
            self.two = 'two variable'
    
    
    >>> print('sizeof(Test1()) ==', sizeof(Test1()))
    sizeof(Test1()) == 361
    >>> print('sizeof(Test2()) ==', sizeof(Test2()))
    sizeof(Test2()) == 145
    >>> array_test1, array_test2 = [], []
    >>> for _ in range(3000):
        array_test1.append(Test1())
        array_test2.append(Test2())
    
    
    >>> print('sizeof(array_test1) ==', sizeof(array_test1))
    sizeof(array_test1) == 530929
    >>> print('sizeof(array_test2) ==', sizeof(array_test2))
    sizeof(array_test2) == 194825
    >>> 
    

    Just make sure that you do not give any infinite iterators to this code if you want an answer back.

  • 2020-11-30 13:31

    First, check the size of the Python process in your OS's process monitor before creating many objects.

    Second, create many objects of one kind and check the size again.

    Third, create many objects of the other kind and check the size.

    Repeat this a few times; if the sizes at each step stay about the same, you have a comparable measurement.
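    As a rough in-process stand-in for those steps, the standard library's tracemalloc module can report how much memory Python allocated while the objects were created. Note that this is a different technique from reading the OS-level process size: it only sees allocations made through Python's allocator. The helper bytes_per_object and the two classes are hypothetical names used for illustration.

```python
import tracemalloc


class WithDict:  # hypothetical regular class
    def __init__(self):
        self.one = 1
        self.two = "two variable"


class WithSlots:  # hypothetical slotted class
    __slots__ = ("one", "two")

    def __init__(self):
        self.one = 1
        self.two = "two variable"


def bytes_per_object(factory, count=10_000):
    """Average traced bytes allocated per object when creating `count` objects."""
    tracemalloc.start()
    before, _peak = tracemalloc.get_traced_memory()
    objects = [factory() for _ in range(count)]  # kept alive while we measure
    after, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return (after - before) / count


dict_bytes = bytes_per_object(WithDict)
slots_bytes = bytes_per_object(WithSlots)
print("with __dict__: ", dict_bytes)
print("with __slots__:", slots_bytes)
```

    The averages include the list that holds the objects, which costs the same in both runs, so the difference between the two figures is the part worth comparing.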
