python: class vs tuple huge memory overhead (?)

萌比男神i 2021-02-20 12:16

I'm storing a lot of complex data in tuples/lists, but would prefer to use small wrapper classes to make the data structures easier to understand, e.g.

class Pe         


        
4 Answers
  • 2021-02-20 12:31

    As others have said in their answers, you'll have to generate different objects for the comparison to make sense.

    So, let's compare some approaches.

    tuple

    l = [(i, i) for i in range(10000000)]
    # memory taken by Python3: 1.0 GB
    

    class Person

    class Person:
        def __init__(self, first, last):
            self.first = first
            self.last = last
    
    l = [Person(i, i) for i in range(10000000)]
    # memory: 2.0 GB
    

    namedtuple (tuple + __slots__)

    from collections import namedtuple
    Person = namedtuple('Person', 'first last')
    
    l = [Person(i, i) for i in range(10000000)]
    # memory: 1.1 GB
    

    namedtuple is basically a class that extends tuple and sets __slots__ = () so that instances carry no per-instance dict. On top of that it adds a getter for each named field and a few helper methods. (In Python versions before 3.7 you can see the exact code it generates by passing verbose=True.)
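To make the description of namedtuple above concrete, here is a small sketch that inspects the generated class. The field values are made up for illustration; only the standard library is used.

```python
from collections import namedtuple

Person = namedtuple('Person', 'first last')
p = Person('Ada', 'Lovelace')  # illustrative values, not from the original post

# The generated class is a real tuple subclass...
is_tuple = isinstance(p, tuple)

# ...that declares an empty __slots__, so instances get no per-instance dict.
slots = Person.__slots__

# Named access and positional access read the same underlying tuple items.
same_item = p.first is p[0]

print(is_tuple, slots, same_item)
```

The named fields live in the tuple storage itself, which is why a namedtuple costs about the same as a plain tuple.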

    class Person + __slots__

    class Person:
        __slots__ = ['first', 'last']
        def __init__(self, first, last):
            self.first = first
            self.last = last
    
    l = [Person(i, i) for i in range(10000000)]
    # memory: 0.9 GB
    

    This is a trimmed-down version of the namedtuple above. In this test it is the clear winner, using even less memory than plain tuples.
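The figures above were read off the whole process. As a rough way to reproduce the comparison on a smaller scale, the sketch below uses the standard tracemalloc module. The helper name `measure`, the class names, and the sample size are assumptions chosen for illustration, and the absolute numbers will differ between Python versions and platforms.

```python
import tracemalloc
from collections import namedtuple


class PlainPerson:
    def __init__(self, first, last):
        self.first = first
        self.last = last


class SlotsPerson:
    __slots__ = ('first', 'last')

    def __init__(self, first, last):
        self.first = first
        self.last = last


TuplePerson = namedtuple('TuplePerson', 'first last')


def measure(factory, n=100_000):
    """Return the bytes traced while building a list of n objects (hypothetical helper)."""
    tracemalloc.start()
    data = [factory(i, i) for i in range(n)]
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del data
    return current


results = {
    'tuple': measure(lambda first, last: (first, last)),
    'plain class': measure(PlainPerson),
    'namedtuple': measure(TuplePerson),
    'slots class': measure(SlotsPerson),
}

for name, size in results.items():
    print(f'{name:12s} {size / 1e6:8.1f} MB')
```

tracemalloc only counts allocations made through Python's allocator, so the totals will not match what the operating system reports for the process, but the ordering of the approaches should be comparable.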

  • 2021-02-20 12:38

    Using __slots__ decreases the memory footprint quite a bit (from 1.7 GB to 625 MB in my test), since each instance no longer needs to hold a dict to store the attributes.

    class Person:
        __slots__ = ['first', 'last']
        def __init__(self, first, last):
            self.first = first
            self.last = last
    

    The drawback is that you can no longer add attributes to an instance after it is created; the class only provides memory for the attributes listed in the __slots__ attribute.
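A short sketch of that drawback, using the same Person class. The attribute name `age` and the sample values are made up for illustration.

```python
class Person:
    __slots__ = ['first', 'last']

    def __init__(self, first, last):
        self.first = first
        self.last = last


p = Person('Ada', 'Lovelace')

# Slotted instances have no __dict__ to hold extra attributes.
has_dict = hasattr(p, '__dict__')

# Assigning an attribute that is not listed in __slots__ fails.
try:
    p.age = 36
    error = None
except AttributeError as exc:
    error = exc

print(has_dict, type(error).__name__)
```

If you need the occasional extra attribute, adding '__dict__' to __slots__ brings the dict back, at the cost of most of the memory saving.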

  • 2021-02-20 12:48

    There is yet another way to reduce the memory occupied by objects: turn off support for cyclic garbage collection, in addition to dropping __dict__ and __weakref__. This is implemented in the recordclass library:

    $ pip install recordclass
    
    >>> import sys
    >>> from recordclass import dataobject, make_dataclass
    

    Create the class:

    class Person(dataobject):
        first: str
        last: str
    

    or

    >>> Person = make_dataclass('Person', 'first last')
    

    As a result:

    >>> print(sys.getsizeof(Person(100,100)))
    32
    

    For a __slots__-based class we have:

    class Person:
        __slots__ = ['first', 'last']
        def __init__(self, first, last):
            self.first = first
            self.last = last
    
    >>> print(sys.getsizeof(Person(100,100)))
    64
    

    So further memory savings are possible.
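For comparison, the same kind of per-instance check can be done for an ordinary class using only the standard library. This is a sketch with assumed class names; note that sys.getsizeof reports a shallow size, so the attribute dict of a plain instance has to be added separately, and the exact numbers vary with the Python version and platform.

```python
import sys


class PlainPerson:
    def __init__(self, first, last):
        self.first = first
        self.last = last


class SlotsPerson:
    __slots__ = ['first', 'last']

    def __init__(self, first, last):
        self.first = first
        self.last = last


plain = PlainPerson(100, 100)
slotted = SlotsPerson(100, 100)

# Shallow size of the slotted instance (getsizeof includes the GC header
# for objects that are managed by the garbage collector).
slots_size = sys.getsizeof(slotted)

# A plain instance also pays for its attribute dict, which getsizeof
# does not follow on its own.
plain_size = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)

print('slots:', slots_size, 'plain + dict:', plain_size)
```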

    For dataobject-based:

    l = [Person(i, i) for i in range(10000000)]
    # memory size: 681 MB
    

    For __slots__-based:

    l = [Person(i, i) for i in range(10000000)]
    # memory size: 921 MB
    
  • 2021-02-20 12:53

    In your second example you create only one object: a tuple literal made up entirely of constants is itself a constant, so the same object is reused on every iteration.

    >>> l = [('foo', 'bar') for i in range(10000000)]
    >>> id(l[0])
    4330463176
    >>> id(l[1])
    4330463176
    

    Classes carry the overhead of storing each instance's attributes in a dictionary. That is why namedtuples need only about half the memory.
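To see why the constant-tuple test is misleading, the sketch below contrasts it with tuples built from a changing value. The reuse of the constant tuple is a CPython implementation detail, so the first result is printed rather than relied upon; the variable names are illustrative.

```python
# A tuple literal of constants: CPython typically stores it once in the
# compiled code and hands out the same object each time (implementation detail).
constant_tuples = [('foo', 'bar') for _ in range(3)]
shared = all(t is constant_tuples[0] for t in constant_tuples)

# Tuples built from a loop variable have to be created one by one.
fresh_tuples = [(i, i) for i in range(1000, 1003)]
distinct_ids = len({id(t) for t in fresh_tuples})

print('constant tuples shared:', shared)
print('distinct fresh tuples:', distinct_ids)
```

A fair memory comparison therefore needs per-element values, such as (i, i), on both the tuple side and the class side.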
