I'm storing a lot of complex data in tuples/lists, but would prefer to use small wrapper classes to make the data structures easier to understand, e.g.
class Pe
As others have said in their answers, you'll have to generate different objects for the comparison to make sense.
So, let's compare some approaches.
tuple
l = [(i, i) for i in range(10000000)]
# memory taken by Python3: 1.0 GB
class Person
class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last
l = [Person(i, i) for i in range(10000000)]
# memory: 2.0 GB
namedtuple (tuple + __slots__)
from collections import namedtuple
Person = namedtuple('Person', 'first last')
l = [Person(i, i) for i in range(10000000)]
# memory: 1.1 GB
namedtuple is basically a class that extends tuple and sets __slots__ = () so that instances carry no per-instance dict; it adds getters for the named fields and some other helper methods (on Python versions before 3.7 you can see the exact code generated by calling it with verbose=True).
class Person + __slots__
class Person:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last
l = [Person(i, i) for i in range(10000000)]
# memory: 0.9 GB
This is a trimmed-down version of the namedtuple above. A clear winner, even better than pure tuples.
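The comparison above can be repeated at a smaller scale without watching the process in a system monitor. The following is only a sketch: it uses the standard-library tracemalloc module, a much smaller N, and the illustrative names PlainPerson, SlotsPerson, NamedPerson and measure, none of which come from the answer. The absolute numbers depend on the Python version and platform, so only the ordering is worth reading off.

```python
import tracemalloc
from collections import namedtuple

N = 100_000  # far fewer elements than the 10,000,000 used above


class PlainPerson:
    def __init__(self, first, last):
        self.first = first
        self.last = last


class SlotsPerson:
    __slots__ = ['first', 'last']

    def __init__(self, first, last):
        self.first = first
        self.last = last


NamedPerson = namedtuple('NamedPerson', 'first last')


def measure(factory, n=N):
    """Return the bytes allocated while building a list of n objects."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    data = [factory(i, i) for i in range(n)]
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del data
    return after - before


results = {
    'tuple': measure(lambda first, last: (first, last)),
    'plain class': measure(PlainPerson),
    'namedtuple': measure(NamedPerson),
    'slots class': measure(SlotsPerson),
}

for name, size in results.items():
    print(f'{name:12} {size / N:7.1f} bytes per element')
```

The figure per element includes the list slot and the int objects, which are the same in all four cases, so the differences come from the container objects themselves.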
Using __slots__ decreases the memory footprint quite a bit (from 1.7 GB to 625 MB in my test), since each instance no longer needs to hold a dict to store the attributes.
class Person:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last
The drawback is that you can no longer add attributes to an instance after it is created; the class only provides memory for the attributes listed in the __slots__ attribute.
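A minimal sketch of that drawback, reusing the Person class from this answer (the attribute name age and the sample values are made up for illustration):

```python
class Person:
    __slots__ = ['first', 'last']

    def __init__(self, first, last):
        self.first = first
        self.last = last


p = Person('Ada', 'Lovelace')
p.first = 'Augusta'          # fine: 'first' is listed in __slots__

error = None
try:
    p.age = 36               # 'age' is not listed in __slots__
except AttributeError as exc:
    error = exc

print(type(error).__name__)   # AttributeError
print(hasattr(p, '__dict__'))  # False: there is no per-instance dict
```

If you need both the memory saving and the occasional ad-hoc attribute, adding '__dict__' to the __slots__ list brings the dict back, at the cost of most of the saving.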
There is yet another way to reduce the amount of memory occupied by objects: turning off support for cyclic garbage collection, in addition to turning off __dict__ and __weakref__. It is implemented in the recordclass library:
$ pip install recordclass
>>> import sys
>>> from recordclass import dataobject, make_dataclass
Create the class:
class Person(dataobject):
    first: str
    last: str
or
>>> Person = make_dataclass('Person', 'first last')
As a result:
>>> print(sys.getsizeof(Person(100,100)))
32
For a __slots__-based class we have:
class Person:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last
>>> print(sys.getsizeof(Person(100,100)))
64
So a further memory saving is possible.
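The standard library alone can show where part of that gap comes from. This is a sketch for CPython only: sys.getsizeof() reports the object's own size plus any garbage-collector overhead, while __sizeof__() reports the object alone, so their difference is the per-instance cost of being managed by the cyclic GC. The variable names tracked and gc_overhead are illustrative, and the printed numbers differ between Python versions and builds.

```python
import gc
import sys


class Person:
    __slots__ = ['first', 'last']

    def __init__(self, first, last):
        self.first = first
        self.last = last


p = Person(100, 100)

# Instances of an ordinary __slots__ class are still tracked by the cyclic GC.
tracked = gc.is_tracked(p)

# getsizeof() adds the GC overhead on top of what __sizeof__() reports.
gc_overhead = sys.getsizeof(p) - p.__sizeof__()

print('tracked by the GC:', tracked)
print('bytes with GC overhead:', sys.getsizeof(p))
print('bytes without it:', p.__sizeof__())
print('GC overhead per instance:', gc_overhead)
```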
For the dataobject-based class:
l = [Person(i, i) for i in range(10000000)]
memory size: 681 MB
For the __slots__-based class:
l = [Person(i, i) for i in range(10000000)]
memory size: 921 MB
In your second example, you only create one object: CPython stores a tuple literal made up entirely of constants as a single constant, so every iteration reuses the same tuple.
>>> l = [('foo', 'bar') for i in range(10000000)]
>>> id(l[0])
4330463176
>>> id(l[1])
4330463176
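The same effect can be checked with an identity test instead of comparing id() values by eye. This is a sketch that relies on CPython behaviour rather than a language guarantee, and the list names constant and distinct are made up for illustration:

```python
# A tuple literal of constants: CPython reuses one object on every iteration.
constant = [('foo', 'bar') for i in range(3)]

# A tuple built from the loop variable: a new object on every iteration.
distinct = [(i, i) for i in range(3)]

print(constant[0] is constant[1])  # True on CPython
print(distinct[0] is distinct[1])  # False
```

This is why a memory comparison only makes sense when each element is built from values that differ per iteration.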
Classes have the overhead that their attributes are stored in a dictionary. Therefore a namedtuple needs only half the memory.