In general, Python sets don't seem to be designed for retrieving items by key. That's obviously what dictionaries are for. But is there any way that, given a key, you can
I think you'll have the answer here:
Moving Beyond Factories in Python
I'd definitely use a dictionary here. Reusing the firstname instance variable as a dictionary key won't copy it; the dictionary will simply use the same object. I doubt a dictionary will use significantly more memory than a set.
To actually save memory, add a __slots__ attribute to your classes. This will prevent each of your 10,000,000 instances from having a __dict__ attribute, which will save much more memory than the potential overhead of a dict over a set.
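As a minimal sketch of what __slots__ changes (the class names WithDict and WithSlots are made up for illustration and are not from the question), the following compares an ordinary class with a slotted one:

```python
class WithDict(object):
    def __init__(self, firstname):
        self.firstname = firstname


class WithSlots(object):
    # Only these attribute names are allowed; no per-instance __dict__ is created.
    __slots__ = ("firstname",)

    def __init__(self, firstname):
        self.firstname = firstname


plain = WithDict("Ada")
slotted = WithSlots("Ada")

has_dict_plain = hasattr(plain, "__dict__")
has_dict_slotted = hasattr(slotted, "__dict__")

# Without a __dict__, attributes outside __slots__ cannot be added.
try:
    slotted.lastname = "Lovelace"
    extra_attr_allowed = True
except AttributeError:
    extra_attr_allowed = False

print(has_dict_plain, has_dict_slotted, extra_attr_allowed)
```

The price of the saving is flexibility: a slotted instance only accepts the attributes named in __slots__.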
Edit: Some numbers to back my claims. I defined a stupid example class storing pairs of random strings:
import random

def rand_str():
    # Random lowercase string of 3 to 15 characters.
    return "".join(chr(random.randrange(97, 123))
                   for i in range(random.randrange(3, 16)))

class A(object):
    def __init__(self):
        self.x = rand_str()
        self.y = rand_str()

    # Hash and compare on x only, so x acts as the key.
    def __hash__(self):
        return hash(self.x)

    def __eq__(self, other):
        return self.x == other.x
The amount of memory used by a set of 1,000,000 instances of this class
random.seed(42)
s = set(A() for i in range(1000000))
is on my machine 240 MB. If I add
__slots__ = ("x", "y")
to the class, this goes down to 112 MB. If I store the same data in a dictionary
def key_value():
    a = A()
    return a.x, a

random.seed(42)
d = dict(key_value() for i in range(1000000))
this uses 249 MB without __slots__ and 121 MB with __slots__.
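One way to reproduce this kind of comparison is the standard-library tracemalloc module. This is only a sketch, not how the figures above were obtained: the names Base, Plain, Slotted, traced_bytes, as_set and as_dict are invented here, N is kept small so the run stays short, and the absolute numbers will differ by Python version and platform.

```python
import random
import tracemalloc

N = 5000  # far fewer than the 1,000,000 above, to keep the run short


def rand_str():
    return "".join(chr(random.randrange(97, 123))
                   for _ in range(random.randrange(3, 16)))


class Base(object):
    __slots__ = ()  # empty, so each subclass decides whether it gets a __dict__

    def __init__(self):
        self.x = rand_str()
        self.y = rand_str()

    def __hash__(self):
        return hash(self.x)

    def __eq__(self, other):
        return self.x == other.x


class Plain(Base):
    pass  # no __slots__ here, so instances carry a __dict__


class Slotted(Base):
    __slots__ = ("x", "y")


def as_set(cls):
    return set(cls() for _ in range(N))


def as_dict(cls):
    return dict((obj.x, obj) for obj in (cls() for _ in range(N)))


def traced_bytes(build, cls):
    """Bytes still allocated once the container has been built."""
    tracemalloc.start()
    container = build(cls)  # kept alive until after the measurement
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current


results = {}
for cls in (Plain, Slotted):
    for build in (as_set, as_dict):
        random.seed(42)  # same strings for every variant
        results[(cls.__name__, build.__name__)] = traced_bytes(build, cls)

for key, size in sorted(results.items()):
    print(key, size)
```

The slotted variants should come out smaller than the plain ones for both containers; how the set compares with the dict is likely to depend on the interpreter version.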
Yes, you can do this: a set can be iterated over. But note that this is an O(n) operation, as opposed to the O(1) lookup of a dict.
So you have to trade off speed against memory, which is a classic trade-off. I personally would optimize for speed here (i.e. use the dictionary), since memory won't run short so quickly with only 10,000,000 objects, and using dictionaries is really easy.
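To make the two options concrete, here is a rough sketch. The Person class, its age field and the helper find_in_set are hypothetical; only the firstname attribute comes from the question.

```python
class Person(object):
    __slots__ = ("firstname", "age")

    def __init__(self, firstname, age):
        self.firstname = firstname
        self.age = age

    # Identity is defined by firstname only, so it behaves like a key.
    def __hash__(self):
        return hash(self.firstname)

    def __eq__(self, other):
        return self.firstname == other.firstname


people = set([Person("Ada", 36), Person("Alan", 41), Person("Grace", 85)])


def find_in_set(items, firstname):
    """Linear scan over the set: O(n) in the number of items."""
    return next((p for p in items if p.firstname == firstname), None)


# One-off O(n) build, after which each lookup is O(1) on average.
by_name = dict((p.firstname, p) for p in people)

found_scan = find_in_set(people, "Alan")
found_dict = by_name.get("Alan")
missing = find_in_set(people, "Bob")
```

Both lookups return the very same Person object; the dictionary only adds the key-to-object mapping on top of the existing instances.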
As for additional memory consumption for the firstname string: since strings are immutable in Python, using the firstname attribute as a key will not create a new string, but just copy the reference.
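This can be checked with the is operator. The sketch below assumes a hypothetical Person class with a firstname attribute; the string is built at runtime so that the result does not depend on the interning of literals.

```python
class Person(object):
    def __init__(self, firstname):
        self.firstname = firstname


# Build the string at runtime rather than writing it as a literal.
name = "".join(["Al", "an"])
person = Person(name)

index = {person.firstname: person}

# Pull the key back out of the dictionary and compare object identity.
key = next(iter(index))
same_object = key is person.firstname

print(same_object)
```

The key stored in the dictionary is the same object as the attribute, so the string data exists only once.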