Python: Retrieve items from a set

执笔经年 2021-01-12 10:13

In general, Python sets don't seem to be designed for retrieving items by key. That's obviously what dictionaries are for. But is there any way that, given a key, you can

3 Answers
  • 2021-01-12 10:49

    I think you'll have the answer here:

    Moving Beyond Factories in Python

  • 2021-01-12 10:54

    I'd definitely use a dictionary here. Reusing the firstname instance variable as a dictionary key won't copy it -- the dictionary will simply use the same object. I doubt a dictionary will use significantly more memory than a set.

    To actually save memory, add a __slots__ attribute to your classes. This will prevent each of your 10,000,000 instances from having a __dict__ attribute, which will save much more memory than the potential overhead of a dict over a set.
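
    As a minimal sketch of what __slots__ changes (the class names PlainPerson and SlottedPerson and the lastname field are made up for illustration, they are not from the question):

```python
class PlainPerson:
    """Regular class: every instance carries its own __dict__."""

    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname


class SlottedPerson:
    """Same fields, but __slots__ reserves fixed storage and drops __dict__."""

    __slots__ = ("firstname", "lastname")

    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname


plain = PlainPerson("Ada", "Lovelace")
slotted = SlottedPerson("Ada", "Lovelace")

has_dict_plain = hasattr(plain, "__dict__")      # True
has_dict_slotted = hasattr(slotted, "__dict__")  # False
```

    The trade-off is that a slotted instance only accepts the attributes named in __slots__; assigning any other attribute raises AttributeError.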

    Edit: Some numbers to back my claims. I defined a stupid example class storing pairs of random strings:

    import random

    def rand_str():
        # Random lowercase string of 3 to 15 characters.
        return "".join(chr(random.randrange(97, 123))
                       for i in range(random.randrange(3, 16)))
    
    class A(object):
        def __init__(self):
            self.x = rand_str()
            self.y = rand_str()
        def __hash__(self):
            # Hash and equality depend on x only, so x acts as the key.
            return hash(self.x)
        def __eq__(self, other):
            return self.x == other.x
    

    The amount of memory used by a set of 1,000,000 instances of this class

    random.seed(42)
    s = set(A() for i in range(1000000))  # range replaces Python 2's xrange
    

    is 240 MB on my machine. If I add

        __slots__ = ("x", "y")
    

    to the class, this goes down to 112 MB. If I store the same data in a dictionary

    def key_value():
        a = A()
        return a.x, a
    
    random.seed(42)
    d = dict(key_value() for i in range(1000000))  # range replaces Python 2's xrange
    

    this uses 249 MB without __slots__ and 121 MB with __slots__.
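
    One possible way to run a comparison like this yourself on Python 3 is the standard-library tracemalloc module. This is only a sketch: the names Plain, Slotted and traced_bytes are hypothetical, the instance count is reduced to keep it quick, and the figures it prints will not match the numbers quoted above, which came from a different machine and setup.

```python
import random
import tracemalloc


def rand_str(rng):
    """Random lowercase string of 3 to 15 characters."""
    return "".join(chr(rng.randrange(97, 123)) for _ in range(rng.randrange(3, 16)))


class Plain:
    def __init__(self, rng):
        self.x = rand_str(rng)
        self.y = rand_str(rng)


class Slotted:
    __slots__ = ("x", "y")

    def __init__(self, rng):
        self.x = rand_str(rng)
        self.y = rand_str(rng)


def traced_bytes(cls, n=20_000, seed=42):
    """Bytes still allocated after building a dict of n instances keyed by .x."""
    rng = random.Random(seed)  # same seed, so both classes get the same strings
    tracemalloc.start()
    data = {}
    for _ in range(n):
        obj = cls(rng)
        data[obj.x] = obj
    current, _peak = tracemalloc.get_traced_memory()  # measured while data is alive
    tracemalloc.stop()
    return current


plain_bytes = traced_bytes(Plain)
slotted_bytes = traced_bytes(Slotted)
print(f"plain: {plain_bytes:,} B   slotted: {slotted_bytes:,} B")
```

    tracemalloc only counts allocations made through Python's allocators, so treat the output as a relative comparison between the two classes rather than the total process size.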

  • 2021-01-12 10:55

    Yes, you can do this: A set can be iterated over. But note that this is an O(n) operation as opposed to the O(1) operation of the dict.
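
    A rough sketch of both options, assuming a hypothetical Person class that hashes and compares on firstname (the class and the sample names are illustrative only):

```python
class Person:
    """Hypothetical record; hashing and equality use firstname only."""

    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname

    def __hash__(self):
        return hash(self.firstname)

    def __eq__(self, other):
        if not isinstance(other, Person):
            return NotImplemented
        return self.firstname == other.firstname


people = {Person("Ada", "Lovelace"), Person("Alan", "Turing")}

# O(n): scan the set until an element with the wanted key turns up.
found = next((p for p in people if p.firstname == "Alan"), None)

# O(1) on average: build an index once, then look items up by key.
by_firstname = {p.firstname: p for p in people}
same = by_firstname["Alan"]
```

    The dictionary holds references to the same Person objects as the set, so found and same are the very same instance.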

    So, you have to trade off speed versus memory. This is a classic. I personally would optimize for speed here (i.e. use the dictionary), since memory won't run short so quickly with only 10,000,000 objects and using dictionaries is really easy.

    As for additional memory consumption for the firstname string: Since strings are immutable in Python, assigning the firstname attribute as a key will not create a new string, but just copy the reference.
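
    This can be checked with an identity test. In the sketch below the Person class is a made-up stand-in, and the string is assembled at runtime so that the result does not depend on the interpreter sharing identical literals:

```python
class Person:
    def __init__(self, firstname):
        self.firstname = firstname


person = Person("".join(["Gr", "ace"]))  # built at runtime, not a shared literal
index = {person.firstname: person}

stored_key = next(iter(index))
print(stored_key is person.firstname)  # True: same object, no copy
```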
