custom dict that allows delete during iteration

时光说笑 2020-12-08 09:53

UPDATED based on Lennart Regebro's answer

Suppose you iterate through a dictionary, and sometimes need to delete an element. The following is very efficient:

7 answers
  • 2020-12-08 10:28

    Naive implementation for Python 2.x and 3.x:

    import sys
    from collections import deque
    
    
    def _protect_from_delete(func):
        """Wrap an iteration method so deletes are deferred while it runs."""
        def wrapper(self, *args, **kwargs):
            self._iterating += 1
            try:
                for item in func(self, *args, **kwargs):
                    yield item
            finally:
                self._iterating -= 1
                # Purge only after the outermost iteration ends; purging while
                # an outer loop is still running would change the dict's size
                # under it.
                if not self._iterating:
                    self._delete_pending()
        return wrapper
    
    class DeletableDict(dict):
        def __init__(self, *args, **kwargs):
            super(DeletableDict, self).__init__(*args, **kwargs)
            self._keys_to_delete = deque()
            self._iterating = 0  # number of active iterations
    
        if sys.version_info[0] != 3:
            iterkeys = _protect_from_delete(dict.iterkeys)
            itervalues = _protect_from_delete(dict.itervalues)
            iteritems = _protect_from_delete(dict.iteritems)
        else:
            keys = _protect_from_delete(dict.keys)
            values = _protect_from_delete(dict.values)
            items = _protect_from_delete(dict.items)
        __iter__ = _protect_from_delete(dict.__iter__)
    
        def __delitem__(self, key):
            if not self._iterating:
                return super(DeletableDict, self).__delitem__(key)
            if key not in self:
                raise KeyError(key)
            # An iteration is in progress: queue the key instead of deleting it.
            self._keys_to_delete.append(key)
    
        def _delete_pending(self):
            while self._keys_to_delete:
                key = self._keys_to_delete.popleft()
                if key in self:  # the same key may have been queued twice
                    super(DeletableDict, self).__delitem__(key)
    
    if __name__ == '__main__':
        dct = DeletableDict((i, i*2) for i in range(15))
        if sys.version_info[0] != 3:
            for k, v in dct.iteritems():
                if k < 5:
                    del dct[k]
            print(dct)
            for k in dct.iterkeys():
                if k > 8:
                    del dct[k]
            print(dct)
            for k in dct:
                if k < 8:
                    del dct[k]
            print(dct)
        else:
            for k, v in dct.items():
                if k < 5:
                    del dct[k]
            print(dct)
            for k in dct.keys():
                if k > 8:
                    del dct[k]
            print(dct)
            for k in dct:
                if k < 8:
                    del dct[k]
            print(dct)
    

    While iterating over keys, items, or values, the wrapper increments the counter self._iterating. __delitem__ checks that counter: if an iteration is in progress, it stores the key in a temporary queue instead of deleting it. At the end of the iteration, all pending keys are deleted.

    It's a very naive implementation, and I wouldn't recommend using it in production code.
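    For comparison, here is a minimal Python 3-only sketch of the same counter idea. The class name PendingDeleteDict and the helper _guarded are hypothetical, and only __iter__ and items() are guarded (keys() and values() would need the same treatment in a complete version). It shows why a counter rather than a boolean flag matters: with nested loops, pending deletes must wait until the outermost loop has finished.

```python
class PendingDeleteDict(dict):
    """Hypothetical Python 3-only sketch: defer deletes while any loop is active."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._iterating = 0   # number of active iterations (a counter, not a flag)
        self._pending = []    # keys queued for deletion

    def _guarded(self, iterator):
        self._iterating += 1
        try:
            yield from iterator
        finally:
            self._iterating -= 1
            if self._iterating == 0:          # only the outermost loop purges
                for key in self._pending:
                    if key in self:           # a key may have been queued twice
                        super().__delitem__(key)
                self._pending.clear()

    def __iter__(self):
        return self._guarded(super().__iter__())

    def items(self):
        return self._guarded(iter(super().items()))

    def __delitem__(self, key):
        if key not in self:
            raise KeyError(key)
        if self._iterating:
            self._pending.append(key)         # defer: a loop is running
        else:
            super().__delitem__(key)          # no loop: delete immediately


d = PendingDeleteDict((i, i * i) for i in range(6))
for k in d:
    for k2, v in d.items():
        if k2 == k and v % 2 == 0:
            del d[k2]        # queued, because both loops are still active
print(d)                     # {1: 1, 3: 9, 5: 25}
del d[1]                     # outside any loop: deleted immediately
print(d)                     # {3: 9, 5: 25}
```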

    EDIT

    Added support for Python 3 and improvements from @jsbueno's comments.

    Python 3 run on Ideone.com

  • 2020-12-08 10:34

    What you need to do is avoid modifying the set of keys you are iterating over. You can do this in three ways:

    1. Make a copy of the keys in a separate list and iterate over that. You can then safely delete keys from the dictionary during iteration. This is the easiest and fastest option, unless the dictionary is huge, in which case you should start thinking about using a database anyway. Code:

      for k in list(dict_):
        if condition(k, dict_[k]):
          del dict_[k]
          continue
        # do other things you need to do in this loop
      
    2. Make a copy not of the keys you are iterating over, but of the keys you are going to delete. In other words, don't delete these keys while iterating; instead, add them to a list, then delete the keys in that list once you have finished iterating. This is slightly more complicated than 1 but much less so than 3. It is also fast. This is what you do in your first example.

      delete_these = []
      for k in dict_:
        if condition(k, dict_[k]):
          delete_these.append(k)
          continue
        # do other things you need to do in this loop
      
      for k in delete_these:
        del dict_[k]
      
    3. The only way to avoid making some sort of new list is, as you suggest, to make a special dictionary. But that requires that deleting a key doesn't actually delete it, only marks it as deleted, and the key is deleted for real only once you call a purge method. This requires quite a lot of implementation, there are edge cases, and you'll trip yourself up by forgetting to purge, etc. And iterating over the dictionary must still include the deleted keys, which will bite you at some point. So I wouldn't recommend this. Also, however you implement this in Python, you are likely to end up with a list of things to delete once again, so it's likely to be just a complicated and error-prone version of 2. If you implement it in C, you could probably avoid the copying by adding the flags directly into the hash-key structure. But as mentioned, the problems really overshadow the benefits.
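    To make option 3 concrete, here is a minimal Python 3 sketch of such a special dictionary. The class name MarkDeleteDict and the attribute _deleted are hypothetical. It shows exactly the pitfall described above: a "deleted" key is still visible to iteration, in, and len() until purge() is called.

```python
class MarkDeleteDict(dict):
    """Hypothetical sketch of option 3: `del` only marks a key; purge() removes it."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._deleted = set()   # keys marked as deleted but still stored

    def __delitem__(self, key):
        if key not in self or key in self._deleted:
            raise KeyError(key)
        self._deleted.add(key)  # size is unchanged, so running loops keep working

    def purge(self):
        # Really remove the marked keys; must not be called during iteration.
        for key in self._deleted:
            super().__delitem__(key)
        self._deleted.clear()


d = MarkDeleteDict(a=1, b=2, c=3)
for k in d:
    if k == "a":
        del d[k]             # only marked, so this is safe mid-loop
print("a" in d, len(d))      # True 3: the pitfall, "a" is still visible
d.purge()
print(d)                     # {'b': 2, 'c': 3}
```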

  • 2020-12-08 10:40

    As you note, you can store the items to delete somewhere and defer the deletion of them until later. The problem then becomes when to purge them and how to make sure that the purge method eventually gets called. The answer to this is a context manager which is also a subclass of dict.

    class dd_dict(dict):    # the dd is for "deferred delete"
        _deletes = None     # None means "not inside a with block"
    
        def __delitem__(self, key):
            if key not in self:
                raise KeyError(str(key))
            if self._deletes is None:
                dict.__delitem__(self, key)   # outside a with block: delete now
            else:
                self._deletes.add(key)        # inside a with block: defer
    
        def __enter__(self):
            self._deletes = set()
            return self
    
        def __exit__(self, exc_type, exc_value, tb):
            for key in self._deletes:
                try:
                    dict.__delitem__(self, key)
                except KeyError:
                    pass
            self._deletes = None
    

    Usage:

    # make the dict and do whatever to it
    ffffd = dd_dict(a=1, b=2, c=3)
    
    # now iterate over it, deferring deletes
    with ffffd:
        for k, v in ffffd.items():
            if k == "a":
                del ffffd[k]
                print(ffffd)     # shows that "a" is still there
    
    print(ffffd)                 # shows that "a" has been deleted
    

    If you're not in a with block, of course, deletes are immediate; as this is a dict subclass, it works just like a regular dict outside of a context manager.

    You could also implement this as a wrapper class for a dictionary:

    class deferring_delete(object):
        def __init__(self, d):
            self._dict = d
        def __enter__(self):
            self._deletes = set()
            return self
        def __exit__(self, type, value, tb):
            for key in self._deletes:
                try:
                    del self._dict[key]
                except KeyError:
                    pass
            del self._deletes
        def __delitem__(self, key):
            if key not in self._dict:
                raise KeyError(str(key))
            self._deletes.add(key)
    
    d = dict(a=1, b=2, c=3)
    
    with deferring_delete(d) as dd:
        for k, v in d.items():
            if k == "a":
                del dd[k]    # delete through wrapper
    
    print(d)
    

    It's even possible to make the wrapper class fully functional as a dictionary, if you want, though that's a fair bit more code.

    Performance-wise, this is admittedly not such a win, but I like it from a programmer-friendliness standpoint. The second method should be very slightly faster since it's not testing a flag on each delete.
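    As for making the wrapper fully functional as a dictionary: one way to avoid writing every method by hand, assuming Python 3, is to subclass collections.abc.MutableMapping. It only requires __getitem__, __setitem__, __delitem__, __iter__ and __len__, and derives get(), pop(), items(), update() and the rest from them. DeferringDict is a hypothetical name. Caveat: the inherited popitem() and clear() rely on __delitem__ really removing keys, so inside a with block they would misbehave in this sketch (clear() would loop forever).

```python
from collections.abc import MutableMapping


class DeferringDict(MutableMapping):
    """Hypothetical sketch: dict wrapper whose deletes are deferred inside `with`."""

    def __init__(self, d):
        self._dict = d
        self._deletes = None            # None means "not inside a with block"

    def __enter__(self):
        self._deletes = set()
        return self

    def __exit__(self, exc_type, exc_value, tb):
        for key in self._deletes:
            self._dict.pop(key, None)   # ignore keys already removed elsewhere
        self._deletes = None

    # The five abstract methods; MutableMapping derives the rest from these.
    def __getitem__(self, key):
        return self._dict[key]

    def __setitem__(self, key, value):
        self._dict[key] = value

    def __delitem__(self, key):
        if key not in self._dict:
            raise KeyError(key)
        if self._deletes is None:
            del self._dict[key]         # outside a with block: delete now
        else:
            self._deletes.add(key)      # inside a with block: defer

    def __iter__(self):
        return iter(self._dict)

    def __len__(self):
        return len(self._dict)


d = {"a": 1, "b": 2, "c": 3}
with DeferringDict(d) as dd:
    for k, v in dd.items():
        if k == "a":
            del dd[k]                   # deferred until the block exits
    print(dd.get("a"))                  # 1: still present inside the block
print(d)                                # {'b': 2, 'c': 3}
```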

  • 2020-12-08 10:41

    You can accomplish this by iterating over a static list of the key/value pairs of the dictionary, instead of iterating over a dictionary view.

    Basically, iterating over list(dict_.items()) instead of dict_.items() will work:

    for k, v in list(dict_.items()):
      if condition(k, v):
        del dict_[k]
        continue
      # do other things you need to do in this loop
    

    Here is an example (ideone):

    dict_ = {0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 5: 'f', 6: 'g'}
    for k, v in list(dict_.items()):
        if k % 2 == 0:
            print("Deleting  ", (k, v))
            del dict_[k]
            continue
        print("Processing", (k, v))
    

    and the output:

    Deleting   (0, 'a')
    Processing (1, 'b')
    Deleting   (2, 'c')
    Processing (3, 'd')
    Deleting   (4, 'e')
    Processing (5, 'f')
    Deleting   (6, 'g')
    
  • 2020-12-08 10:41

    This could work as a compromise between the two examples: two lines longer than the second one, but shorter and slightly faster than the first. Python 2 (on Python 3, use items() instead of iteritems()):

    import random
    
    dict_ = {k: random.randint(0, 40000) for k in range(0, 200000)}
    
    dict_remove = [k for k, v in dict_.iteritems() if v < 3000]
    for k in dict_remove:
        del dict_[k]
    

    Split into a function and it's down to one line each call (whether this is more readable or not is your call):

    def dict_remove(dict_, keys):
        for k in keys:
            del dict_[k]
    
    dict_remove(dict_, [k for k,v in dict_.iteritems() if v < 3000])
    

    Regardless of where the code lives, you have to store the keys that need deleting somewhere. The only way to avoid that would be a generator expression, but it iterates lazily over the live dictionary, so it raises RuntimeError ("dictionary changed size during iteration") as soon as the loop continues past the first deletion.
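    Here is a small Python 3 sketch of that failure mode. It uses a deterministic dictionary instead of random values (an assumption made so the result is reproducible). The lazy generator expression fails after the first deletion, while the list comprehension works because all keys are collected before anything is deleted.

```python
dict_ = {k: k % 10 for k in range(100)}

# Lazy: the generator is still walking the live dict while keys are deleted.
lazy = (k for k, v in dict_.items() if v < 3)
exploded = False
try:
    for k in lazy:
        del dict_[k]
except RuntimeError as exc:
    exploded = True
    print("RuntimeError:", exc)   # dictionary changed size during iteration

# Eager: the list is fully built before the first deletion.
eager = [k for k, v in dict_.items() if v < 3]
for k in eager:
    del dict_[k]
print(len(dict_))                 # 70
```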

  • 2020-12-08 10:44
    1. You can make a copy of the list of keys (you don't need to copy the values) at the beginning of the iteration, and iterate over that copy (checking that each key is still there). This is inefficient if there are a lot of keys.
    2. You can embed your first example's code inside a class. __iter__, __delitem__, and the other special methods need to collaborate to keep a list of keys to be removed while an iteration is in progress. When no iteration is active, __delitem__ can just delete the item, but when at least one iteration is running, it should only add the key to the list of keys to delete. When the last active iteration finishes, it should actually delete them. This is somewhat inefficient if there are a lot of keys to remove, and it will, of course, never get to delete anything (so the pending list keeps growing) if there's always at least one iteration going on.
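    The "checking that the key is still there" part of option 1 matters when the loop body can remove keys other than the current one. A minimal Python 3 sketch, with a hypothetical rule that deletes a key together with its double; without the membership check, the loop would raise KeyError on a key that was already removed.

```python
d = {1: "x", 2: "y", 3: "z", 4: "w", 6: "v"}

for k in list(d):            # snapshot of the keys only; values are not copied
    if k not in d:           # already removed earlier in this loop
        continue
    if k * 2 in d:           # hypothetical rule: delete a key and its double
        del d[k * 2]
        del d[k]
print(d)                     # {4: 'w'}
```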