How would I implement a dict with Abstract Base Classes in Python? [duplicate]

倖福魔咒の 提交于 2019-11-27 23:10:57

How would I implement a dict with Abstract Base Classes?

A good answer will demonstrate how to make this work, specifically without subclassing dict.

Here's the error message: TypeError: Can't instantiate abstract class D with abstract methods __delitem__, __getitem__, __iter__, __len__, __setitem__

It turns out that one must implement them to use the Abstract Base Class (ABC), MutableMapping.

Implementation

So I implement a mapping that works like a dict in most respects that uses the object's attribute reference dict for the mapping. (Delegation is not the same as inheritance, so we'll just delegate to the instance __dict__, we could use any other ad-hoc mapping, but you don't seem to care about that part of the implementation. It makes sense to do it this way in Python 2, because MutableMapping doesn't have __slots__ in Python 2, so you're creating a __dict__ either way. In Python 3, you could avoid dicts altogether by setting __slots__.)

import collections

class D(collections.MutableMapping):
    '''
    Mapping that works like both a dict and a mutable object, i.e.
    d = D(foo='bar')
    and 
    d.foo returns 'bar'
    '''
    # ``__init__`` method required to create instance from class.
    def __init__(self, *args, **kwargs):
        '''Use the object dict'''
        self.__dict__.update(*args, **kwargs)
    # The next five methods are requirements of the ABC.
    def __setitem__(self, key, value):
        self.__dict__[key] = value
    def __getitem__(self, key):
        return self.__dict__[key]
    def __delitem__(self, key):
        del self.__dict__[key]
    def __iter__(self):
        return iter(self.__dict__)
    def __len__(self):
        return len(self.__dict__)
    # The final two methods aren't required, but nice for demo purposes:
    def __str__(self):
        '''returns simple dict representation of the mapping'''
        return str(self.__dict__)
    def __repr__(self):
        '''echoes class, id, & reproducible representation in the REPL'''
        return '{}, D({})'.format(super(D, self).__repr__(), 
                                  self.__dict__)

Demonstration

And to demonstrate the usage:

>>> d = D((e, i) for i, e in enumerate('abc'))
>>> d
<__main__.D object at 0x7f75eb242e50>, D({'b': 1, 'c': 2, 'a': 0})
>>> d.a
0
>>> d.get('b')
1
>>> d.setdefault('d', []).append(3)
>>> d.foo = 'bar'
>>> print(d)
{'b': 1, 'c': 2, 'a': 0, 'foo': 'bar', 'd': [3]}

And for ensuring the dict API, lesson learned is that you can always check for collections.MutableMapping:

>>> isinstance(d, collections.MutableMapping)
True
>>> isinstance(dict(), collections.MutableMapping)
True

And while a dict is always going to be an instance of a MutableMapping due to registration on collections import, the reverse is not always true:

>>> isinstance(d, dict)
False
>>> isinstance(d, (dict, collections.MutableMapping))
True

After performing this exercise, it is clear to me that using Abstract Base Classes provides only the guarantee of a standard API for users of the class. In this case, users assuming a MutableMapping object will be guaranteed the standard API for Python.

Caveats:

The fromkeys class constructor method is not implemented.

>>> dict.fromkeys('abc')
{'b': None, 'c': None, 'a': None}
>>> D.fromkeys('abc')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'D' has no attribute 'fromkeys'

One could mask the builtin dict methods like get or setdefault

>>> d['get'] = 'baz'
>>> d.get('get')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object is not callable

It's fairly simple to unmask again:

>>> del d['get']
>>> d.get('get', 'Not there, but working')
'Not there, but working'

But I wouldn't use this code in production.


Demonstration without a dict, Python 3:

>>> class MM(MutableMapping):
...   __delitem__, __getitem__, __iter__, __len__, __setitem__ = (None,) *5
...   __slots__ = ()
...
>>> MM().__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MM' object has no attribute '__dict__'

At the Least

You need to implement in your subclass, all the abstract methods that you inherit from MutableMapping

class D(MutableMapping):
    def __delitem__(self):
        '''
         Your Implementation for deleting the Item goes here
        '''
        raise NotImplementedError("del needs to be implemented")
    def __getitem__(self):
        '''
         Your Implementation for subscripting the Item goes here
        '''
        raise NotImplementedError("obj[index] needs to be implemented")
    def __iter__(self):
        '''
         Your Implementation for iterating the dictionary goes here
        '''
        raise NotImplementedError("Iterating the collection needs to be implemented")
    def __len__(self):
        '''
         Your Implementation for determing the size goes here
        '''
        raise NotImplementedError("len(obj) determination needs to be implemented")
    def __setitem__(self):
        '''
         Your Implementation for determing the size goes here
        '''
        raise NotImplementedError("obj[index] = item,  needs to be implemented")


>>> D()
<__main__.D object at 0x0258CD50>

Moreover

You need to provide a data structure to store your mapping (hash, AVL, Red Black), and a way to construct your Dictionary

The best way to demonstrate this without actually using a dict anywhere is probably to implement something dead simple, very different from dict, and not completely useless. Like a fixed-sized mapping of fixed-size bytes to same-fixed-size bytes. (You might use this for, e.g., a routing table—it'll be much more compact than a dict mapping unpacked keys to unpacked values, although obviously at the cost of speed and flexibility.)

A hash table is just an array of (hash, key, value) tuples. Since the whole point of this is packing data in, we cram those into a struct, meaning we can just use a big bytearray for storage. To mark a slot empty, we set its hash value to 0—which means we need to "escape" any real 0 by turning it into a 1, which is stupid, but simpler to code. We'll also use the dumbest possible probe algorithm for simplicity.

class FixedHashTable(object):
    hashsize = 8
    def __init__(self, elementsize, size):
        self.elementsize = elementsize
        self.size = size
        self.entrysize = self.hashsize + self.elementsize * 2
        self.format = 'q{}s{}s'.format(self.elementsize, self.elementsize)
        assert struct.calcsize(self.format) == self.entrysize
        self.zero = b'\0' * self.elementsize
        self.store = bytearray(struct.pack(self.format, 0,
                                           self.zero, self.zero)
                               ) * self.size
    def hash(self, k):
        return hash(k) or 1
    def stash(self, i, h, k, v):
        entry = struct.pack(self.format, h, k, v)
        self.store[i*self.entrysize:(i+1)*self.entrysize] = entry
    def fetch(self, i):
        entry = self.store[i*self.entrysize:(i+1)*self.entrysize]
        return struct.unpack(self.format, entry)
    def probe(self, keyhash):
        i = keyhash % self.size
        while True:
            h, k, v = self.fetch(i)
            yield i, h, k, v
            i = (i + 1) % self.size
            if i == keyhash % self.size:
                break

As the error message says, you need to provide implementations for the abstract methods __delitem__, __getitem__, __iter__, __len__, and __setitem__. However, a better place to look is the docs, which will tell you that if you implement those five methods (plus any other methods required by the base classes, but as you can see from the table there are none), you'll get all the other methods for free. You may not get the most efficient possible implementations of all of them, but we'll come back to that.

First, let's deal with __len__. Normally people expect this to be O(1), which means we need to keep track of it independently, updating it as needed. So:

class FixedDict(collections.abc.MutableMapping):
    def __init__(self, elementsize, size):
        self.hashtable = FixedHashTable(elementsize, size)
        self.len = 0

Now, __getitem__ just probes until it finds the desired key or reaches the end:

    def __getitem__(self, key):
        keyhash = self.hashtable.hash(key)
        for i, h, k, v in self.hashtable.probe(keyhash):
            if h and k == key:
                return v

And __delitem__ does the same thing, except it empties out the slot if found, and updates len.

    def __delitem__(self, key):
        keyhash = self.hashtable.hash(key)
        for i, h, k, v in self.hashtable.probe(keyhash):
            if h and k == key:
                self.hashtable.stash(i, 0, self.hashtable.zero, self.hashtable.zero)
                self.len -= 1
                return
        raise KeyError(key)

__setitem__ is a bit trickier—if found, we have to replace the value in the slot; if not, we have to fill an empty slot. And here we have to deal with the fact that the hash table may be full. And of course we have to take care of len:

    def __setitem__(self, key, value):
        keyhash = self.hashtable.hash(key)
        for i, h, k, v in self.hashtable.probe(keyhash):
            if not h or k == key:
                if not h:
                    self.len += 1
                self.hashtable.stash(i, keyhash, key, value)
                return
        raise ValueError('hash table full')

And that leaves __iter__. Just as with a dict, we don't have any particular order, so we can just iterate the hash table slots and yield all the non-empty ones:

def __iter__(self):
    return (k for (h, k, v) in self.hashtable.fetch(i)
            for i in range(self.hashtable.size) if h)

While we're at it, we might as well write a __repr__. Note that we can use the fact that we get items for free:

def __repr__(self):
    return '{}({})'.format(type(self).__name__, dict(self.items()))

However, note that the default items just creates an ItemsView(self), and if you track that through the source, you'll see that it iterates self and looks up each value. You can obviously do better if the performance matters:

def items(self):
    pairs = ((k, v) for (h, k, v) in self.hashtable.fetch(i)
             for i in range(self.hashtable.size) if h)
    return collections.abc.ItemsView._from_iterable(pairs)

And likewise for values, and possibly other methods.

Steve Barnes

The whole idea of an abstract base class is that it has some members, (pure virtual members in C++ terms), that your code has to supply - in C++ these are the Pure Virtual members and other Virtual members you may override.

Python differs from C++ in that all members of all classes are Virtual and may be overridden, (and you can add members to all classes and instances) but Abstract Base classes have some required missing classes these are the equivent to the C++ pure virtual.

Having got that out of the way you just need to supply the missing members to be able to create an instance of your derived class.

For an example of the sort of thing you are trying to do see the accepted answer here but rather than use a dict within the class you will have to provide the methods that it provides yourself.

With MutableMapping as base class you should create this methods yourself in your class: __delitem__, __getitem__, __iter__, __len__, __setitem__.

To create custom dict class you can derive it from dict:

>>> class D(dict):
...     pass
... 
>>> d = D()
>>> d
{}
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!