Specific Dynamic nested Dictionaries, Autovivification implementation

Question

I'm trying to implement a nested dictionary structure in a specific manner. I'm reading in a long list of words that will later need to be searched often and efficiently, so this is how I want my dictionary to be set up:

The top-level key is the length of the word. Its value is a dict keyed by the first letter of the word, whose value is a dict keyed by the second letter, whose value is a dict keyed by the third letter, and so on, with the word itself stored at the last letter.

So if I read in "car", "can" and "joe", I get

{3: {c: {a: {r: car, n: can}}, j: {o: {e: joe}}}}

I need to do this for about 100,000 words though and they vary in length from 2 to 27 letters.

I've looked through "What is the best way to implement nested dictionaries?" and "Dynamic nested dictionaries", but haven't had any luck figuring this out.

I can certainly get my words out of my text file using

for word in text_file.read().split():

and I can break each word into its characters using either

for char in word:

or

for i in range(len(word)):
    word[i]

I just can't figure out how to get this structure down. Any help would be greatly appreciated.

Answer 1:

Here's a short example of how to implement a trie with autovivification built on defaultdict. Every node that terminates a word stores an extra key, 'term', to mark it.

from collections import defaultdict

trie = lambda: defaultdict(trie)

def add_word(root, s):
    node = root
    for c in s:
        node = node[c]
    node['term'] = True

def list_words(root, length, prefix=''):
    if not length:
        if 'term' in root:
            yield prefix
        return

    for k, v in root.items(): 
        if k != 'term':
            yield from list_words(v, length - 1, prefix + k)

WORDS = ['cars', 'car', 'can', 'joe']
root = trie()
for word in WORDS:
    add_word(root, word)

print('Length {}'.format(3))
print('\n'.join(list_words(root, 3)))
print('Length {}'.format(4))
print('\n'.join(list_words(root, 4)))

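Output (the order of words within each length follows dict iteration order, so it may differ between Python versions):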

Length 3
joe
can
car
Length 4
cars
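
Since the question is about searching efficiently, a membership test may be useful on top of this trie. The following is only a sketch: has_word is a hypothetical helper that is not part of the answer above. It walks the trie with .get() rather than indexing, because indexing a defaultdict would silently create empty nodes for every failed lookup.

```python
from collections import defaultdict

# Same autovivifying trie as in the answer above.
trie = lambda: defaultdict(trie)

def add_word(root, s):
    node = root
    for c in s:
        node = node[c]
    node['term'] = True

def has_word(root, s):
    """Return True if s was added as a complete word (hypothetical helper)."""
    node = root
    for c in s:
        # .get() bypasses defaultdict's auto-creation, so lookups leave the trie unchanged.
        node = node.get(c)
        if node is None:
            return False
    return 'term' in node

root = trie()
for word in ['cars', 'car', 'can', 'joe']:
    add_word(root, word)

print(has_word(root, 'car'))  # True: 'car' was added
print(has_word(root, 'ca'))   # False: only a prefix
print(has_word(root, 'dog'))  # False: not present
print('d' in root)            # False: the failed lookup created no node
```

The lookup cost is proportional to the length of the word, regardless of how many words are stored.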

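Answer 2:

I'm not sure what the purpose of this structure is, but here's a solution that uses recursion to generate the nesting you describe. Note that each word gets its own nested dict, and these are collected in a list per word length rather than merged into one dict: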

from collections import defaultdict

# Maps word length -> list of per-word nested dicts
d = defaultdict(list)
words = ['hello', 'world', 'hi']


def nest(d, word):
    # Wrap the word in one dict per letter, working from the last letter to the first
    if word == "":
        return d
    d = {word[-1:]: word if d is None else d}
    return nest(d, word[:-1])


for word in words:
    d[len(word)].append(nest(None, word))

print(d)
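
If a single shared dict per length is wanted, as in the question, the per-word dicts could be combined with a recursive merge. This is a minimal sketch under that assumption; merge_nested is a hypothetical helper and not part of the answer above.

```python
def nest(d, word):
    # Same helper as above: one dict per letter, built from the last letter to the first.
    if word == "":
        return d
    d = {word[-1:]: word if d is None else d}
    return nest(d, word[:-1])

def merge_nested(dst, src):
    """Recursively merge src into dst in place (hypothetical helper)."""
    for key, value in src.items():
        if isinstance(value, dict) and isinstance(dst.get(key), dict):
            # Both sides have a sub-dict for this letter: descend and merge.
            merge_nested(dst[key], value)
        else:
            dst[key] = value
    return dst

merged = {}
for word in ['car', 'can', 'joe', 'hi']:
    # One shared dict per word length, created on first use.
    merge_nested(merged.setdefault(len(word), {}), nest(None, word))

print(merged)
```

Building one throwaway dict per word and then merging it does more work than inserting letters directly into the shared structure, so this is mainly useful for adapting the code above with few changes.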

Answer 3:

Here's a way to do it without using collections.defaultdict or creating your own custom subclass of dict, so the resulting dictionary is just an ordinary dict object:

import pprint

def _build_dict(wholeword, chars, val, dic):
    if len(chars) == 1:
        dic[chars[0]] = wholeword
        return
    new_dict = dic.get(chars[0], {})
    dic[chars[0]] = new_dict
    _build_dict(wholeword, chars[1:], val, new_dict)

def build_dict(words):
    dic = {}
    for word in words:
        root = dic.setdefault(len(word), {})
        _build_dict(word, list(word), word[1:], root)
    return dic

words = ['a', 'ox', 'car', 'can', 'joe']
data_dict = build_dict(words)
pprint.pprint(data_dict)

Output:

{1: {'a': 'a'},
 2: {'o': {'x': 'ox'}},
 3: {'c': {'a': {'n': 'can', 'r': 'car'}}, 'j': {'o': {'e': 'joe'}}}}

It's based on a recursive algorithm illustrated in a message in a python.org Python-list Archives thread titled Building and Transvering multi-level dictionaries.
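
For comparison, the same length-keyed structure can also be built without recursion, and then searched by walking one letter at a time. The code below is a sketch, not the answer's own method: this build_dict is an iterative variant based on dict.setdefault, and find_word is a hypothetical helper.

```python
def build_dict(words):
    """Iterative variant of the recursive builder above (a sketch)."""
    dic = {}
    for word in words:
        if not word:
            continue  # skip empty strings; they have no last letter
        node = dic.setdefault(len(word), {})
        for ch in word[:-1]:
            # Reuse the existing sub-dict for this letter or create a new one.
            node = node.setdefault(ch, {})
        node[word[-1]] = word  # the last letter maps to the whole word
    return dic

def find_word(dic, word):
    """Return True if word is stored in the length-keyed structure (hypothetical helper)."""
    node = dic.get(len(word))
    for ch in word:
        if not isinstance(node, dict):
            return False
        node = node.get(ch)
    return node == word

data_dict = build_dict(['a', 'ox', 'car', 'can', 'joe'])
print(find_word(data_dict, 'car'))    # True
print(find_word(data_dict, 'cat'))    # False: no 't' under 'c' -> 'a'
print(find_word(data_dict, 'zebra'))  # False: no words of length 5
```

Because all words under a given length key have the same number of letters, a leaf string never collides with a sub-dict at the same position.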

Source: https://stackoverflow.com/questions/41007660/specific-dynamic-nested-dictionaries-autovivification-implementation

Tags

python

dictionary

autovivification