Question
I'm trying to implement a nested dictionary structure in a specific way. I'm reading in a long list of words that will eventually need to be searched often and efficiently, so this is how I want my dictionary to be set up:
The first key is the length of the word, and its value is a dict keyed by the first letter of the word. That dict's value is a dict keyed by the second letter, whose value is a dict keyed by the third letter, and so on. The last letter maps to the word itself.
So if I read in "car", "can", and "joe", I get:
    {3: {c: {a: {r: car, n: can}}, j: {o: {e: joe}}}}
I need to do this for about 100,000 words, though, and they vary in length from 2 to 27 letters.
I've looked through "What is the best way to implement nested dictionaries?" and "Dynamic nested dictionaries", but haven't had any luck figuring this out.
I can certainly get my words out of my text file using
    for word in text_file.read().split():
and I can break each word into characters using
    for char in word:
or
    for i in range(len(word)):
        word[i]
I just can't figure out how to get this structure down. Any help would be greatly appreciated.
Answer 1:
Here's a short example of how to implement a trie with autovivification built on defaultdict. Every node that terminates a word stores an extra key, 'term', to mark it.
    from collections import defaultdict

    # Autovivifying trie: a missing key creates a new sub-trie on access.
    trie = lambda: defaultdict(trie)

    def add_word(root, s):
        node = root
        for c in s:
            node = node[c]
        # Mark this node as the end of a complete word.
        node['term'] = True

    def list_words(root, length, prefix=''):
        # Yield every stored word that is exactly `length` characters long.
        if not length:
            if 'term' in root:
                yield prefix
            return
        for k, v in root.items():
            if k != 'term':
                yield from list_words(v, length - 1, prefix + k)

    WORDS = ['cars', 'car', 'can', 'joe']
    root = trie()
    for word in WORDS:
        add_word(root, word)

    print('Length {}'.format(3))
    print('\n'.join(list_words(root, 3)))
    print('Length {}'.format(4))
    print('\n'.join(list_words(root, 4)))
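Output (the order of words within each length depends on dict ordering, so it can differ between Python versions):
    Length 3
    joe
    can
    car
    Length 4
    cars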
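Since the question mentions that the words will be searched often, a membership test on the same trie might look like the sketch below. The helper name has_word is hypothetical and not part of the answer above. It checks `c not in node` before descending, because indexing a defaultdict with a missing key would silently create that key.

```python
from collections import defaultdict

# Same autovivifying trie as in the answer above.
trie = lambda: defaultdict(trie)

def add_word(root, s):
    node = root
    for c in s:
        node = node[c]
    node['term'] = True

def has_word(root, s):
    """Hypothetical helper: True if s was added as a complete word."""
    node = root
    for c in s:
        # Test membership first; node[c] on a missing key would create it.
        if c not in node:
            return False
        node = node[c]
    # A prefix such as 'ca' reaches a node, but that node has no 'term' marker.
    return 'term' in node

root = trie()
for word in ['cars', 'car', 'can', 'joe']:
    add_word(root, word)

print(has_word(root, 'car'))
print(has_word(root, 'ca'))
print(has_word(root, 'cat'))
```

A lookup walks at most one node per character, so its cost depends on the length of the word rather than on how many words are stored.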
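Answer 2:
I'm not sure what the purpose of this structure is, but here's a solution that uses recursion to generate the nesting you describe. Note that each word gets its own nested dict, and these are collected in a list keyed by word length: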
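    from collections import defaultdict

    d = defaultdict(list)
    words = ['hello', 'world', 'hi']

    def nest(d, word):
        # Wrap d in one dict per letter, working from the last letter to the first.
        if word == "":
            return d
        # On the first call d is None, so the innermost value is the whole word.
        d = {word[-1:]: word if d is None else d}
        return nest(d, word[:-1])

    for word in words:
        l = len(word)
        d[l].append(nest(None, word))

    print(d)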
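The code above builds a separate chain of dicts for every word, so words that share a prefix are not combined. If a single shared dict per length is wanted, as in the question, one possible approach is to fold each chain into a common dict. This is only a sketch: the merge helper below is hypothetical and not part of the answer, and it assumes all words under one length key have the same length, so a leaf never collides with a branch.

```python
def nest(d, word):
    # Same helper as in the answer above: one chain of dicts per word.
    if word == "":
        return d
    d = {word[-1:]: word if d is None else d}
    return nest(d, word[:-1])

def merge(dst, src):
    """Hypothetical helper: recursively fold the nested dict src into dst."""
    for key, value in src.items():
        if isinstance(value, dict):
            # Descend into the shared branch, creating it if needed.
            merge(dst.setdefault(key, {}), value)
        else:
            # Leaf: the complete word.
            dst[key] = value
    return dst

words = ['car', 'can', 'joe', 'hi']
merged = {}
for word in words:
    # One shared dict per word length, as described in the question.
    merge(merged.setdefault(len(word), {}), nest(None, word))

print(merged)
```

With these four words, "car" and "can" end up under the same 'c' and 'a' branches instead of in two separate chains.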
Answer 3:
Here's a way to do it without using collections.defaultdict or creating your own custom subclass of dict, so the resulting dictionary is just an ordinary dict object:
    import pprint

    def _build_dict(wholeword, chars, val, dic):
        # Base case: the last letter maps to the complete word.
        if len(chars) == 1:
            dic[chars[0]] = wholeword
            return
        # Reuse the sub-dict for this letter if it exists, else create it.
        new_dict = dic.get(chars[0], {})
        dic[chars[0]] = new_dict
        _build_dict(wholeword, chars[1:], val, new_dict)

    def build_dict(words):
        dic = {}
        for word in words:
            # The top-level key is the word length.
            root = dic.setdefault(len(word), {})
            _build_dict(word, list(word), word[1:], root)
        return dic

    words = ['a', 'ox', 'car', 'can', 'joe']
    data_dict = build_dict(words)
    pprint.pprint(data_dict)
Output:
    {1: {'a': 'a'},
     2: {'o': {'x': 'ox'}},
     3: {'c': {'a': {'n': 'can', 'r': 'car'}}, 'j': {'o': {'e': 'joe'}}}}
It's based on a recursive algorithm illustrated in a message in a python.org Python-list Archives thread titled "Building and Transvering multi-level dictionaries".
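For comparison, the same length-keyed structure can also be built with a plain loop and dict.setdefault, and then searched one letter at a time. This is a minimal sketch rather than the method from the thread above: the names build_dict_iter and contains are hypothetical, and skipping empty strings is an assumption made here to keep the loop simple.

```python
def build_dict_iter(words):
    """Hypothetical iterative builder for the length-keyed nested dict."""
    dic = {}
    for word in words:
        if not word:
            continue  # assumption: ignore empty strings
        node = dic.setdefault(len(word), {})
        # Walk or create one level per letter, stopping before the last letter.
        for ch in word[:-1]:
            node = node.setdefault(ch, {})
        # The last letter maps to the complete word.
        node[word[-1]] = word
    return dic

def contains(dic, word):
    """Hypothetical lookup: True if word is stored in the nested dict."""
    node = dic.get(len(word))
    for ch in word:
        if not isinstance(node, dict) or ch not in node:
            return False
        node = node[ch]
    # After the last letter, the value should be the word itself.
    return node == word

data_dict = build_dict_iter(['a', 'ox', 'car', 'can', 'joe'])
print(contains(data_dict, 'car'))
print(contains(data_dict, 'cab'))
print(contains(data_dict, 'ca'))
```

Because the structure is made of ordinary dicts, a failed lookup never adds keys, and the loop avoids recursion, although 27-letter words are far from Python's default recursion limit anyway.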
Source: https://stackoverflow.com/questions/41007660/specific-dynamic-nested-dictionaries-autovivification-implementation