Question
I am implementing an autocomplete feature using a trie. I insert every word from the word list at https://github.com/dwyl/english-words into the trie. I want to reduce the memory the trie uses without resorting to something too fancy, like a directed acyclic word graph.
The trie is built from nested dictionaries so that it can be stored in a JSON file. This is my current script:
import json

# Make a trie of English words.
# The words file can be found at https://github.com/dwyl/english-words
with open('words_dictionary.json', 'r') as file:
    words = json.load(file)

_end = '_end_'
trie = {}

def make_trie(words):
    root = trie
    for word in words:
        current = root
        for char in word:
            if char not in current:
                current[char] = {}
            current = current[char]
        # Mark the end of a complete word.
        current[_end] = _end

make_trie(words)

with open('word_trie.json', 'w') as outfile:
    json.dump(trie, outfile)
If this can be done, please help me out with code snippets.
Answer 1:
If your trie is static, meaning that you can build it all at once and never need to insert words afterwards, then the structure you need is a DAFSA, which stands for Deterministic Acyclic Finite State Automaton. If your trie is dynamic, meaning that you will need to insert new words into it later, a DAFSA is still the answer, but the algorithms are much harder.
A DAFSA is basically a compressed version of a trie: identical subtrees, such as shared word endings, are stored only once. It has the same access speed as a trie but, in the general case, a much lower space requirement.
Algorithms to convert a trie into a DAFSA (sometimes called DAWG for Directed Acyclic Word Graph) are not as simple as the ones that simply build a trie, but they're understandable. You should find everything you need here.
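As a rough illustration of the idea, here is a minimal sketch that converts a finished nested-dict trie into a DAFSA by merging identical subtrees bottom-up. It is not the incremental construction algorithm, only the simplest static variant. The function names (`minimize`, `contains`, `complete`, `count_trie_nodes`) and the flat node-table layout are my own assumptions. The table is used because a nested dict cannot express shared subtrees once it is dumped to JSON, whereas a list of nodes that refer to each other by index can.

```python
import json

END = '_end_'  # same end-of-word marker as in the question


def make_trie(words):
    """Build the nested-dict trie from the question."""
    root = {}
    for word in words:
        node = root
        for char in word:
            node = node.setdefault(char, {})
        node[END] = END
    return root


def count_trie_nodes(node):
    """Count the dict nodes in a nested-dict trie (for comparison only)."""
    return 1 + sum(count_trie_nodes(c) for k, c in node.items() if k != END)


def minimize(trie):
    """Merge identical subtrees and return (nodes, root_id).

    nodes[i] maps each character to the index of a child node and
    contains END -> True if a word finishes at that node.
    """
    nodes = []     # flat node table, JSON-serializable
    registry = {}  # subtree signature -> index in `nodes`

    def visit(node):
        entry = {}
        for key in sorted(node):
            if key == END:
                entry[END] = True
            else:
                entry[key] = visit(node[key])  # children first (post-order)
        # Two subtrees are equivalent exactly when their signatures match,
        # because child indices already identify merged subtrees.
        signature = tuple(entry.items())
        if signature not in registry:
            registry[signature] = len(nodes)
            nodes.append(entry)
        return registry[signature]

    root_id = visit(trie)
    return nodes, root_id


def contains(nodes, root_id, word):
    """Return True if `word` is stored in the DAFSA."""
    node = nodes[root_id]
    for char in word:
        if char not in node:
            return False
        node = nodes[node[char]]
    return END in node


def complete(nodes, root_id, prefix):
    """Yield every stored word that starts with `prefix`."""
    node = nodes[root_id]
    for char in prefix:
        if char not in node:
            return
        node = nodes[node[char]]
    stack = [(node, prefix)]
    while stack:
        node, word = stack.pop()
        if END in node:
            yield word
        for char, child in node.items():
            if char != END:
                stack.append((nodes[child], word + char))


# Tiny demo: 'tap', 'taps', 'top', 'tops' share the ending 'p' / 'ps'.
demo_trie = make_trie(['tap', 'taps', 'top', 'tops'])
demo_nodes, demo_root = minimize(demo_trie)
print(count_trie_nodes(demo_trie), len(demo_nodes))  # 8 trie nodes vs 5 merged nodes
```

To store the result, something like `json.dump({'root': root_id, 'nodes': nodes}, outfile)` should work, since the table contains only strings, integers and booleans. Note that `visit` is recursive, so its depth equals the length of the longest word, which is not a concern for an English word list.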
Source: https://stackoverflow.com/questions/56627928/reducing-the-size-of-a-trie-of-all-english-words