Over the holidays, my family loves to play Boggle. Problem is, I'm terrible at Boggle. So I did what any good programmer would do: wrote a program to play for me.
At the core of the algorithm is a simple prefix trie, where each node is a dict
of references to the next letters.
This is the trie:add
implementation:
add([], Trie) -> dict:store(stop, true, Trie); add([Ch|Rest], Trie) -> % setdefault(Key, Default, Dict) -> % case dict:find(Key, Dict) of % { ok, Val } -> { Dict, Val } % error -> { dict:new(), Default } % end. { NewTrie, SubTrie } = setdefault(Ch, dict:new(), Trie), NewSubTrie = add(Rest, SubTrie), dict:store(Ch, NewSubTrie, NewTrie).
And you can see the rest, along with an example of how it's used (at the bottom), here:
Now, this being my first serious program in Erlang, I know there are probably a bunch of things wrong with it… But my immediate concern is that it uses 800 megabytes of RAM.
So, what am I doing most-wrong? And how might I make it a bit less-wrong?
You could implement this functionality by simply storing the words in an ets table:
% create table; add words
> ets:new(words, [named_table, set]).
> ets:insert(words, [{"zed"}]).
> ets:insert(words, [{"zebra"}]).
% check if word exists
> ets:lookup(words, "zed").
[{"zed"}]
% check if "ze" has a continuation among the words
78> ets:match(words, {"ze" ++ '$1'}).
[["d"],["bra"]]
If trie is a must, but you can live with a non-functional approach, then you can try digraphs, as Paul already suggested.
If you want to stay functional, you might save some bytes of memory by using structures using less memory, for example proplists, or records, such as -record(node, {a,b,....,x,y,z})
.
I don't remember how much memory a dict takes, but let's estimate. You have 2.5e6 characters and 2e5 words. If your trie had no sharing at all, that would take 2.7e6 associations in the dicts (one for each character and each 'stop' symbol). A simple purely-functional dict representation would maybe 4 words per association -- it could be less, but I'm trying to get an upper bound. On a 64-bit machine, that'd take 8*4*2.7 million bytes, or 86 megabytes. That's only a tenth of your 800M, so something's surely wrong here.
Update: dict.erl represents dicts with a hashtable; this implies lots of overhead when you have a lot of very small dicts, as you do. I'd try changing your code to use the proplists module, which ought to match my calculations above.
An alternative way to solve the problem is going through the word list and see if the word can be constructed from the dice. That way you need very little RAM, and it might be more fun to code. (optimizing and concurrency)
Look into DAWGs. They're much more compact than tries.
I don't know about your algorithm, but if you're storing that much data, maybe you should look into using Erlang's built-in digraph library to represent your trie, instead of so many dicts. http://www.erlang.org/doc/man/digraph.html
If all words are in English, and the case doesn't matter, all characters can be encoded by numbers from 1 to 26 (and in fact, in Erlang they are numbers from 97 to 122), reserving 0 for stop. So you can use the array
module as well.
来源:https://stackoverflow.com/questions/1964015/erlang-what-is-most-wrong-with-this-trie-implementation