I was wondering if .NET offers any standard functionality for doing a prefix search through a list or a dictionary object. I came across the StringDictionary, b…
Below is a basic implementation of a set of strings that can be searched efficiently by prefix.
The idea is to keep all the words of the set in a trie. To find all words that start with a given prefix, we locate the node corresponding to the last character of the prefix, then run a depth-first search (DFS) from there to collect and return every word stored at that node or below it.
```csharp
using System.Collections.Generic;
using System.Linq; // for Enumerable.Skip

public class PrefixSearchableSet
{
    // Top level of the trie: first character of each word -> its node.
    private readonly Dictionary<char, TrieNode> _letterToNode = new Dictionary<char, TrieNode>();

    // The empty string has no first character, so it is tracked separately.
    private bool _isEmptyWordIncluded;

    public PrefixSearchableSet(IEnumerable<string> words = null)
    {
        if (words is null) return;
        foreach (string word in words)
        {
            AddWord(word);
        }
    }

    public void AddWord(string word)
    {
        if (word is null) return;
        if (word is "") _isEmptyWordIncluded = true;
        else
        {
            // Walk down the trie one character at a time, creating nodes as needed.
            TrieNode node = FindOrAdd(_letterToNode, word[0]);
            foreach (char c in word.Skip(1))
            {
                node = FindOrAdd(node.Children, c);
            }
            // The node of the last character marks the end of a stored word.
            node.Word = word;
        }
    }

    public List<string> GetWords(string prefix)
    {
        List<string> words = new List<string>();
        if (prefix is null) return words;
        if (prefix is "")
        {
            // Every stored word starts with the empty prefix.
            if (_isEmptyWordIncluded) words.Add("");
            foreach (TrieNode trieNode in _letterToNode.Values)
            {
                trieNode.CollectWords(words);
            }
            return words;
        }
        // Follow the prefix down the trie; node becomes null as soon as a character is missing.
        _letterToNode.TryGetValue(prefix[0], out TrieNode node);
        foreach (char c in prefix.Skip(1))
        {
            if (node is null) break;
            node.Children.TryGetValue(c, out node);
        }
        // Collect the word at the prefix node (if any) plus all words below it.
        node?.CollectWords(words);
        return words;
    }

    private static TrieNode FindOrAdd(Dictionary<char, TrieNode> letterToNode, char key)
    {
        if (letterToNode.TryGetValue(key, out TrieNode node)) return node;
        return letterToNode[key] = new TrieNode();
    }

    private class TrieNode
    {
        public Dictionary<char, TrieNode> Children { get; } = new Dictionary<char, TrieNode>();

        // Non-null when a stored word ends at this node.
        public string Word { get; set; }

        // Recursive DFS: append this node's word, then the words of all descendants.
        public void CollectWords(List<string> words)
        {
            if (Word != null) words.Add(Word);
            foreach (TrieNode child in Children.Values)
            {
                child.CollectWords(words);
            }
        }
    }
}
```
StringDictionary is merely a hash table where the keys and values are strings. This existed before generics (when Dictionary<string, string> was not possible).
The data structure that you want here is a trie. There are implementations on CodeProject:
Or, if you're that kind of guy, roll your own (see CLRS).
I made a generic implementation of this available here. Since string implements IEnumerable<char>, you can use it with char as the type argument for TKeyElement.
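The linked implementation isn't reproduced here. As a rough sketch of the idea only, assuming a hypothetical GenericTrie<TKeyElement, TValue> class (the class name, the TValue parameter, and the Add/GetByPrefix methods are illustrative, not the linked library's API), a trie keyed on sequences of TKeyElement could look like this:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical generic trie: each key is a sequence of TKeyElement
// (a string works as a key because it implements IEnumerable<char>).
public class GenericTrie<TKeyElement, TValue>
{
    private sealed class Node
    {
        public readonly Dictionary<TKeyElement, Node> Children = new Dictionary<TKeyElement, Node>();
        public bool HasValue;
        public TValue Value;
    }

    private readonly Node _root = new Node();

    // Walks (creating as needed) one node per key element and stores the value at the last node.
    public void Add(IEnumerable<TKeyElement> key, TValue value)
    {
        Node node = _root;
        foreach (TKeyElement element in key)
        {
            if (!node.Children.TryGetValue(element, out Node child))
            {
                child = new Node();
                node.Children[element] = child;
            }
            node = child;
        }
        node.HasValue = true;
        node.Value = value;
    }

    // Returns the values of all keys that start with the given prefix.
    public IEnumerable<TValue> GetByPrefix(IEnumerable<TKeyElement> prefix)
    {
        Node node = _root;
        foreach (TKeyElement element in prefix)
        {
            if (!node.Children.TryGetValue(element, out node))
                return Enumerable.Empty<TValue>(); // prefix not present
        }
        return Collect(node);
    }

    // Iterative DFS with an explicit stack, so very long keys cannot overflow the call stack.
    private static IEnumerable<TValue> Collect(Node start)
    {
        var stack = new Stack<Node>();
        stack.Push(start);
        while (stack.Count > 0)
        {
            Node current = stack.Pop();
            if (current.HasValue) yield return current.Value;
            foreach (Node child in current.Children.Values) stack.Push(child);
        }
    }
}
```

With TKeyElement = char, calls such as trie.Add("cart", 2) and trie.GetByPrefix("car") compile directly because of the IEnumerable<char> conversion. Results come back in no particular order; sort them afterwards if order matters.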
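I think the StringDictionary is old school (pre-generics). You should probably use a Dictionary(Of String, String) instead, because it implements the generic IEnumerable(Of KeyValuePair(Of String, String)) (think LINQ), whereas StringDictionary only implements the non-generic IEnumerable. One extremely lame thing about StringDictionary is that it's case-insensitive: keys are converted to lowercase when they are added.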
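A minimal sketch of both points, assuming a hypothetical DictionaryPrefixDemo helper: a LINQ prefix filter over Dictionary<string, string> (note that this is a linear scan of every key, not an indexed lookup) and the key lowercasing that StringDictionary performs:

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Specialized;
using System.Linq;

public static class DictionaryPrefixDemo
{
    // Linear-time prefix filter: every key is checked, so each query is O(n).
    // StringComparison.Ordinal avoids culture-specific matching surprises.
    public static List<KeyValuePair<string, string>> WithPrefix(
        Dictionary<string, string> dictionary, string prefix)
    {
        return dictionary
            .Where(pair => pair.Key.StartsWith(prefix, StringComparison.Ordinal))
            .ToList();
    }

    // StringDictionary lowercases keys on insertion, so the original casing is lost.
    // Its Keys property is a non-generic ICollection, hence the Cast<string>().
    public static List<string> StringDictionaryKeys(params string[] keys)
    {
        var sd = new StringDictionary();
        foreach (string key in keys) sd.Add(key, "value");
        return sd.Keys.Cast<string>().ToList();
    }
}
```

For small collections the linear scan is usually fine; for frequent queries over many keys, a trie or a sorted structure avoids touching every entry.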
I don't believe StringDictionary supports a prefix search, but if you use a SortedList<,> you can binary search the sorted keys to find where the range of keys starting with your prefix begins and ends. You have to write that search yourself, since IndexOfKey only finds exact matches.
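A sketch of the SortedList<,> approach, assuming a hypothetical SortedListPrefixSearch.WithPrefix helper and a list constructed with StringComparer.Ordinal. Under ordinal ordering, every key that starts with the prefix sits in one contiguous run that begins at the first key comparing greater than or equal to the prefix, so a lower-bound binary search followed by a forward scan finds them all:

```csharp
using System;
using System.Collections.Generic;

public static class SortedListPrefixSearch
{
    // Assumes the list was built with StringComparer.Ordinal, so that keys
    // sharing a prefix are contiguous in sorted order.
    public static IEnumerable<KeyValuePair<string, TValue>> WithPrefix<TValue>(
        SortedList<string, TValue> list, string prefix)
    {
        IList<string> keys = list.Keys;

        // Lower-bound binary search: first index whose key is >= prefix.
        int lo = 0, hi = keys.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (string.CompareOrdinal(keys[mid], prefix) < 0) lo = mid + 1;
            else hi = mid;
        }

        // Scan forward until a key no longer starts with the prefix (the end of the range).
        for (int i = lo; i < keys.Count && keys[i].StartsWith(prefix, StringComparison.Ordinal); i++)
        {
            yield return new KeyValuePair<string, TValue>(keys[i], list.Values[i]);
        }
    }
}
```

Each query costs O(log n) comparisons to find the start of the range plus one step per matching key. With a culture-sensitive comparer the contiguity argument may not hold, which is why the sketch relies on ordinal comparison.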