I want to implement incremental search on a list of strings. Suppose I have an array that contains the strings store, state, stamp, crawl, and crow. My application has a te
Below is a function that finds every occurrence of a substring within a string, yielding each match position as it is found.
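// Requires: using System.Collections.Generic;
public IEnumerable<int> FindAllMatches(string toMatch, string source) {
    // An empty pattern would match at every step without advancing, so yield nothing.
    if (string.IsNullOrEmpty(toMatch)) {
        yield break;
    }
    var last = 0;
    do {
        // Find the next occurrence at or after the end of the previous match.
        var cur = source.IndexOf(toMatch, last);
        if (cur < 0) {
            break;
        }
        yield return cur;
        // Skip past this match so results never overlap.
        last = cur + toMatch.Length;
    } while (true);
}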
I've had to do something similar in the past, using a collection that contained approximately 500,000 words. I found that a directed acyclic word graph worked well. A DAWG has roughly the same performance as a trie, but will be more space efficient. It is, however, slightly more complex to implement.
Unfortunately, my work was in C, and I don't have a good reference for a DAWG implementation in C#.
Whoa...
Just use the built-in AutoComplete functionality on the textbox. You can provide it with your list of words and it will do the matching for you.
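For illustration, here is a minimal sketch of that setup. It assumes the textbox is a Windows Forms TextBox (the thread does not say which UI toolkit is in use), and the AutoCompleteSetup class and Configure method are hypothetical names, not anything from the question.

```csharp
using System.Windows.Forms;

// Assumption: a Windows Forms application. AutoCompleteSetup is an illustrative name.
public static class AutoCompleteSetup
{
    public static void Configure(TextBox textBox)
    {
        // Supply the candidate words from the question.
        textBox.AutoCompleteCustomSource.AddRange(
            new[] { "store", "state", "stamp", "crawl", "crow" });

        // Tell the control to draw suggestions from the custom list above.
        textBox.AutoCompleteSource = AutoCompleteSource.CustomSource;

        // Show a drop-down of matches and append the best match inline.
        textBox.AutoCompleteMode = AutoCompleteMode.SuggestAppend;
    }
}
```

Calling Configure once, for example from the form's constructor after InitializeComponent(), should be enough. The control then matches on prefixes as the user types.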
You could just look at the newly entered letter; if the new third letter is an 'a', just throw out all elements without an 'a' at position three. If the user deletes a letter, you have to rescan the whole original list and bring back all previously removed items.
But what if the user pastes multiple letters from the clipboard, deletes multiple letters by selecting them, or inserts or deletes one or more letters somewhere in the middle?
You just have too many cases to watch for. You could use the newly-entered-letter method and fall back to a complete rescan whenever the search text changes in any way other than appending a single letter, but even this simple method is probably not worth the effort just to avoid a few dozen or a few hundred string comparisons. As already mentioned, a trie or Patricia trie is the way to go if you have really large data sets or want to be really quick.
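The narrow-or-rescan idea described above could be sketched roughly as follows. The IncrementalFilter class and its Update method are hypothetical names. The sketch assumes prefix matching with ordinal comparison, and it treats any query that merely extends the previous one (by one or several characters) as the fast path.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: narrows the previous result set when the query only
// grew at the end, and rescans the full list in every other case.
public sealed class IncrementalFilter
{
    private readonly List<string> _all;   // the full, original list
    private List<string> _current;        // matches for _lastQuery
    private string _lastQuery = "";

    public IncrementalFilter(IEnumerable<string> words)
    {
        _all = words.ToList();
        _current = _all.ToList();
    }

    public IReadOnlyList<string> Update(string query)
    {
        // Fast path: the new query extends the old one, so every new match
        // is already contained in the current result set.
        bool extended = query.StartsWith(_lastQuery, StringComparison.Ordinal);
        List<string> candidates = extended ? _current : _all;

        // Pastes, deletions and mid-string edits fall through to the full rescan.
        _current = candidates
            .Where(w => w.StartsWith(query, StringComparison.Ordinal))
            .ToList();
        _lastQuery = query;
        return _current;
    }
}
```

Update would be called from the textbox's text-changed handler with the current text. For the five words in the question the saving over a plain rescan is negligible, which is the point made above.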
Well, I have implemented a Trie and a DAWG for this problem and I stumbled upon 2 head scratchers:
1) DAWG --> Directed ACYCLIC Word Graph. How do you create and traverse this graph with words like 'bot' and 'boot'? The 'oo' in 'boot' would cause a cycle in a DAWG.
2) A Trie eliminates this problem but then introduces some branch-managing problems.
Constructing the graph is much easier (IMO) than actually using it to produce the words you want without also incurring more runtime.
I am still working on this.
A trie data structure would scale well if your list can grow to a significant length (more than a few hundred entries). Check out e.g. this example implementation.
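To give a rough idea of what a trie-based prefix lookup involves, here is a minimal sketch. It is not the implementation linked above. The Trie class and its Add and FindByPrefix methods are illustrative names, and the sketch assumes case-sensitive matching with one node per character.

```csharp
using System;
using System.Collections.Generic;

// Minimal prefix trie; class and method names are illustrative only.
public sealed class Trie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord;   // true if a stored word ends at this node
    }

    private readonly Node _root = new Node();

    public void Add(string word)
    {
        var node = _root;
        foreach (var ch in word)
        {
            // Create the child for this character if it does not exist yet.
            if (!node.Children.TryGetValue(ch, out var next))
            {
                next = new Node();
                node.Children[ch] = next;
            }
            node = next;
        }
        node.IsWord = true;
    }

    // Returns every stored word that starts with the given prefix.
    public List<string> FindByPrefix(string prefix)
    {
        var results = new List<string>();
        var node = _root;
        foreach (var ch in prefix)
        {
            if (!node.Children.TryGetValue(ch, out var next))
            {
                return results;   // no stored word has this prefix
            }
            node = next;
        }
        Collect(node, prefix, results);
        return results;
    }

    // Depth-first walk that gathers all words below the given node.
    private static void Collect(Node node, string current, List<string> results)
    {
        if (node.IsWord)
        {
            results.Add(current);
        }
        foreach (var pair in node.Children)
        {
            Collect(pair.Value, current + pair.Key, results);
        }
    }
}
```

Reaching the prefix node costs time proportional to the length of the prefix rather than the size of the list. Collecting the matches then costs time proportional to the size of the subtree below that node. The order of the returned words is not specified, so sort them if the display order matters.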