ie:
Google Search can take \"fon
You could plug Lucene, which has a dictionary facility implementing the Levenshtein distance method.
Here's an example from the Wiki, where 2 is the distance.
String[] l=spellChecker.suggestSimilar("sevanty", 2);
//l[0] = "seventy"
First off:
Use the one of your choice. I suspect it runs the query against a spell-checking engine with a word limit of exactly one, it then does nothing if the entire query is valid, otherwise it replaces each word with that word's best match. In other words, the following algorithm (an empty return string means that the query had no problems):
startup()
{
set the spelling engines word suggestion limit to 1
}
option 1()
{
int currentPosition = engine.NextWord(start the search at word 0, querystring);
if(currentPosition == -1)
return empty string; // Query is a-ok.
while(currentPosition != -1)
{
queryString = engine.ReplaceWord(engine.CurrentWord, queryString, the suggestion with index 0);
currentPosition = engine.NextWord(currentPosition, querystring);
}
return queryString;
}
The Google SOAP Search APIs do that.
Since no one has yet mentioned it, I'll give one more phrase to search for: "edit distance" (for example, link text). That can be used to find closest matches, assuming it's typos where letters are transposed, missing or added.
But usually this is also coupled with some sort of relevancy information; either by simple popularity (to assume most commonly used close-enough match is most likely correct word), or by contextual likelihood (words that follow preceding correct word, or come before one). This gets into information retrieval; one way to start is to look at bigram and trigrams (sequences of words seen together). Google has very extensive freely available data sets for these.
For simple initial solution though a dictionary couple with Levenshtein-based matchers works surprisingly well.
If you have a dictionary stored as a trie, there is a fairly straightforward way to find best-matching entries, where characters can be inserted, deleted, or replaced.
void match(trie t, char* w, string s, int budget){
if (budget < 0) return;
if (*w=='\0') print s;
foreach (char c, subtrie t1 in t){
/* try matching or replacing c */
match(t1, w+1, s+c, (*w==c ? budget : budget-1));
/* try deleting c */
match(t1, w, s, budget-1);
}
/* try inserting *w */
match(t, w+1, s + *w, budget-1);
}
The idea is that first you call it with a budget of zero, and see if it prints anything out. Then try a budget of 1, and so on, until it prints out some matches. The bigger the budget the longer it takes. You might want to only go up to a budget of 2.
Added: It's not too hard to extend this to handle common prefixes and suffixes. For example, English prefixes like "un", "anti" and "dis" can be in the dictionary, and can then link back to the top of the dictionary. For suffixes like "ism", "'s", and "ed" there can be a separate trie containing just the suffixes, and most words can link to that suffix trie. Then it can handle strange words like "antinationalizationalization".
You can also use the Google API's to spell check. There is an ASP implementation here (I'm not to credit for this, though).