I have the following requirement:
I have many (say 1 million) values (names). The user will type a search string.
I don't expect the user to spell the names c
The Bitap algorithm is designed to find approximate matches in a body of text. Maybe you could use it to calculate probable matches. (Its approximate matching is defined in terms of Levenshtein distance.)
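To make the idea concrete, here is a minimal sketch of ranking names by Levenshtein distance. Note that this is the plain dynamic-programming edit distance, not Bitap itself, and the function names (`levenshtein`, `closest_names`) and the `max_distance` threshold are made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def closest_names(query, names, max_distance=2):
    """Return names within max_distance edits of query, nearest first.

    max_distance is an assumed cut-off; tune it for your data.
    """
    scored = ((levenshtein(query.lower(), name.lower()), name) for name in names)
    return [name for distance, name in sorted(scored) if distance <= max_distance]


if __name__ == "__main__":
    print(closest_names("Jhon", ["John", "Joan", "Mary", "Jon"]))
```

This scans every name for every query, so with a million names it is probably too slow for interactive use without some pre-filtering or a pre-built index.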
(Update: after reading Ben S's answer, I agree that using an existing solution, possibly aspell, is the way to go.)
As others said, Google does auto-correction by watching users correct themselves. If I search for "someting" (sic) and then immediately for "something", it is very likely that the first query was incorrect. A possible heuristic to detect this would be:
then the second query is a possible refinement of the first query which you can store and present to other users.
Note that you probably need a lot of queries to gather enough data for these suggestions to be useful.
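A rough sketch of how such refinements could be mined from a query log follows. The exact conditions are not spelled out here, so the two tests used below (the second query comes shortly after the first, and the two strings are similar) are assumptions, as are the thresholds, the log layout, and the function names.

```python
from collections import Counter
from difflib import SequenceMatcher

MAX_GAP_SECONDS = 30   # assumed: "immediately" means within 30 seconds
MIN_SIMILARITY = 0.8   # assumed: similarity ratio needed to count as a correction


def collect_refinements(log):
    """Count (first query, second query) pairs that look like self-corrections.

    log is an iterable of (user_id, timestamp_seconds, query) tuples,
    assumed to be in chronological order for each user.
    """
    counts = Counter()
    last_seen = {}  # user_id -> (timestamp, query) of that user's previous search
    for user, timestamp, query in log:
        previous = last_seen.get(user)
        if previous is not None:
            prev_time, prev_query = previous
            close_in_time = timestamp - prev_time <= MAX_GAP_SECONDS
            similar = SequenceMatcher(None, prev_query, query).ratio() >= MIN_SIMILARITY
            if close_in_time and similar and prev_query != query:
                counts[(prev_query, query)] += 1
        last_seen[user] = (timestamp, query)
    return counts


def suggestion_for(query, counts, min_votes=2):
    """Return the most frequently observed refinement of query, or None."""
    candidates = {
        new: votes
        for (old, new), votes in counts.items()
        if old == query and votes >= min_votes
    }
    return max(candidates, key=candidates.get) if candidates else None
```

The `min_votes` parameter reflects the point about data volume: a single observed pair is weak evidence, so a suggestion is only offered once several users have made the same correction.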
I would consider using a pre-existing solution for this.
Aspell with a custom dictionary of the names might be well suited for this. Generating the dictionary file will pre-compute all the information required to quickly give suggestions.