问题
I am having a lot of trouble finding a string matching algorithm that fits my requirements.
I have a very large database of strings in an unabbreviated form that need to be matched to an arbitrary abbreviation. A string that is an actual substring with no letters between its characters should also match, and with a higher score.
Example: if the word to be matched within was "download" and I searched "down", "ownl", and then "dl", I would get the highest matching score for "down", followed by "ownl" and then "dl".
The algorithm would have to be optimized for speed and a large number of strings to be searched through, and should allow me to pull back a list of matching items strings (if I had added both "download" and "upload" to the database, searching "load" should return both). Memory is still important, but not as important as speed.
Any ideas? I've done a bunch of research on some of these algorithms but I haven't found any that even touch abbreviations, let alone with all these conditions!
回答1:
I'd wonder if Peter Norvig's spell checker could be adapted in some way for this problem.
It's a stretch that I haven't begun to work out, but it's such an elegant solution that it's worth knowing about.
来源:https://stackoverflow.com/questions/2304687/a-string-searching-algorithm-to-quickly-match-an-abbreviation-in-a-large-list-of