A string searching algorithm to quickly match an abbreviation in a large list of unabbreviated strings?

问题

I am having a lot of trouble finding a string matching algorithm that fits my requirements.

I have a very large database of strings in an unabbreviated form that need to be matched to an arbitrary abbreviation. A string that is an actual substring with no letters between its characters should also match, and with a higher score.

Example: if the word to be matched within was "download" and I searched "down", "ownl", and then "dl", I would get the highest matching score for "down", followed by "ownl" and then "dl".

The algorithm would have to be optimized for speed and a large number of strings to be searched through, and should allow me to pull back a list of matching items strings (if I had added both "download" and "upload" to the database, searching "load" should return both). Memory is still important, but not as important as speed.

Any ideas? I've done a bunch of research on some of these algorithms but I haven't found any that even touch abbreviations, let alone with all these conditions!

回答1:

I'd wonder if Peter Norvig's spell checker could be adapted in some way for this problem.

It's a stretch that I haven't begun to work out, but it's such an elegant solution that it's worth knowing about.

来源：https://stackoverflow.com/questions/2304687/a-string-searching-algorithm-to-quickly-match-an-abbreviation-in-a-large-list-of

标签

string

abbreviation

acronym

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!